Section: New Results
Mixture models
Taking into account the curse of dimensionality
Participant : Stéphane Girard.
Joint work with: Bouveyron, C. (Université Paris 1), Celeux, G. (Select, INRIA), Jacques, J. (Université Lille 1).
In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the INRIA LEAR team) [44], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:

- the introduction in the model of a dimension-reduction constraint for each group;
- the use of parsimonious models obtained by requiring different groups to share the same values of some parameters.
This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) [4]. Some versions of this method have been tested on the supervised classification of objects in images. The approach has also been adapted to the unsupervised classification framework, yielding a method named High Dimensional Data Clustering (HDDC) [3].
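To make the subspace idea concrete, the following numpy sketch builds a per-class Gaussian classifier in the spirit of HDDA: each class keeps its own variances along a few leading eigen-directions and a single shared noise variance in the remaining directions. This is an illustrative simplification on synthetic data, not the actual HDDA parameterization or its parsimonious variants.

```python
import numpy as np

def fit_class(X, d):
    """Fit a low-dimensional Gaussian model to one class: the d leading
    eigen-directions keep their own variances, the remaining directions
    share a single noise variance b (a simplified version of the
    subspace constraint behind HDDA)."""
    mu = X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    w, V = w[::-1], V[:, ::-1]          # sort eigenvalues descending
    a = w[:d]                           # signal variances
    b = w[d:].mean()                    # shared noise variance
    return mu, V[:, :d], a, b

def log_density(x, params, p):
    mu, U, a, b = params
    z = x - mu
    proj = U.T @ z                      # coordinates in the signal subspace
    resid = z - U @ proj                # residual in the noise subspace
    quad = np.sum(proj**2 / a) + np.sum(resid**2) / b
    logdet = np.sum(np.log(a)) + (p - len(a)) * np.log(b)
    return -0.5 * (quad + logdet + p * np.log(2 * np.pi))

rng = np.random.default_rng(0)
p, d = 10, 2
# two synthetic classes living near different 2-D subspaces of R^10
X0 = rng.normal(size=(200, d)) @ rng.normal(size=(d, p)) + 0.1 * rng.normal(size=(200, p))
X1 = 3 + rng.normal(size=(200, d)) @ rng.normal(size=(d, p)) + 0.1 * rng.normal(size=(200, p))
models = [fit_class(X0, d), fit_class(X1, d)]

def classify(x):
    return int(np.argmax([log_density(x, m, p) for m in models]))

print(classify(X0[0]), classify(X1[0]))
```

The intrinsic dimension d is fixed by hand here; selecting it automatically is precisely one of the discrete model-selection issues discussed below.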
In collaboration with Gilles Celeux and Charles Bouveyron, we are currently working on the automatic selection of the discrete parameters of the model. The results have been submitted for publication [40], as has a description of the R package [39]. An application to the classification of high-dimensional vibrational spectroscopy data has also been developed [20].
Information criteria for model selection in the case of multimodal data
Participants : Florence Forbes, Vasil Khalidov.
Joint work with: Radu Horaud from the INRIA Perception team.
A multimodal data setting combines multiple data sets, each generated by a different sensor. The data sets live in different physical spaces with different dimensionalities and cannot be embedded in a single common space. We focus on the issue of clustering such multimodal data, which raises the question of how to perform pairwise comparisons between observations living in different spaces. A solution within the framework of Gaussian mixture models and the Expectation-Maximization (EM) algorithm has been proposed in [21]. Each modality is associated with a modality-specific Gaussian mixture that shares with the others a number of common parameters and a common number of components. Each component corresponds to a common multimodal event responsible for a number of observations in each modality. As this number of components is usually unknown, we propose information criteria for selecting it from the data. We introduce new appropriate criteria based on a penalized maximum likelihood principle, and give a consistency result for the estimator of the common number of components under some assumptions. In practice, maximum likelihood estimation also requires a proper initialization of the EM algorithm of [21], for which we propose an efficient procedure. This procedure and the new conjugate BIC score we derived are successfully illustrated on a challenging two-modality task of detecting and localizing audio-visual objects.
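For readers unfamiliar with penalized-likelihood selection of the number of components, the sketch below shows the standard (non-conjugate) BIC applied to a plain one-dimensional Gaussian mixture fitted by EM; the conjugate BIC score and the multimodal setting of [21] are more involved, so this is only an illustration of the principle on synthetic data.

```python
import numpy as np

def loglik(x, pi, mu, var):
    """Mixture log-likelihood via the log-sum-exp trick."""
    logp = (-0.5 * (x[:, None] - mu) ** 2 / var
            - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
    m = logp.max(axis=1, keepdims=True)
    return float((m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).sum())

def em_gmm_1d(x, K, iters=200, seed=0):
    """Plain EM for a 1-D Gaussian mixture; returns the final log-likelihood."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K, replace=False)        # init means at random data points
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)       # E-step: responsibilities
        nk = r.sum(axis=0) + 1e-10              # M-step
        pi, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return loglik(x, pi, mu, var)

def bic(ll, K, n):
    # 1-D mixture has K means + K variances + (K - 1) free weights
    return -2 * ll + (3 * K - 1) * np.log(n)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])  # true K = 2
scores = {K: bic(max(em_gmm_1d(x, K, seed=s) for s in range(3)), K, len(x))
          for K in (1, 2, 3, 4)}
best = min(scores, key=scores.get)
print(best)
```

Running EM from several random starts and keeping the best likelihood per K, as above, is a crude stand-in for the dedicated initialization procedure mentioned in the text.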
Multiple scaled Student distributions with application to clustering
Participants : Florence Forbes, Senan James Doyle, Darren Wraith.
There is an increasingly large literature on statistical approaches to clustering data for a very wide variety of applications. In many applications there is also an increasing need for robustness: for example, the tails of normal distributions may be shorter than appropriate, or parameter estimates may be affected by atypical observations (outliers). A popular approach for such cases is to fit a mixture of Student distributions (either univariate or multivariate), which provides an additional degrees-of-freedom (dof) parameter that can be viewed as a robustness tuning parameter.
An additional advantage of the Student approach is its computational tractability via the EM algorithm, with the cluster membership treated as missing data. An additional numerical procedure is then used to find the maximum likelihood estimate of the degrees of freedom.
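The tractability mentioned above comes from the classical Gaussian scale-mixture representation of the Student distribution, in which a latent Gamma-distributed weight per observation plays the role of an extra missing variable in EM. A small numpy sketch of this representation (illustrative only):

```python
import numpy as np

def sample_mvt(mu, Sigma, nu, n, rng):
    """Sample a multivariate Student-t via its Gaussian scale-mixture
    representation: x = mu + z / sqrt(u), with z ~ N(0, Sigma) and
    u ~ Gamma(nu/2, rate nu/2) (mean 1). The latent u is the extra
    missing variable exploited by the EM fit."""
    L = np.linalg.cholesky(Sigma)
    z = rng.normal(size=(n, len(mu))) @ L.T
    u = rng.gamma(shape=nu / 2, scale=2 / nu, size=(n, 1))
    return mu + z / np.sqrt(u)

rng = np.random.default_rng(0)
x = sample_mvt(np.zeros(2), np.eye(2), nu=3.0, n=100_000, rng=rng)
# heavier tails than a Gaussian: fraction of samples beyond 4 standard deviations
print((np.abs(x[:, 0]) > 4).mean(), (np.abs(rng.normal(size=100_000)) > 4).mean())
```

Smaller nu gives smaller, more variable weights u and hence heavier tails; as nu grows, u concentrates at 1 and the Gaussian is recovered.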
There are many ways to generalize the Student distribution; recent proposals include the skew Student distribution. Much less interest, though, has focused on alternative forms for the degrees-of-freedom parameter. The standard Student distribution has one disadvantage in this regard: all its marginals are Student with the same degrees of freedom, and hence the same amount of tailweight. As noted by Azzalini and Genton in a recent review paper, a simple example is one where one variable has Cauchy tails (dof = 1) and another is Gaussian; in this situation, "the single degrees of freedom parameter has to provide a compromise between those two tail behaviours". One solution could be to take a product of independent t-distributions with varying degrees of freedom, but this assumes no correlation between dimensions, which may be too strong an assumption for many applications. Jones in 2002 proposed a dependent bivariate t-distribution with marginals of different degrees of freedom, but the tractability of its extension to the multivariate case is unclear. There has also been much research on copula approaches to account for flexible distributional forms, but the choice of which copula to use in this case and the applicability to (even) moderate dimensions are not clear.
In this work we propose to extend the Student distribution to allow the degrees-of-freedom parameter to be estimated separately in each dimension of the parameter space. The key feature of the approach is a decomposition of the covariance matrix which facilitates this separate estimation while still allowing arbitrary correlation between dimensions. The properties of the approach and an assessment of its performance are illustrated on several datasets that are particularly challenging for the standard Student mixture and for many alternative clustering approaches.
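To illustrate the idea, the toy sketch below samples from a "multiple scaled" construction: the covariance is decomposed as Sigma = D A D^T (D orthogonal, A diagonal), and each axis of the eigenbasis receives its own Gamma weight, i.e. its own degrees of freedom. This is an assumed, simplified sampling scheme for illustration, not the authors' exact model or estimator.

```python
import numpy as np

def sample_multiple_scaled_t(mu, D, A, nus, n, rng):
    """Toy multiple-scaled Student sampler: one Gamma weight per axis of
    the eigenbasis D, so each axis j gets its own tailweight nus[j],
    instead of a single weight shared by all dimensions."""
    p = len(mu)
    u = np.stack([rng.gamma(nu / 2, 2 / nu, size=n) for nu in nus], axis=1)
    y = rng.normal(size=(n, p)) * np.sqrt(A) / np.sqrt(u)   # per-axis scaling
    return mu + y @ D.T                                     # rotate back

rng = np.random.default_rng(0)
mu = np.zeros(2)
D = np.eye(2)                 # eigenbasis (identity here for simplicity)
A = np.array([1.0, 1.0])      # axis variances
x = sample_multiple_scaled_t(mu, D, A, nus=(1.0, 50.0), n=200_000, rng=rng)
# first axis is Cauchy-tailed (nu = 1), second is close to Gaussian (nu = 50)
print((np.abs(x[:, 0]) > 10).mean(), (np.abs(x[:, 1]) > 10).mean())
```

With a non-trivial D the two heavy-tail behaviours mix across the observed coordinates, which is how the construction retains arbitrary correlation between dimensions.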