Section: New Results
Mixture models
Parameter estimation in the heterogeneity linear mixed model
Participant : Marie-José Martinez.
Joint work with: Emma Holian (National University of Ireland, Galway)
In studies where subjects contribute more than one observation, such as longitudinal studies, linear mixed models have become one of the most widely used techniques for taking into account the correlation between these observations. By introducing random effects, mixed models account for both the within-subject correlation and the variability of the response across subjects. However, such models rely on a normality assumption for the random effects and reflect the prior belief that all subjects are homogeneous. To relax this strong assumption, Verbeke and Lesaffre (1996) proposed an extension of the classical linear mixed model in which the random effects are sampled from a finite mixture of normal distributions with a common covariance matrix. This extension arises naturally from the prior belief that there is unobserved heterogeneity in the random effects population. The model is therefore called the heterogeneity linear mixed model. Note that this model does more than relax the assumption about the random effects distribution: each component of the mixture can be regarded as a cluster containing a proportion of the total population. The model is thus also suitable for classification purposes.
Concerning parameter estimation in the heterogeneity model, the use of the EM algorithm, which takes into account the incomplete structure of the data, has been considered in the literature. Unfortunately, the M-step of the estimation process is not available in analytic form, and a numerical maximisation procedure such as Newton-Raphson is needed. Because deriving such a procedure is a nontrivial task, Komarek et al. (2002) proposed an approximate optimization. However, this procedure proved to be very slow and limited to small samples, because it requires manipulating very large matrices at a prohibitive computational cost.
To overcome this problem, we have proposed an alternative approach which consists of directly fitting an equivalent mixture of linear mixed models. Contrary to the heterogeneity model, the M-step of the EM algorithm is analytically tractable in this case. The parameter estimates of the heterogeneity model can then easily be obtained from the estimates of the fitted mixture.
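One way to read the equivalence mentioned above is sketched here. The notation is an assumption made for illustration, since the text introduces no symbols: $y_i$ is the response vector of subject $i$, $X_i$ and $Z_i$ are its fixed-effects and random-effects design matrices, and the residual covariance is taken to be $\sigma^2 I_{n_i}$ for simplicity. The heterogeneity model can then be written as

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mu_k, D), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}).$$

Integrating out the random effects $b_i$ gives a finite mixture whose components are the marginal distributions of ordinary linear mixed models:

$$y_i \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(X_i \beta + Z_i \mu_k,\; Z_i D Z_i^{\top} + \sigma^2 I_{n_i}\right).$$

Under these assumptions, component $k$ is a linear mixed model with an additional component-specific mean term $Z_i \mu_k$, and the parameters $(\pi_k, \mu_k, D, \beta, \sigma^2)$ of the heterogeneity model can be read off from the fitted mixture.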
Taking into account the curse of dimensionality
Participants : Stéphane Girard, Alessandro Chiancone, Seydou-Nourou Sylla.
Joint work with: C. Bouveyron (Univ. Paris 5), M. Fauvel (ENSAT Toulouse) and J. Chanussot (Gipsa-lab and Grenoble-INP)
In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the Inria LEAR team) [67], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:

the introduction in the model of a dimension reduction constraint for each group

the use of parsimonious models obtained by requiring different groups to share the same values for some parameters
This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) [4]. Some versions of this method have been tested on the supervised classification of objects in images. The approach has been adapted to the unsupervised classification framework, and the related method is named High Dimensional Data Clustering (HDDC) [3]. Our recent work consists in adding a kernel to the previous methods to deal with nonlinear data classification and heterogeneous data [12]. We also investigate the use of kernels derived from similarity measures on binary data. The targeted application is the analysis of verbal autopsy data (PhD thesis of N. Sylla). Indeed, health monitoring and evaluation rely more and more on cause-of-death data from verbal autopsies in countries that do not keep civil status records or whose records are incomplete. The verbal autopsy method makes it possible to determine the probable cause of death, and it has become the main source of information on causes of death in these populations.
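Returning to the subspace Gaussian models behind HDDA, the following is a minimal sketch of the first strategy (a dimension reduction constraint for each group), not the authors' implementation. It assumes that each class covariance has `d` leading eigenvalues and a single common noise variance `b` in the remaining directions, and that `d` is fixed by hand. The function names `fit_subspace_gaussians` and `classify` are hypothetical.

```python
import numpy as np

def fit_subspace_gaussians(X, y, d):
    """Per class: mean, top-d eigenvectors/eigenvalues, residual noise variance, prior."""
    p = X.shape[1]
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        mean = Xk.mean(axis=0)
        # eigh returns eigenvalues in ascending order, so reverse them
        eigval, eigvec = np.linalg.eigh(np.cov(Xk, rowvar=False))
        a = eigval[::-1][:d]              # variances inside the class subspace
        Q = eigvec[:, ::-1][:, :d]        # orthonormal basis of the class subspace
        b = (eigval.sum() - a.sum()) / (p - d)  # average variance outside it
        params[k] = (mean, Q, a, b, len(Xk) / len(X))
    return params

def classify(X, params):
    """Assign each row of X to the class with the smallest -2 log posterior (up to a constant)."""
    p = X.shape[1]
    labels = list(params)
    costs = []
    for k in labels:
        mean, Q, a, b, prior = params[k]
        centred = X - mean
        proj = centred @ Q                                   # coordinates in the subspace
        inside = (proj ** 2 / a).sum(axis=1)                 # Mahalanobis part in the subspace
        outside = ((centred ** 2).sum(axis=1) - (proj ** 2).sum(axis=1)) / b
        const = np.log(a).sum() + (p - len(a)) * np.log(b) - 2.0 * np.log(prior)
        costs.append(inside + outside + const)
    return np.array(labels)[np.argmin(np.array(costs), axis=0)]
```

Only `d` eigenvectors per class are needed at prediction time, which is what keeps the number of parameters manageable when the ambient dimension is large.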
Location and scale mixtures of Gaussians with flexible tail behaviour: properties, inference and application to multivariate clustering
Participant : Florence Forbes.
Joint work with: Darren Wraith from QUT, Brisbane, Australia.
Clustering concerns the assignment of each of $N$, possibly multidimensional, observations $y_1, \ldots, y_N$ to one of $K$ groups. A popular way to approach this task is via a parametric finite mixture model. While the vast majority of the work on such mixtures has been based on Gaussian mixture models, in many applications the tails of normal distributions are shorter than appropriate, or parameter estimates are affected by atypical observations (outliers). The family of location and scale mixtures of Gaussians has the ability to generate a number of flexible distributional forms. It nests as particular cases several important asymmetric distributions, such as the Generalised Hyperbolic (GH) distribution. The Generalised Hyperbolic distribution in turn nests many other well-known distributions, such as the Normal Inverse Gaussian (NIG), whose practical relevance has been widely documented in the literature. In a multivariate setting, we propose to extend the standard location and scale mixture concept into a so-called multiple scaled framework, which has the advantage of allowing different tail and skewness behaviours in each dimension of the variable space, with arbitrary correlation between dimensions. The approach builds upon, and develops further, previous work on scale mixtures of Gaussians [21]. Estimation of the parameters is provided via an EM algorithm, with a particular focus on NIG distributions. Inference is then extended to cover the case of mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape. In addition, comparison with other similar models of GH distributions shows that the latter are not as flexible as claimed.
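The multiple scaled idea can be illustrated by simulation. The sketch below draws samples in the spirit of a multiple scaled NIG distribution: in the eigenbasis `D` of the scale matrix, each dimension receives its own inverse-Gaussian weight, which drives both the skewness and the tail of that dimension. The parameterisation (`a` for eigenvalues, `beta` for skewness in rotated coordinates, `ig_mean` and `ig_scale` for the mixing law) and the function name are assumptions made for illustration and need not match the one used in the work described above.

```python
import numpy as np

def sample_multiple_scaled(n, mu, D, a, beta, ig_mean, ig_scale, rng):
    """Draw n samples with one inverse-Gaussian weight per dimension.

    mu: (p,) location; D: (p, p) orthogonal matrix of eigenvectors;
    a: (p,) positive eigenvalues; beta: (p,) skewness in rotated coordinates;
    ig_mean, ig_scale: (p,) parameters of the per-dimension mixing law.
    """
    p = mu.shape[0]
    # NumPy's Wald distribution is the inverse Gaussian
    w = rng.wald(ig_mean, ig_scale, size=(n, p))
    z = rng.standard_normal((n, p))
    # location shift and scale are both driven by the same weight, dimension by dimension
    rotated = w * a * beta + np.sqrt(w * a) * z
    # rotate back to the original coordinates
    return mu + rotated @ D.T

rng = np.random.default_rng(0)
theta = 0.3
D = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
samples = sample_multiple_scaled(
    200_000, np.array([1.0, -1.0]), D,
    a=np.array([1.0, 2.0]), beta=np.array([0.5, 0.0]),
    ig_mean=np.array([1.0, 1.0]), ig_scale=np.array([2.0, 2.0]), rng=rng)
```

Replacing the `(n, p)` weight matrix by a single weight per sample would give back the standard location and scale mixture, in which all dimensions share the same tail behaviour.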
Bayesian mixtures of multiple scaled distributions
Participants : Florence Forbes, Alexis Arnaud.
Joint work with: Emmanuel Barbier and Benjamin Lemasson from Grenoble Institute of Neuroscience.
In previous work [21], inference for mixtures of multiple scaled distributions has been carried out based on the maximum likelihood principle, using the EM algorithm. In this work we consider a Bayesian treatment of these models, for the many advantages that the Bayesian framework offers in the mixture model context. Mainly, it avoids the ill-posed nature of maximum likelihood due to the presence of singularities in the likelihood function. A mixture component may collapse by becoming centered at a single data vector, sending its covariance to 0 and the model likelihood to infinity. A Bayesian treatment protects the algorithm from this problem, which occurs in ordinary EM. Also, Bayesian model comparison embodies the principle that simple models should be preferred. Typically, maximum likelihood does not provide any guidance on the choice of the model order, as more complex models can always fit the data better. For standard scale mixtures of Gaussians, the usual Normal-Wishart prior can be used for the Gaussian parameters. For multiple scaled distributions, the specific decomposition of the covariance requires appropriate separate priors on the eigenvectors and eigenvalues of the scale matrix. Such a decomposition has already been examined in various works on priors for covariance matrices. In this work we consider several possibilities. We derive an inference scheme based on variational approximation and show how to apply it to model selection. In particular, we consider the issue of automatically selecting an appropriate number of classes in the mixture. We show how to select this number from a single run, avoiding the repetitive inference and comparison of all possible models.
EM for Weighted-Data Clustering
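The likelihood singularity mentioned above can be reproduced with a small numerical experiment. The sketch below uses a toy one-dimensional data set and a two-component Gaussian mixture (both are illustrative assumptions, as are the function names): one component stays broad while the other is centered exactly on the first data point, and its standard deviation is shrunk towards 0.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def collapsed_loglik(x, std_collapsed):
    """Log-likelihood of an equal-weight mixture: one broad component, one spike on x[0]."""
    x = np.asarray(x, dtype=float)
    broad = normal_pdf(x, x.mean(), x.std())
    spike = normal_pdf(x, x[0], std_collapsed)
    return float(np.sum(np.log(0.5 * broad + 0.5 * spike)))

data = [0.0, 1.0, 2.0, 3.5, 5.0]
# the log-likelihood keeps growing as the spike narrows
logliks = [collapsed_loglik(data, s) for s in (1e-1, 1e-3, 1e-6)]
```

The broad component keeps every other data point at a strictly positive density, so nothing offsets the unbounded contribution of the spike. A prior on the component variances removes this degenerate maximum.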
Participant : Florence Forbes.
Joint work with: Israel Gebru, Xavier Alameda-Pineda and Radu Horaud from the Inria Perception team.
Data clustering has received a lot of attention and many methods, algorithms and software packages are currently available. Among these techniques, parametric finite-mixture models play a central role due to their interesting mathematical properties and to the existence of maximum-likelihood estimators based on expectation-maximization (EM). In this work we propose a new mixture model that associates a weight with each observed data point. We introduce a Gaussian mixture with weighted data and we derive two EM algorithms: the first one assigns a fixed weight to each observed datum, while the second one treats the weights as hidden variables drawn from gamma distributions. We provide a general-purpose scheme for weight initialization and we thoroughly validate the proposed algorithms by comparing them with several parametric and nonparametric clustering techniques. We demonstrate the utility of our method for clustering heterogeneous data, namely data gathered with different sensorial modalities, e.g., audio and vision. See also an application in [40].
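The fixed-weight variant can be sketched in one dimension. The sketch assumes that the weight $w_i$ of a datum scales its precision, so that $x_i$ follows $\mathcal{N}(\mu_k, \sigma_k^2 / w_i)$ within component $k$; this modelling choice, the quantile-based initialisation, the variance floor and the function name `weighted_em_1d` are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np

def weighted_em_1d(x, w, n_components=2, n_iter=50):
    """EM for a 1-D Gaussian mixture where datum i has variance var_k / w_i in component k."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # deterministic initialisation: means spread over the data quantiles
    mu = np.quantile(x, np.linspace(0.1, 0.9, n_components))
    var = np.full(n_components, x.var())
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities under the weight-scaled variances
        v = var[None, :] / w[:, None]
        log_p = (np.log(pi)[None, :] - 0.5 * np.log(2.0 * np.pi * v)
                 - 0.5 * (x[:, None] - mu[None, :]) ** 2 / v)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        eta = np.exp(log_p)
        eta /= eta.sum(axis=1, keepdims=True)
        # M-step: closed-form updates; weights enter the mean and the variance
        we = w[:, None] * eta
        mu = (we * x[:, None]).sum(axis=0) / we.sum(axis=0)
        var = (we * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / eta.sum(axis=0)
        var = np.maximum(var, 1e-12)                # floor added for safety (assumption)
        pi = eta.mean(axis=0)
    return pi, mu, var, eta
```

With all weights equal to 1 the updates reduce to standard EM for a Gaussian mixture. A datum with a small weight is given a large variance and therefore has little influence on the estimated means, which is how an unreliable observation can be down-weighted.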