Section: New Results
Mixture models
Taking into account the curse of dimensionality.
Participant : Stéphane Girard.
Joint work with: Bouveyron, C. (Université Paris 1) and Celeux, G. (Select, INRIA).
In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the INRIA team LEAR) [46], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in lower-dimensional subspaces. Two different strategies arise:
- the introduction in the model of a dimension-reduction constraint for each group,
- the use of parsimonious models obtained by requiring different groups to share the same values of some parameters.
This modelling yields a new supervised classification method called HDDA, for High Dimensional Discriminant Analysis [4]. Some versions of this method have been tested on the supervised classification of objects in images. This approach has been adapted to the unsupervised classification framework, and the related method is named HDDC, for High Dimensional Data Clustering [3]. An introductory paper to these classification methods [23] has been written in order to disseminate them in various application domains. Another part of the work of Charles Bouveyron and Stéphane Girard has consisted in extending these methods to the semi-supervised context and to the presence of label noise [14]. In collaboration with Gilles Celeux and Charles Bouveyron, we are currently working on the automatic selection of the discrete parameters of the model.
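To make the dimension-reduction constraint concrete, here is a minimal Python/NumPy sketch of a class-conditional Gaussian whose covariance is restricted in this spirit: the d largest eigenvalues of each class covariance are kept, and the variance in the remaining p - d directions is replaced by a single common value b. It is an illustration under stated assumptions, not the HDDA implementation of [4]: the intrinsic dimension d is fixed by hand and shared by all classes (choosing it automatically is precisely the discrete-parameter selection problem mentioned above), and all function names are hypothetical.

```python
import numpy as np

def fit_hdda_class(X, d):
    """Fit one class: mean, d retained eigen-directions, and a common noise variance b.

    Assumption of this sketch: d is given and fixed, not selected from the data.
    """
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(S)             # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    p = X.shape[1]
    a = evals[:d]                                # variances inside the class subspace
    b = evals[d:].mean()                         # single variance for the p - d remaining directions
    Q = evecs[:, :d]                             # orthonormal basis of the class subspace
    return {"mu": mu, "a": a, "b": b, "Q": Q, "p": p, "d": d}

def log_density(x, m):
    """Gaussian log-density with covariance Q diag(a) Q^T + b (I - Q Q^T)."""
    diff = x - m["mu"]
    proj = m["Q"].T @ diff                       # coordinates inside the subspace
    resid = diff @ diff - proj @ proj            # squared distance to the subspace
    p, d = m["p"], m["d"]
    quad = np.sum(proj ** 2 / m["a"]) + resid / m["b"]
    logdet = np.sum(np.log(m["a"])) + (p - d) * np.log(m["b"])
    return -0.5 * (quad + logdet + p * np.log(2 * np.pi))

def fit_hdda(X, y, d):
    """Fit one constrained Gaussian per class, plus empirical class priors."""
    classes = np.unique(y)
    models = {c: fit_hdda_class(X[y == c], d) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    return models, priors

def predict(x, models, priors):
    """Maximum a posteriori class for a single observation x."""
    scores = {c: np.log(priors[c]) + log_density(x, m) for c, m in models.items()}
    return max(scores, key=scores.get)
```

Because the inverse covariance splits into a subspace part and an orthogonal part, evaluating the density only requires the d retained directions, which is what keeps this kind of model tractable when p is large.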
Conjugate mixture model for clustering multimodal data
Participants : Florence Forbes, Vasil Khalidov.
Joint work with: Radu Horaud from the INRIA team Perception.
This work was initiated in the European STREP POP (Perception On Purpose, 2006-2008), coordinated by Radu Horaud. We addressed the issue of clustering observations gathered with multiple measuring instruments, e.g. several physically different sensors. A typical instance of this problem, which we addressed, is audio-visual speaker detection.
When the data originate from a single speaker/object, finding the best estimates for the object's characteristics is usually referred to as a pure fusion task, and it reduces to combining multisensor observations in some optimal way. The problem is much more complex when several objects are present and when the task implies their detection, identification, and localization. In this case one has to consider two processes simultaneously: (i) segregation, which assigns each observation either to an object or to an outlier category, and (ii) estimation, which computes the parameters of each object based on the group of observations that were assigned to that object. In other words, in addition to fusing observations from different sensors, multimodal analysis requires the assignment of each observation to one of the objects.
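The two processes can be illustrated with a short sketch that alternates them on unimodal data: a segregation step that computes, for each observation, the posterior probability of belonging to each object or to an outlier class (modelled here by a uniform density over a region of known volume), and an estimation step that re-fits each object from its softly assigned observations. This is an EM-style illustration under simplifying assumptions (isotropic Gaussian objects, fixed mixing weights, hypothetical function names); it is not the multimodal model discussed in this section.

```python
import numpy as np

def gaussian_pdf(X, mean, var):
    """Isotropic Gaussian density evaluated at each row of X (n x D)."""
    D = X.shape[1]
    sq = np.sum((X - mean) ** 2, axis=1)
    return np.exp(-0.5 * sq / var) / (2 * np.pi * var) ** (D / 2)

def segregate(X, means, variances, weights, outlier_weight, volume):
    """Posterior probability of each observation belonging to each object.

    The last column is the outlier class, assumed uniform over a region of the given volume.
    """
    n, K = X.shape[0], len(means)
    lik = np.empty((n, K + 1))
    for k in range(K):
        lik[:, k] = weights[k] * gaussian_pdf(X, means[k], variances[k])
    lik[:, K] = outlier_weight / volume
    return lik / lik.sum(axis=1, keepdims=True)

def estimate(X, resp):
    """Re-estimate each object's mean and isotropic variance from its soft assignments."""
    K, D = resp.shape[1] - 1, X.shape[1]
    means, variances = [], []
    for k in range(K):
        w = resp[:, k]
        mu = w @ X / w.sum()
        var = w @ np.sum((X - mu) ** 2, axis=1) / (D * w.sum())
        means.append(mu)
        variances.append(var)
    return np.array(means), np.array(variances)
```

Alternating the two functions for a few iterations lets the objects' parameters converge, while observations that fit no object end up with most of their posterior mass in the outlier column.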
This observation-to-object association problem can be cast into a probabilistic framework. In the case of unimodal data (possibly multidimensional), the problems of grouping observations and of associating groups with objects can be cast into the framework of standard data clustering. Clustering multimodal data raises the difficult question of how to group together observations that belong to different physical spaces with different dimensionalities, e.g., how to group visual data with auditory data. When the observations from two different modalities can be aligned pairwise, a natural solution is to consider the Cartesian product of the two unimodal spaces. Unfortunately, such an alignment is not possible in most practical cases. Different sensors operate at different sampling rates, hence the number of observations gathered with one sensor can be quite different from the number gathered with another. Consequently, there is no obvious way to align the observations pairwise. Considering all possible pairs would result in a combinatorial blow-up and would typically create an abundance of erroneous observations corresponding to inconsistent solutions. Alternatively, one may consider several unimodal clusterings, provided that the relationships between a common object space and several observation spaces can be explicitly specified. Multimodal clustering then results in a number of unimodal clusterings that are jointly governed by the same unknown parameters characterizing the object space.
In a recently submitted paper [42], we show how the problem of clustering multimodal data can be addressed within the framework of mixture models. The proposed model is composed of a number of modality-specific mixtures. These mixtures are jointly governed by a set of common object-space parameters (referred to as the tying parameters), thus ensuring consistency between the sensory data and the object space being sensed. This is done using explicit transformations from the unobserved parameter space (object space) to each of the observed spaces (sensor spaces). Hence, the proposed model is able to deal with observations that live in spaces with different physical properties such as dimensionality, space metric, sensor sampling rate, etc. We believe that linking the object space with the sensor spaces through object-space-to-sensor-space transformations has more discriminative power than existing multisensor fusion techniques, and hence performs better in terms of multiple-object identification and localization. To the best of our knowledge, there has been no previous attempt to use a generative model such as ours for the task of multimodal data interpretation. The concept of conjugate mixture models is described in more detail in [42]. Standard Gaussian mixture models (GMM) are used to model the unimodal data. The parameters of these Gaussian mixtures are governed by the object parameters through a number of object-space-to-sensor-space transformations (one transformation for each sensing modality). A very general class of transformations, namely non-linear Lipschitz continuous functions, is assumed. Figure 2 shows a graphical representation of our conjugate mixture models.
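A minimal sketch of how the tying works, assuming two modalities and hypothetical transformations chosen only for illustration: a pinhole projection with unit focal length for the visual data, and an interaural time difference between two microphones at assumed positions for the auditory data. Each object is a 3D position s_k; both unimodal Gaussian mixtures take their means from the same s_k through their own transformation, so two observation sets with different dimensions and different numbers of samples are scored against one shared set of object parameters. The code only evaluates the joint log-likelihood; the estimation procedure of [42] is not reproduced here, and all names are hypothetical.

```python
import numpy as np

# Hypothetical sensor set-up (assumptions of this sketch, not taken from [42]).
MIC_LEFT = np.array([-0.1, 0.0, 0.0])    # microphone positions in metres
MIC_RIGHT = np.array([0.1, 0.0, 0.0])
SOUND_SPEED = 343.0                      # metres per second

def visual_map(s):
    """Hypothetical object-to-image transformation: pinhole projection, unit focal length."""
    return s[:2] / s[2]

def auditory_map(s):
    """Hypothetical object-to-audio transformation: interaural time difference in seconds."""
    diff = np.linalg.norm(s - MIC_LEFT) - np.linalg.norm(s - MIC_RIGHT)
    return np.array([diff / SOUND_SPEED])

def mixture_loglik(obs, means, variances, weights):
    """Log-likelihood of observations (n x D) under an isotropic Gaussian mixture."""
    D = obs.shape[1]
    comp = []
    for mu, var, w in zip(means, variances, weights):
        sq = np.sum((obs - mu) ** 2, axis=1)
        comp.append(np.log(w) - 0.5 * sq / var - 0.5 * D * np.log(2 * np.pi * var))
    comp = np.stack(comp, axis=1)
    top = comp.max(axis=1, keepdims=True)          # log-sum-exp for numerical stability
    return float(np.sum(top[:, 0] + np.log(np.exp(comp - top).sum(axis=1))))

def conjugate_loglik(S, f_obs, g_obs, var_f, var_g, weights):
    """Joint log-likelihood: both unimodal mixtures share the object parameters S (K x 3)."""
    f_means = [visual_map(s) for s in S]
    g_means = [auditory_map(s) for s in S]
    return (mixture_loglik(f_obs, f_means, var_f, weights)
            + mixture_loglik(g_obs, g_means, var_g, weights))
```

Maximizing such a function over S is what couples the modalities: visual and auditory observations are never paired with each other, each is only compared with the image of the common object parameters in its own sensor space.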
Rigid and Articulated Point Registration with Expectation Conditional Maximization
Participant : Florence Forbes.
Joint work with: Radu Horaud from the INRIA team Perception and Manuel Iguel from team Emotion.
In image analysis and computer vision there is a long tradition of algorithms for finding an optimal alignment between two sets of points. This is referred to as the point registration (PR) problem, which is twofold: (i) find point-to-point correspondences and (ii) estimate the transformation that aligns the two sets. Existing PR methods can be roughly divided into three categories: the Iterative Closest Point (ICP) algorithm and its numerous extensions, soft-assignment methods, and probabilistic methods. Probabilistic point registration generally uses Gaussian mixture models (GMM). Indeed, one may reasonably assume that points from the first set (the data) are normally distributed around points belonging to the second set (the model). Therefore, the point-to-point assignment problem can be recast as that of estimating the parameters of a mixture. This can be done within the framework of maximum likelihood with missing data, because one has to estimate the mixture parameters as well as the point-to-cluster assignments, i.e., the missing data. In this case the algorithm of choice is the expectation-maximization (EM) algorithm. Formally, EM replaces the maximization of the observed-data log-likelihood with the maximization of the expected complete-data log-likelihood conditioned on the observations. As explained in detail in [41], there are intrinsic difficulties when one wants to cast the PR problem in the EM framework. The main topic and contribution of our work [41] is to propose an elegant and efficient way to do so. We introduce an innovative EM-like algorithm, namely the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm. The algorithm allows the use of general covariance matrices for the mixture model components and improves over the isotropic covariance case.
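A sketch of two ingredients named above, under simplifying assumptions: an E-step computing the posterior alpha[i, j] that data point y_i was generated by model point x_j moved by a rigid transformation (R, t), with one full covariance per component, and a conditional-maximization step that updates each covariance while R and t are held fixed. Equal mixing weights, the direction of the transformation (model points mapped onto the data) and the small ridge term are assumptions of this illustration, and the function names are hypothetical; the update of (R, t) itself, which is where [41] departs from the isotropic case, is omitted.

```python
import numpy as np

def gauss_logpdf(Y, mean, cov):
    """Log-density of each row of Y under N(mean, cov) with a full covariance matrix."""
    D = Y.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (Y - mean).T)          # whitened residuals, D x n
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (np.sum(z ** 2, axis=0) + logdet + D * np.log(2 * np.pi))

def e_step(Y, X, R, t, covs):
    """Posterior alpha[i, j] that data point y_i comes from the transformed model point R x_j + t."""
    n, m = Y.shape[0], X.shape[0]
    logp = np.empty((n, m))
    for j in range(m):
        # assumption: equal mixing weights 1/m for all model points
        logp[:, j] = gauss_logpdf(Y, R @ X[j] + t, covs[j]) - np.log(m)
    logp -= logp.max(axis=1, keepdims=True)       # stabilize before exponentiating
    alpha = np.exp(logp)
    return alpha / alpha.sum(axis=1, keepdims=True)

def cm_step_covariances(Y, X, R, t, alpha, reg=1e-6):
    """Conditional maximization of each full covariance, with R and t held fixed."""
    D = Y.shape[1]
    covs = []
    for j in range(X.shape[0]):
        r = Y - (R @ X[j] + t)                    # residuals to the transformed model point j
        w = alpha[:, j]
        C = (w[:, None] * r).T @ r / max(w.sum(), 1e-12)
        covs.append(C + reg * np.eye(D))          # small ridge keeps C positive definite
    return covs
```

With general covariances the rotation and translation no longer have the familiar closed-form least-squares solution, which is why they are held fixed here while the covariances are updated.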
We analyse in detail the associated consequences in terms of estimation of the registration parameters, and we propose an optimal method for estimating the rotational and translational parameters based on positive semidefinite relaxation. We also extend rigid registration to articulated registration. Robustness is ensured by detecting and rejecting outliers through the addition of a uniform component to the Gaussian mixture model at hand. We provide an in-depth analysis of our method and compare it both theoretically and experimentally with other robust methods for point registration.
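For the robustness mechanism, here is a simplified rigid registration loop with a uniform outlier component added to the Gaussian mixture. To keep it short it makes two substitutions that should be stated plainly: the covariances are isotropic (a single sigma2), and the rotation and translation are obtained with the classical closed-form weighted Procrustes (Kabsch) solution via an SVD, which is exact in the isotropic case, rather than with the semidefinite relaxation described above. The outlier weight, the bounding-box volume of the uniform component and the function names are assumptions of this sketch.

```python
import numpy as np

def weighted_procrustes(Y, X, alpha):
    """Closed-form (R, t) minimizing sum_ij alpha[i, j] * ||y_i - (R x_j + t)||^2 (weighted Kabsch)."""
    w_model = alpha.sum(axis=0)                  # weight carried by each model point x_j
    w_data = alpha.sum(axis=1)                   # inlier weight carried by each data point y_i
    W = w_model.sum()
    y_bar = w_data @ Y / W                       # weighted centroids
    x_bar = w_model @ X / W
    Yc, Xc = Y - y_bar, X - x_bar
    H = Xc.T @ alpha.T @ Yc                      # sum_ij alpha[i, j] * xc_j yc_i^T
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0] * (X.shape[1] - 1) + [d]) @ U.T
    t = y_bar - R @ x_bar
    return R, t

def register_rigid(Y, X, n_iter=100, outlier_weight=0.1, volume=None):
    """EM for rigid registration: isotropic GMM centred on R x_j + t, plus a uniform outlier class."""
    m, D = X.shape[0], Y.shape[1]
    if volume is None:
        # assumption: outliers are uniform over the bounding box of the data
        volume = np.prod(Y.max(axis=0) - Y.min(axis=0))
    R, t = np.eye(D), np.zeros(D)
    sigma2 = np.mean(np.sum((Y[:, None, :] - X[None, :, :]) ** 2, axis=2)) / D
    for _ in range(n_iter):
        # E-step: posteriors of the m Gaussian components; the remainder goes to the outlier class
        sq = np.sum((Y[:, None, :] - (X @ R.T + t)[None, :, :]) ** 2, axis=2)   # n x m
        inlier = (1 - outlier_weight) / m * np.exp(-0.5 * sq / sigma2) / (2 * np.pi * sigma2) ** (D / 2)
        alpha = inlier / (inlier.sum(axis=1, keepdims=True) + outlier_weight / volume)
        # M-step: rigid transformation first, then the common variance given that transformation
        R, t = weighted_procrustes(Y, X, alpha)
        sq = np.sum((Y[:, None, :] - (X @ R.T + t)[None, :, :]) ** 2, axis=2)
        sigma2 = max(np.sum(alpha * sq) / (D * alpha.sum()), 1e-10)
    outlier_post = 1.0 - alpha.sum(axis=1)       # posterior probability of being an outlier
    return R, t, sigma2, outlier_post
```

Data points whose posterior mass goes mostly to the uniform component (outlier_post close to 1) are the ones rejected as outliers; they contribute almost nothing to the estimation of R, t and sigma2.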