Section: New Results
Mixture models
Taking into account the curse of dimensionality.
Participant : Stéphane Girard.
Joint work with: Charles Bouveyron (Université Paris 1), Gilles Celeux (Select, INRIA) and Cordelia Schmid (Lear, INRIA).
In the PhD work of Charles Bouveyron (coadvised by Cordelia Schmid from the INRIA team LEAR) [28], we propose new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups, each located in a subspace of lower dimension. Two different strategies arise:

the introduction in the model of a dimension reduction constraint for each group,

the use of parsimonious models obtained by constraining different groups to share the values of some parameters.
This modelling yields a new supervised classification method called HDDA, for High Dimensional Discriminant Analysis. Some versions of this method have been tested on the supervised classification of objects in images. The approach has also been adapted to the unsupervised classification framework, and the related method is named HDDC, for High Dimensional Data Clustering. In collaboration with Gilles Celeux and Charles Bouveyron, we are currently working on the automatic selection of the discrete parameters of the model. In the context of Juliette Blanchet's PhD work (also coadvised by C. Schmid), we also combined the method with our Markov-model-based approach to learning and classification, and obtained significant improvements in applications such as texture recognition, where the observations are high-dimensional.
We also intend to relax the Gaussian assumption. To this end, nonlinear models and semiparametric methods are necessary.
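The idea of a Gaussian model with a per-group dimension reduction constraint can be illustrated with a minimal sketch. It assumes one common form of such a model: each class keeps the d leading eigen-directions of its covariance with their own variances, and a single shared variance is used for all remaining directions. The function names (fit_class, log_score, classify), the fixed intrinsic dimension d and the synthetic data are hypothetical choices made for illustration; this is not the reference HDDA implementation.

```python
import numpy as np

def fit_class(X, d):
    """Fit one class: mean, d leading eigen-directions, one variance for the rest."""
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))  # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]                # reorder to descending
    a = vals[:d]                                          # variances inside the class subspace
    b = max(float(vals[d:].mean()), 1e-8)                 # single variance outside the subspace
    return {"mu": mu, "Q": vecs[:, :d], "a": a, "b": b}

def log_score(x, model, prior):
    """Log prior plus Gaussian log-density under the constrained covariance."""
    p, d = x.shape[0], model["a"].shape[0]
    diff = x - model["mu"]
    proj = model["Q"].T @ diff                            # coordinates in the class subspace
    inside = np.sum(proj ** 2 / model["a"])
    outside = (diff @ diff - proj @ proj) / model["b"]    # residual outside the subspace
    logdet = np.sum(np.log(model["a"])) + (p - d) * np.log(model["b"])
    return np.log(prior) - 0.5 * (inside + outside + logdet + p * np.log(2 * np.pi))

def classify(x, models, priors):
    """Return the index of the class with the highest score."""
    return int(np.argmax([log_score(x, m, pr) for m, pr in zip(models, priors)]))

# Synthetic data (assumption): 20-dimensional points, each class varying
# mostly along its own two axes.
rng = np.random.default_rng(0)
p = 20

def sample(n, mean, axes):
    X = 0.1 * rng.standard_normal((n, p))
    X[:, axes] += 2.0 * rng.standard_normal((n, len(axes)))
    return X + mean

mean0 = np.zeros(p)
mean1 = np.zeros(p)
mean1[5] = 3.0
models = [fit_class(sample(100, mean0, [0, 1]), d=2),
          fit_class(sample(100, mean1, [2, 3]), d=2)]
priors = [0.5, 0.5]
print(classify(mean0, models, priors), classify(mean1, models, priors))
```

With this structure, each class needs only d subspace variances and one residual variance instead of a full covariance matrix, which is what makes the model usable when the dimension is large relative to the sample size.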
Multispeaker Localization with Binaural Audition and Stereo Vision using the EM Algorithm
Participants : Florence Forbes, Vasil Khalidov.
Joint work with: Elise Arnaud, Miles Hansard, Radu Horaud and Ramya Narasimha from the INRIA team Perception.
This work takes place in the context of the POP European project (see Section 8.3.1) and includes further collaborations with researchers from the University of Sheffield, UK. The context is that of multimodal sensory signal integration, and we focus on audiovisual integration. Fusing information from audio and video sources has resulted in improved performance in applications such as tracking. However, cross-modal integration is not trivial and requires some cognitive modelling because, at a lower level, there is no obvious way to associate depth and sound sources. Combining expertise from the Perception team and the University of Sheffield, we address the difficult problem of integrating spatial and temporal audiovisual stimuli using a geometrical and probabilistic framework, and we tackle the problem of associating sensory descriptions with representations of prior knowledge.
First, we address the problem of speaker localization within an unsupervised model-based clustering framework. Both auditory and visual observations are available. We gather observations over a time interval [t_{1}, t_{2}]. We assume that within this time interval the speakers are static, so that each speaker can be described by a 3D location in space. A cluster is associated with each speaker. In practice we consider N+1 possible clusters, corresponding to the N speakers plus an extra outlier category.
We then consider a set of M visual observations. Each such observation corresponds to a binocular disparity, namely a 3D vector (u_{m}, v_{m}, d_{m}), where u_{m} and v_{m} correspond to the 2D location in the Cyclopean image (the Cyclopean image is a geometric construction developed by M. Hansard and R. Horaud), and d_{m} denotes the measured disparity at this image location. Note that such a binocular disparity corresponds to the location of a physical object that is visible in both the left and right images of the stereo pair. We define a function that maps the 3D location of speaker n to the binocular disparity that a speaker at this location would produce.
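As a rough illustration of such a mapping from a 3D location to a binocular disparity, the sketch below assumes an idealized rectified pinhole stereo pair, with the Cyclopean viewpoint placed midway between the two cameras. The function name binocular_disparity, the focal length and the baseline are hypothetical; the actual Cyclopean construction used in this work may differ.

```python
def binocular_disparity(s, focal=800.0, baseline=0.25):
    """Map a 3D point s = (x, y, z) to (u, v, d).

    Assumptions (not from the source): rectified pinhole cameras, focal
    length in pixels, baseline in metres, coordinates expressed in a frame
    centred midway between the two cameras, with z pointing forward.
    """
    x, y, z = s
    if z <= 0:
        raise ValueError("the point must lie in front of the cameras")
    u = focal * x / z         # horizontal position in the Cyclopean image
    v = focal * y / z         # vertical position in the Cyclopean image
    d = focal * baseline / z  # disparity: left minus right horizontal position
    return (u, v, d)

print(binocular_disparity((0.5, -0.25, 2.0)))
```

Under these assumptions the disparity depends only on depth, so the triple (u, v, d) determines the 3D point uniquely, which is why a visual observation can be compared directly with a candidate speaker location.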
Similarly, let us consider a set of K auditory observations. Each such observation corresponds to an auditory disparity, namely the interaural time difference, or ITD. We define a function that evaluates the ITD of speaker n given his coordinates in 3D space.
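A simple way to picture such an ITD function is the free-field model sketched below: the ITD is the difference between the source-to-microphone distances divided by the speed of sound. The function name itd, the microphone positions, the sign convention and the value of the speed of sound are assumptions made for illustration, and head-related effects are ignored.

```python
import math

def itd(s, mic_left=(-0.09, 0.0, 0.0), mic_right=(0.09, 0.0, 0.0), c=343.0):
    """Interaural time difference (seconds) for a source at 3D position s.

    Assumptions (not from the source): free-field propagation, two point
    microphones 18 cm apart, speed of sound c in m/s. The result is positive
    when the sound reaches the right microphone first.
    """
    return (math.dist(s, mic_left) - math.dist(s, mic_right)) / c

print(itd((0.0, 0.0, 1.0)))  # source on the median plane
print(itd((1.0, 0.0, 1.0)))  # source to the right
```

Unlike the visual case, a single ITD value does not determine a 3D location: all points with the same distance difference give the same ITD, so auditory observations constrain a speaker position only in combination with other data.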
We then show that recovering the speaker locations can be seen as a parameter estimation problem in a missing data framework. The parameters to be estimated are the speaker locations, and the missing variables are the assignment variables associating each individual observation with one of the N speakers or with the outlier class. We are currently investigating the use of the EM algorithm to provide these parameter estimates.
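The missing data formulation can be sketched on a deliberately simplified problem: one-dimensional observations, N Gaussian components whose means play the role of the speaker parameters, and a uniform component for the outlier class. In the actual setting the means would be tied to 3D locations through the visual and auditory mappings, which this sketch does not attempt. The function names, the initial values and the synthetic data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_with_outliers(data, init_mus, lo, hi, n_iter=50):
    """EM for N Gaussian components plus one uniform outlier class on [lo, hi]."""
    n_comp = len(init_mus)
    mus = list(init_mus)
    variances = [1.0] * n_comp
    weights = [1.0 / (n_comp + 1)] * (n_comp + 1)  # last weight: outlier class
    outlier_density = 1.0 / (hi - lo)
    for _ in range(n_iter):
        # E-step: posterior probability of each assignment variable.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, mus[k], variances[k])
                     for k in range(n_comp)]
            joint.append(weights[-1] * outlier_density)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate means, variances and mixing weights.
        for k in range(n_comp):
            nk = sum(r[k] for r in resp)
            if nk > 0:
                mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
                var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
                variances[k] = max(var, 1e-6)
        weights = [sum(r[k] for r in resp) / len(data) for k in range(n_comp + 1)]
    return mus, variances, weights

# Synthetic data (assumption): two sources plus uniformly spread outliers.
random.seed(0)
data = ([random.gauss(-2.0, 0.3) for _ in range(100)]
        + [random.gauss(3.0, 0.3) for _ in range(100)]
        + [random.uniform(-10.0, 10.0) for _ in range(20)])
mus, variances, weights = em_with_outliers(data, [-1.0, 1.0], -10.0, 10.0)
print(mus, weights)
```

The uniform component absorbs observations that no speaker explains well, so they do not pull the estimated means away from the true sources.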