
## Section: New Results

### Statistical learning methodology and theory

Participants : Gilles Celeux, Serge Cohen, Christine Keribin, Michel Prenat, Kaniav Kamary, Sylvain Arlot, Benjamin Auder, Jean-Michel Poggi, Neska El Haouij, Kevin Bleakley, Matthieu Lerasle.

Gilles Celeux and Serge Cohen have started research in collaboration with Agnès Grimaud (UVSQ) on clustering hyperspectral images under spatial constraints. This is a one-class classification problem where distances between spectral images are given by the $\chi^2$ distance, while spatial homogeneity is associated with a single-link distance.
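As an illustration of the spectral distance mentioned above, here is a minimal sketch of one common form of the $\chi^2$ distance between two non-negative spectra. The function name `chi2_distance`, the factor 1/2 and the handling of empty bins are assumptions made for this example; the report does not specify which variant of the distance is used.

```python
def chi2_distance(p, q, eps=1e-12):
    """One common chi-squared distance between two non-negative vectors.

    Computes 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)), skipping bins where
    both entries are (numerically) zero. The normalisation is an assumption
    of this sketch; other conventions exist.
    """
    if len(p) != len(q):
        raise ValueError("spectra must have the same number of bands")
    total = 0.0
    for a, b in zip(p, q):
        s = a + b
        if s > eps:  # skip empty bins to avoid dividing by zero
            total += (a - b) ** 2 / s
    return 0.5 * total


# Hypothetical three-band spectra, used only to show the call.
d = chi2_distance([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
```

With this convention the distance is symmetric, is zero for identical spectra, and equals 1 for two normalised spectra with disjoint support.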

Gilles Celeux continued his collaboration with Jean-Patrick Baudry on model-based clustering. This year, they started work on assessing model-based clustering methods on cytometry data sets. These data sets are of interest because they require combining clustering and classification tasks in a unified framework.

Gilles Celeux and Julie Josse have started research on missing data for model-based clustering in collaboration with Christophe Biernacki (Modal, Inria Lille). This year, they proposed a model for mixture analysis with data that are not missing at random.

In the framework of MASSICCC, Benjamin Auder and Gilles Celeux have started research on the graphical representation of model-based clusters. The aim is to better display the proximity between clusters.

Long an open problem, the consistency and asymptotic normality of the maximum likelihood and variational estimators of the latent block model were finally established in joint work with V. Brault and M. Mariadassou.

J-M. Poggi, with R. Genuer, C. Tuleau-Malot and N. Villa-Vialaneix, has published an article on random forests for “big data” classification problems. The article reviews available proposals for random forests in parallel environments and for online random forests. Three variants, involving subsampling, Big Data-bootstrap and MapReduce respectively, were tested on two massive datasets, one simulated and one real-world.

With G. Lecué, Matthieu Lerasle worked on robust machine learning by median-of-means, providing an alternative to the Lugosi and Mendelson approach, which is also based on median-of-means. This alternative is easier to present and to analyse theoretically. Furthermore, they proposed an algorithm to approximate this estimator, which could not be done for Lugosi and Mendelson's champions of tournaments (submitted).
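For readers unfamiliar with the median-of-means principle, the sketch below shows its simplest instance, a robust estimate of a mean. It is not the learning procedure of Lecué and Lerasle, nor the tournament procedure of Lugosi and Mendelson; the function name `median_of_means`, the random partition into blocks and the `seed` parameter are assumptions made for this illustration.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Basic median-of-means estimate of the mean of `values`.

    The sample is shuffled, split into `n_blocks` disjoint blocks of
    nearly equal size, and the median of the block means is returned.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    rng = random.Random(seed)      # fixed seed for a reproducible partition
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Strided slicing gives n_blocks non-empty, disjoint blocks.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


# Hypothetical sample: 99 clean observations and one gross outlier.
sample = [1.0] * 99 + [1e6]
robust = median_of_means(sample, n_blocks=10)
naive = statistics.fmean(sample)
```

A single outlier can corrupt at most one block, so the median of the block means stays close to the bulk of the data, whereas the empirical mean is pulled far away from it.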