Project Team Mistis

## Section: Software

### The SpaCEM${}^{3}$ program

Participants: Lamiae Azizi, Senan James Doyle, Florence Forbes.

SpaCEM${}^{3}$ (Spatial Clustering with EM and Markov Models) is a software package that provides a wide range of supervised and unsupervised clustering algorithms. The main novelty of these algorithms is that the clustered objects need not be assumed independent and can be associated with very high-dimensional measurements. A typical example is image segmentation, where the objects are the pixels of a regular grid and each pixel depends on its neighbouring pixels. More generally, the software provides algorithms to cluster multimodal data with an underlying dependence structure that accounts for spatial localisation or for any kind of interaction that can be encoded in a graph.
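To make the idea of a dependence structure "encoded in a graph" concrete, the sketch below builds a neighbourhood graph for the pixels of a regular grid and shows that an irregular graph can use the same representation. The function name `grid_neighbours`, the adjacency-dictionary layout and the choice of a 4-neighbour system are illustrative assumptions; they are not the input format or API of SpaCEM${}^{3}$.

```python
def grid_neighbours(n_rows, n_cols):
    """Map each pixel index to the indices of its 4-neighbours (assumed layout).

    Pixels are numbered row by row: pixel (r, c) has index r * n_cols + c.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            idx = r * n_cols + c
            adjacent = []
            if r > 0:
                adjacent.append(idx - n_cols)  # pixel above
            if r < n_rows - 1:
                adjacent.append(idx + n_cols)  # pixel below
            if c > 0:
                adjacent.append(idx - 1)       # pixel to the left
            if c < n_cols - 1:
                adjacent.append(idx + 1)       # pixel to the right
            neighbours[idx] = adjacent
    return neighbours


# Regular case: a 3 x 3 image, where the centre pixel (index 4) has 4 neighbours.
grid = grid_neighbours(3, 3)

# Irregular case: any symmetric adjacency dictionary works the same way,
# for example four objects linked in an arbitrary (hypothetical) pattern.
irregular = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
```

Keeping the graph as a plain adjacency dictionary means that the same clustering code can, in principle, run unchanged on pixel grids and on irregular interaction networks.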

This software, developed by present and past members of the team, is the result of several research developments on the subject. The current version, 2.09, is released under the CeCILL-B licence.

Main features. The approach is based on the EM algorithm for clustering and on Markov Random Fields (MRF) to account for dependencies. In addition to standard clustering tools based on independent Gaussian mixture models, SpaCEM${}^{3}$ features include:

• The unsupervised clustering of dependent objects. Their dependencies are encoded via a graph, which need not be regular, and data sets are modelled via Markov random fields and mixture models (e.g. MRF and hidden MRF). Available Markov models include extensions of the Potts model, with the possibility of defining more general interaction models.

• The supervised clustering of dependent objects when standard hidden MRF (HMRF) assumptions do not hold (i.e. in the case of correlated or non-unimodal noise models). The learning and test steps are based on recently introduced Triplet Markov models.

• Model selection criteria (BIC, ICL and their mean-field approximations) that select the "best" HMRF according to the data.

• The possibility of producing simulated data from:

  ◦ general pairwise MRF with singleton and pair potentials (typically Potts models and extensions);

  ◦ standard HMRF, i.e. with an independent noise model;

  ◦ general Triplet Markov models with interactions up to order 2.

• A specific setting to account for high-dimensional observations.

• An integrated framework to deal with missing observations, under the Missing At Random (MAR) hypothesis, with prior imputation (KNN, mean, etc.), online imputation (as a step in the algorithm), or without imputation.
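As an illustration of the simulation feature listed above, the following minimal sketch draws labels from a Potts model with pair potentials only, then adds independent Gaussian noise to obtain data from a standard HMRF. It uses plain Gibbs sampling, which is one common way to simulate such models and is not necessarily the method implemented in SpaCEM${}^{3}$. All names (`grid_graph`, `potts_gibbs`, `simulate_hmrf`) and all parameter values (`beta`, `means`, `sigma`, the number of sweeps) are assumptions chosen for the example.

```python
import math
import random


def grid_graph(n_rows, n_cols):
    """Hypothetical helper: 4-neighbour adjacency dictionary for a pixel grid."""
    graph = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            if r > 0:
                adjacent.append((r - 1, c))
            if r < n_rows - 1:
                adjacent.append((r + 1, c))
            if c > 0:
                adjacent.append((r, c - 1))
            if c < n_cols - 1:
                adjacent.append((r, c + 1))
            graph[(r, c)] = adjacent
    return graph


def potts_gibbs(graph, n_classes, beta, n_sweeps=50, seed=0):
    """Approximate sample from a Potts model via Gibbs sampling.

    beta is the interaction strength: larger values favour neighbouring
    nodes sharing the same label. Singleton potentials are omitted here.
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    labels = {i: rng.randrange(n_classes) for i in nodes}
    for _ in range(n_sweeps):
        for i in nodes:
            # Conditional probability of label k is proportional to
            # exp(beta * number of neighbours currently labelled k).
            weights = []
            for k in range(n_classes):
                agree = sum(1 for j in graph[i] if labels[j] == k)
                weights.append(math.exp(beta * agree))
            labels[i] = rng.choices(range(n_classes), weights=weights)[0]
    return labels


def simulate_hmrf(labels, means, sigma, seed=0):
    """Add independent Gaussian noise: y_i ~ N(means[z_i], sigma^2)."""
    rng = random.Random(seed)
    return {i: rng.gauss(means[k], sigma) for i, k in labels.items()}


graph = grid_graph(10, 10)
hidden = potts_gibbs(graph, n_classes=3, beta=0.8)
observed = simulate_hmrf(hidden, means=[0.0, 2.0, 4.0], sigma=0.5)
```

Here `hidden` plays the role of the unobserved segmentation and `observed` the noisy measurements that a clustering algorithm would receive. Because the noise is drawn independently at each node given its label, this corresponds to the standard HMRF case rather than to the more general Triplet Markov models.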

The software is available at http://spacem3.gforge.inria.fr. A user manual in English is available on the same web site, together with example data sets. More recently, the INRA Toulouse unit has joined the project to promote the software within the bioinformatics community [20].