Section: New Results
Supervised classification of complex structure data using mixture models and triplet Markov fields.
In this work, we focus on three sources of complexity. First, we consider data exhibiting complex dependence structures, arising for example from spatial or temporal association, family relationships, and so on. More specifically, we consider observations associated with sites or items at spatial locations. These locations can be irregularly spaced, which goes beyond the standard regular lattice case traditionally used in image analysis and requires some adaptation.
A second source of complexity is connected with the measurement process, such as having multiple measuring instruments or computations generating high-dimensional data. Few 1-dimensional distributions for continuous variables generalize to multidimensional ones, except through products of independent 1-dimensional components. The Gaussian distribution is the most commonly used, but it is unimodal. This leads to what we consider a third source of complexity: in real-world applications, data cannot usually be reduced to classes modeled by unimodal distributions, and consequently by single Gaussian distributions.
In this work, we consider supervised classification problems in which training sets are available, that is, data whose exemplars have already been grouped into classes.
We propose a unified Markovian framework for both learning the class models and then classifying observed data into these classes. We show that models able to deal with the above sources of complexity can be derived from traditional tools such as mixture models and hidden Markov fields. For the latter, however, non-trivial extensions in the spirit of  are required to include a learning step while preserving the Markovian modelling of the dependencies. Applications of our models include textured image segmentation. See an illustration in Figure 3.
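As a minimal sketch of the non-spatial part of this idea, the following Python code assigns each observation to the class whose Gaussian mixture density, weighted by the class prior, is highest. It assumes that the mixture parameters have already been learned from the training set, and it ignores the Markovian dependence between sites. The function names and the (weights, means, covariances) layout are illustrative assumptions, not the models developed in this work.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at each row of x (shape (n, d))."""
    d = x.shape[-1]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T          # cov^{-1} (x - mean)
    maha = np.sum(diff * solved, axis=-1)            # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def mixture_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture, computed with a stable log-sum-exp."""
    comps = np.stack(
        [np.log(w) + gaussian_logpdf(x, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=0,
    )
    top = comps.max(axis=0)
    return top + np.log(np.exp(comps - top).sum(axis=0))

def classify(x, class_models, priors):
    """Bayes rule: pick the class maximizing log prior + log mixture density.

    class_models[c] is a hypothetical (weights, means, covs) triple for class c.
    """
    scores = np.stack(
        [np.log(p) + mixture_logpdf(x, *model)
         for model, p in zip(class_models, priors)],
        axis=1,
    )
    return scores.argmax(axis=1)
```

With one class modeled by two well-separated components and another by a single Gaussian lying between them, points near either outer mode go to the first class and points near the centre go to the second, which a single Gaussian per class could not reproduce.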
Integrated Markov models for clustering genes
Clustering genes into groups sharing common characteristics is a useful exploratory technique for a number of subsequent computational and biological analyses. A wide range of clustering algorithms has been proposed, in particular to analyze gene expression data, but most of them consider genes as independent entities or include relevant information on gene interactions in a sub-optimal way.
We propose a probabilistic model that accounts simultaneously for individual data (e.g. expression) and pairwise data (e.g. interaction information coming from biological networks). Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distances or similarity measures between genes, are then included through a graph where the nodes represent the genes and the edges are weighted according to the available interaction information. Being probabilistic, this model has many interesting theoretical features. Preliminary experiments on simulated and real data also show promising results and point to the gain from using such an approach.
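To illustrate how individual and pairwise data can be combined, here is a minimal sketch of a mean-field style update for a Potts-type hidden Markov random field on a weighted gene graph. It is an assumption-laden illustration, not the estimation procedure of this work: the per-gene log-likelihoods `loglik` are taken as given, `edges` is a hypothetical dictionary of edge weights, and `beta` is an assumed interaction strength.

```python
import math

def softmax(scores):
    """Normalize log-scores into probabilities in a numerically stable way."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field_potts(loglik, edges, beta=1.0, n_iter=50):
    """Approximate class probabilities for each gene.

    loglik[i][k]: log-likelihood of gene i's individual data under class k.
    edges: {(i, j): weight} for undirected, weighted gene interactions.
    """
    n, n_classes = len(loglik), len(loglik[0])
    neighbours = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    # Start from the individual data alone (no interaction term).
    q = [softmax(row) for row in loglik]
    for _ in range(n_iter):
        for i in range(n):  # sequential, in-place updates
            # Neighbours vote for each class in proportion to edge weight.
            field = [beta * sum(w * q[j][k] for j, w in neighbours[i])
                     for k in range(n_classes)]
            q[i] = softmax([loglik[i][k] + field[k] for k in range(n_classes)])
    return q
```

In this sketch, a gene whose expression data are ambiguous is pulled toward the class of a strongly connected neighbour, whereas with an empty edge set the result reduces to independent clustering.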
Distributed and Cooperative Markovian segmentation of both tissues and structures in brain MRI.
Participant: Florence Forbes.
This is joint work with Benoit Scherrer, Michel Dojat and Christine Garbay from INSERM and LIG.
Accurate tissue and structure segmentation of brain MRI scans is critical for several applications. Markov random fields are commonly used for tissue segmentation to take into account spatial dependencies between voxels, thereby regularizing the labelling. However, this requires estimating the model parameters (e.g. those of a Potts model), which is not tractable without approximations. The algorithms in  based on EM and variational approximations are considered. They show interesting results for tissue segmentation but are not sufficient for structure segmentation without introducing a priori anatomical knowledge. In most approaches, structure segmentation is performed after tissue segmentation. We suggest considering them as combined processes that cooperate. Brain anatomy is described by fuzzy spatial relations between structures that express general relative distances, orientations or symmetries. This knowledge is incorporated into a 2-class Markov model via an external field, and this model is used for structure segmentation. The resulting structure information is then incorporated in turn into a 3- to 5-class Markov model for tissue segmentation via another specific external field. Tissue and structure segmentations thus appear as dynamical and cooperative MRF procedures whose performance increases gradually. This approach is implemented in a multi-agent framework, where autonomous entities, distributed over the image, estimate local Markov fields and cooperate to ensure consistency. We show, using phantoms and real images (acquired on a 3T scanner), that distributed and cooperative Markov modelling using anatomical knowledge is a powerful approach for brain MRI segmentation (see Figure 4).
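For reference, a Potts model with an external field can be written in the following generic form. The notation is an assumption introduced here for illustration and is not taken from the cited work: z_i is the label of voxel i, i ~ j ranges over pairs of neighbouring voxels, beta is the interaction strength, and v_i is the external field (carrying the anatomical knowledge for the 2-class structure model, or the structure information for the tissue model).

```latex
% Assumed notation: z_i = label of voxel i, i \sim j = neighbouring voxels,
% \beta \ge 0 = interaction strength, v_i(k) = external field for class k at voxel i.
p(z) \;\propto\; \exp\Big( \sum_{i} v_i(z_i)
      \;+\; \beta \sum_{i \sim j} \mathbf{1}[z_i = z_j] \Big),
\qquad z_i \in \{1, \dots, K\}.
```

The normalizing constant sums over all K^N label configurations of the N voxels, which is why parameter estimation relies on approximations.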
The current investigation concerns only one type (T1) of MR images with no temporal information. We are planning to extend our tools to include multidimensional MR sequences corresponding to other types of MR modalities and longitudinal data.
Modelling and inference of population structure from genetic and spatial data
This is joint work with Olivier François from team TimB in TIMC laboratory.
In applications of population genetics, it is often useful to classify individuals in a sample into populations, which then become the units of interest. However, the definition of populations is typically subjective, based, for example, on linguistic, cultural, or physical characters as well as the geographic location of sampled individuals. Recently, Pritchard et al. proposed a Bayesian approach to classify individuals into groups using genotype data. Such data, also called multilocus genotype data, consist of several genetic markers whose variations are measured at a series of loci for each sampled individual. Their method is based on a parametric model (model-based clustering) in which there are K groups (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Group allele frequencies are unknown and modeled by a Dirichlet distribution at each locus within each group. An MCMC algorithm is then used to estimate simultaneously assignment probabilities and allele frequencies for all groups. In such a model, individuals are assumed to be independent, which does not take into account their possible spatial proximity.
The main goal of this work is to introduce spatial prior models and to assess their role in accounting for the relationships between individuals. In this perspective, we propose to investigate particular Markov models on graphs and to evaluate the quality of mean field approximations for the estimation of their parameters.
Maximum likelihood estimation of such models in a spatial context is typically intractable, but mean-field-like approximations within an EM algorithm framework, in the spirit of  , will be considered to deal with this problem. This should result in an alternative procedure to MCMC approaches. With this in mind, we first considered the EM approach in a non-spatial case, as an alternative to the traditional Bayesian approaches. The corresponding new computer program (see Section 5.4) and promising results are reported in  .
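The non-spatial EM idea can be sketched as follows, under strong simplifying assumptions that are ours and not those of the program mentioned above: haploid individuals (one allele per locus), no admixture, loci independent within a group, and a small pseudo-count `pseudo` to keep frequencies away from zero. All function and parameter names are hypothetical.

```python
import numpy as np

def em_genotype_clustering(genotypes, n_alleles, n_groups,
                           n_iter=100, seed=0, pseudo=1e-3):
    """EM for a mixture of per-locus allele-frequency profiles.

    genotypes: integer array (n_individuals, n_loci), entries in [0, n_alleles).
    Returns (resp, pi, freq): assignment probabilities, group proportions,
    and allele frequencies of shape (n_groups, n_loci, n_alleles).
    """
    X = np.asarray(genotypes)
    n = X.shape[0]
    onehot = np.eye(n_alleles)[X]                     # (n, n_loci, n_alleles)
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_groups), size=n)   # random soft start

    for _ in range(n_iter):
        # M-step: group proportions and allele frequencies per group and locus.
        pi = resp.mean(axis=0)
        counts = np.einsum("ik,ila->kla", resp, onehot) + pseudo
        freq = counts / counts.sum(axis=2, keepdims=True)

        # E-step: posterior assignment probabilities, computed in log space.
        loglik = np.einsum("ila,kla->ik", onehot, np.log(freq))
        loglik += np.log(pi + 1e-12)
        loglik -= loglik.max(axis=1, keepdims=True)
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, pi, freq
```

Group labels are only identified up to permutation, so results are best compared through co-membership rather than label values. A spatial version would replace the constant proportions `pi` by a prior depending on neighbouring individuals.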
Statistical methods for the visualization and analysis of complex remote sensing data
This is joint work with Sylvain Douté and Etienne Deforas from Laboratoire de Planétologie de Grenoble, France.
Visible and near-infrared imaging spectroscopy is one of the key techniques to detect, map and characterize mineral and volatile (e.g. water ice) species existing at the surface of the planets. Indeed, the chemical composition, granularity, texture, physical state, etc. of the materials determine the existence and morphology of the absorption bands. The resulting spectra therefore contain very useful information. Current imaging spectrometers provide data organized as three-dimensional hyperspectral images: two spatial dimensions and one spectral dimension.
A new generation of imaging spectrometers is emerging with an additional angular dimension. The surface of the planets will now be observed from different viewpoints on the satellite trajectory, corresponding to about ten different angles, instead of only one, usually corresponding to the vertical (0 degree angle) viewpoint. Multi-angle imaging spectrometers present several advantages: the influence of the atmosphere on the signal can be better identified and separated from the surface signal of interest, and the shape and size of the surface components and the surface granularity can be better characterized.
However, this new generation of spectrometers also results in a significant increase in the size (several terabits expected) and complexity of the generated data. Consequently, HMA (Hyperspectral Multi-Angular) data raise manipulation and visualization problems due to their size and their four dimensions.
We propose to investigate the use of statistical techniques to deal with these generic sources of complexity in data beyond the traditional tools in mainstream statistical packages. Our goal is twofold:
We first focus on developing or adapting dimension reduction, classification and segmentation methods to provide informative, useful visualization and representation of the data prior to its subsequent analysis.
We also address the problem of physical model inversion which is important to understand the complex underlying physics of the HMA signal formation. The models taking into account the angular dimension result in more complex treatments. We investigate the use of semiparametric dimension reduction methods such as SIR (Sliced Inverse Regression,  ) to perform model inversion, in a reasonable computing time, when the number of input observations increases considerably.
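As a rough illustration of the dimension reduction step, the sketch below implements the basic Sliced Inverse Regression estimate: whiten the predictors, slice the observations by the sorted response, and take the leading eigenvectors of the weighted covariance of the slice means. It assumes a full-rank sample covariance and a scalar response; reading the rows of `X` as spectra and `y` as a physical parameter is our assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Estimate effective dimension reduction directions by basic SIR.

    X: array (n, p) of predictors; y: array (n,) of scalar responses.
    Returns an array (p, n_dirs) whose columns have unit norm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Whiten the predictors: Z = (X - mean) @ cov^{-1/2}.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt

    # Slice observations by sorted response, in slices of near-equal size.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)   # weighted covariance of slice means

    # Leading eigenvectors of M, mapped back to the original scale.
    _, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_dirs]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

Each direction is defined only up to sign. Once the predictors are projected onto a few such directions, the inverse relation from observations to parameter can be fitted in low dimension, which is where the gain in computing time would be expected.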
The first data set under consideration (hyperspectral images with vertical pointing) comes from the Mars Express mission operated by the European Space Agency. The second data set (multi-angular hyperspectral images) will be generated by the CRISM instrument of the Mars Reconnaissance Orbiter (NASA), which started its scientific activities in June 2006 after orbit insertion. LPG is a co-investigator of the CRISM instrument.