The mistis team aims at developing statistical methods for dealing with complex problems or data. Our applications relate mainly to image processing and spatial data problems, with some applications in environment, biology and medicine. Our approach rests on the premise that complexity can be handled by working up from simple local assumptions in a coherent way to define a structured model; this structure is the key to modelling, computation, inference and interpretation. The methods we focus on involve mixture models, Markov models and, more generally, hidden structure models identified by deterministic or stochastic algorithms on the one hand, and semi- and non-parametric methods on the other hand.

Hidden structure models are useful for taking heterogeneity in data into account. They arise in many areas of statistical methodology (finite mixture analysis, hidden Markov models, random effect models, etc.). Due to their missing data structure, they induce specific difficulties both for estimating the model parameters and for assessing performance. The team focuses on both aspects: we design specific algorithms for estimating the parameters of hidden structure models, and we propose and study specific criteria for choosing the most relevant such models in several contexts.

Semi- and non-parametric methods are relevant and useful when no appropriate parametric model exists for the data under study, either because of data complexity or because information is missing. The focus is on functions describing curves, surfaces or, more generally, manifolds, rather than on real-valued parameters. This is of interest in image processing, for instance, where it can be difficult to introduce parametric models that are general enough (e.g. for contours).

The European project HUMAVIPS – Humanoids with Auditory and Visual Abilities in Populated Spaces – is a 36-month FP7 STREP project coordinated by Radu Horaud, which started in 2010. The project addressed multimodal perception and cognitive issues associated with the computational development of a social robot. The objective was to endow humanoid robots with audiovisual (AV) abilities – exploration, recognition and interaction – such that they exhibit adequate behavior when dealing with a group of people. Research and technological developments emphasized the role played by multimodal perception within principled models of human-robot interaction and of humanoid behavior. The HUMAVIPS project was successfully completed in January 2013.

An article about *Integrating Smart Robots into Society* refers to HUMAVIPS. The article stresses the role of cognition in human-robot interaction and cites HUMAVIPS as one of the FP7 projects that have paved the way towards the concept of audio-visual robotics.
The article was published in HORIZON, which is Europe's Research & Innovation Magazine.

The paper addresses the problem of aligning visual and auditory data using a sensor composed of a camera pair and a microphone pair. The original contribution of the paper is a method for audio-visual data alignment through estimation of the 3D positions of the microphones in the visual-centred coordinate frame defined by the stereo camera pair. Please consult http://

**Key-words:**
mixture of distributions, EM algorithm, missing data, conditional independence,
statistical pattern recognition, clustering,
unsupervised and partially supervised learning.

In a first approach, we consider statistical parametric models with hidden variables.

These models are interesting in that they may point out a hidden
variable responsible for most of the observed variability, so
that the observed variables are *conditionally* independent given it.
Their estimation is often difficult due to the missing data. The
Expectation-Maximization (EM) algorithm is a general and now
standard approach to maximization of the likelihood in missing
data problems. It provides parameter estimates as well as values
for the missing data.
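The EM mechanics can be sketched on a toy two-component Gaussian mixture in one dimension (an illustrative implementation with made-up data, not the team's software; the min/max initialization is our own crude choice):

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    The component labels are the missing data: the E-step computes their
    posterior probabilities, the M-step performs weighted ML updates.
    """
    mu = np.array([x.min(), x.max()], dtype=float)  # crude but safe init
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i; mu_k, sigma_k^2)
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
pi, mu, sigma = em_gmm_1d(x)
```

On this well-separated sample the recovered means approach 0 and 5, and the responsibilities `r` provide the "values for the missing data" mentioned above.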

Mixture models correspond to independent observations.

**Key-words:**
graphical models, Markov properties, hidden Markov models, clustering, missing data, mixture of distributions, EM algorithm, image analysis, Bayesian
inference.

Graphical modelling provides a diagrammatic representation of the dependency structure of a joint probability distribution, in the form of a network or graph depicting the local relations among variables. The graph can have directed or undirected links or edges between the nodes, which represent the individual variables. Associated with the graph are various Markov properties that specify how the graph encodes conditional independence assumptions.

It is the conditional independence assumptions that give graphical models their fundamental modular structure, enabling the computation of globally interesting quantities from local specifications. In this way, graphical models form an essential basis for our structure-based methodologies.
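This modular computation can be illustrated on a toy three-node chain with binary variables (all probability tables below are made-up values):

```python
import numpy as np

# A three-node chain X1 -> X2 -> X3 over binary variables.
# The graph encodes X3 independent of X1 given X2, so the joint factorizes as
# p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x2).
p1 = np.array([0.6, 0.4])                    # p(x1)
p2_1 = np.array([[0.7, 0.3], [0.2, 0.8]])    # p2_1[a, b] = p(x2=b | x1=a)
p3_2 = np.array([[0.9, 0.1], [0.5, 0.5]])    # p3_2[b, c] = p(x3=c | x2=b)

# A "globally interesting quantity" (the marginal p(x3)) obtained by
# propagating the local specifications along the chain:
p2 = p1 @ p2_1
p3 = p2 @ p3_2

# Brute-force check against the full joint table
joint = p1[:, None, None] * p2_1[:, :, None] * p3_2[None, :, :]
assert np.allclose(joint.sum(), 1.0)
assert np.allclose(joint.sum(axis=(0, 1)), p3)
```

The local propagation costs are linear in the chain length, whereas enumerating the joint table grows exponentially with the number of variables; this is the computational payoff of the Markov properties.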

The graphs can be either
directed, e.g. Bayesian Networks, or undirected, e.g. Markov Random Fields.
The specificity of Markovian models is that the dependencies
between the nodes are limited to the nearest neighboring nodes. The
neighborhood definition can vary and be adapted to the problem of
interest. When some of the variables (nodes) are unobserved or missing, we
refer to these models as Hidden Markov Models (HMM).
Hidden Markov chains and hidden Markov fields correspond to cases where the hidden variables are distributed according to a Markov chain or a Markov field, respectively.

Hidden Markov models are very useful for modelling spatial dependencies, but these dependencies and the possible existence of hidden variables are also responsible for a typically large amount of computation, so that the statistical analysis may not be straightforward. Typical issues relate to the neighborhood structure to be chosen when it is not dictated by the context, and to the possibly high dimensionality of the observations. This also requires a good understanding of the role of each parameter and of methods to tune them depending on the goal in mind. Parameter estimation in these models corresponds to an energy minimization problem which is NP-hard and is usually performed through approximations. We focus on methods based on variational approximations and propose effective algorithms which show good performance in practice and whose theoretical properties we also study. We also propose tools for model selection. Finally, we investigate ways to extend the standard Hidden Markov Field model to increase its modelling power.
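For hidden Markov *chains*, unlike fields, the likelihood can be computed exactly; a minimal sketch of the standard forward recursion with scaling (a textbook illustration with made-up parameters, not the team's code):

```python
import numpy as np

def forward_loglik(obs, init, trans, emis):
    """Log-likelihood of a hidden Markov chain via the forward recursion.

    obs   : observed symbols (list of ints)
    init  : initial hidden-state distribution, shape (K,)
    trans : trans[i, j] = P(z_t = j | z_{t-1} = i)
    emis  : emis[k, v] = P(x_t = v | z_t = k)
    """
    alpha = init * emis[:, obs[0]]          # unnormalized filter at t = 1
    loglik = 0.0
    for x in obs[1:]:
        c = alpha.sum()                     # scaling avoids numerical underflow
        loglik += np.log(c)
        alpha = (alpha / c) @ trans * emis[:, x]
    return loglik + np.log(alpha.sum())

init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
emis = np.array([[0.8, 0.2], [0.2, 0.8]])
ll = forward_loglik([0, 0, 1], init, trans, emis)
```

The recursion sums over hidden paths in linear time; for Markov *fields* no such exact recursion exists in general, which is precisely why the variational approximations mentioned above are needed.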

**Key-words:** dimension reduction, extreme value analysis, functional estimation.

We also consider methods which do not assume a parametric model.
These approaches are non-parametric in the sense that they do not
require the assumption of a prior model on the unknown quantities.
This property is important since, for image applications for
instance, it is very difficult to introduce sufficiently general
parametric models because of the wide variety of image contents.
Projection methods are then a way to decompose the unknown
quantity on a set of functions (*e.g.* wavelets). Kernel
methods, which rely on smoothing the data using a set of kernels
(usually probability distributions), are other examples.
Relationships exist between these methods and learning techniques
using Support Vector Machines (SVM), as appears in the context
of *level-set estimation* (see section ). Such
non-parametric methods have become the cornerstone when dealing
with functional data . This is the case, for
instance, when observations are curves. They enable us to model the
data without a discretization step. More generally, these
techniques are of great use for *dimension reduction* purposes
(section ). They enable reduction of the dimension of
functional or multivariate data without assumptions on the
distribution of the observations. Semi-parametric methods refer to
methods that include both parametric and non-parametric aspects.
Examples include the Sliced Inverse Regression (SIR) method,
which combines non-parametric regression techniques
with parametric dimension reduction aspects. This is also the case
in *extreme value analysis* , which is based
on the modelling of distribution tails (see section ).
It differs from traditional statistics, which focuses on the central
part of distributions, *i.e.* on the most probable events.
Extreme value theory shows that distribution tails can be
modelled by both a functional part and a real-valued parameter, the
extreme value index.
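The kernel smoothing idea mentioned above can be sketched with a Gaussian kernel density estimate (a textbook illustration, not a method specific to the team; data and bandwidth are made up):

```python
import numpy as np

def kde(x_eval, data, bandwidth):
    """Gaussian kernel density estimate: an average of kernels centred
    at the observations, with no parametric model for the density."""
    u = (x_eval[:, None] - data[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return k.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 2000)
grid = np.linspace(-4.0, 4.0, 81)
dens = kde(grid, data, bandwidth=0.3)
```

The estimate integrates to one and adapts to the data without any distributional assumption; only the bandwidth, which governs the amount of smoothing, has to be chosen.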

Extreme value theory is a branch of statistics dealing with the extreme
deviations from the bulk of probability distributions.
More specifically, it focuses on the limiting distributions for the
minimum or the maximum of a large collection of random observations
from the same arbitrary distribution.
Estimating such extreme quantiles therefore requires dedicated methods to extrapolate information beyond the largest observed values of the sample. In this extrapolation, both the extreme-value index and the functional tail component of the distribution have to be estimated.
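As an illustration of tail-index estimation, the classical Hill estimator of a positive extreme-value index can be sketched as follows (a standard textbook estimator, not the team's own; the Pareto test case and the choice of k are our assumptions):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of a positive extreme-value index: the mean of
    log-spacings between the k largest observations and the (k+1)-th
    largest order statistic."""
    xs = np.sort(x)[::-1]                  # descending order statistics
    return np.mean(np.log(xs[:k]) - np.log(xs[k]))

# a standard Pareto(alpha) sample has extreme-value index 1 / alpha
rng = np.random.default_rng(0)
alpha = 2.0
x = rng.pareto(alpha, 100_000) + 1.0       # shift to support [1, infinity)
gamma_hat = hill_estimator(x, k=2000)
```

Only the k largest order statistics enter the estimate, reflecting the fact that tail inference deliberately discards the central part of the sample; the choice of k balances bias against variance.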

More generally, the problems that we address are part of risk management theory. For instance, in reliability, the distributions of interest belong to a semi-parametric family whose tails decrease exponentially fast. These so-called Weibull-tail distributions are defined by their survival distribution function:
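In one common parameterization (our notation here, which may differ from the original display), the survival function of a Weibull-tail distribution can be written as

```latex
\bar F(x) = \exp\{-H(x)\}, \qquad H^{-1}(t) = t^{\theta}\,\ell(t),
```

where \( \theta > 0 \) is the Weibull-tail coefficient and \( \ell \) is a slowly varying function.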

Gaussian, gamma, exponential and Weibull distributions, among others,
are included in this family. An important part of our work consists
in establishing links between models () and ()
in order to propose new estimation methods.
We also consider the case where the observations are recorded together with covariate information. In this case, the extreme-value index and the extreme quantiles become functions of the covariate.

Level set estimation is a
recurrent problem in statistics which is linked to outlier
detection. In biology, one is interested in estimating reference
curves, that is to say curves which bound a given proportion (e.g. 90%) of the population.

Our work on high-dimensional data requires that we face the curse of dimensionality. Indeed, modelling high-dimensional data requires complex models and thus the estimation of a large number of parameters compared to the sample size. In this framework, dimension reduction methods aim at replacing the original variables by a small number of linear combinations of them, with as small a loss of information as possible. Principal Component Analysis (PCA) is the most widely used method to reduce dimension in data. However, standard linear PCA can be quite inefficient on image data, where even simple image distortions can lead to highly non-linear data. Two directions are investigated. First, non-linear PCAs can be proposed, leading to semi-parametric dimension reduction methods . Another field of investigation is to take the application goal into account in the dimension reduction step. One of our approaches is therefore to develop new Gaussian models of high-dimensional data for parametric inference . Such models can then be used in a mixture or Markov framework for classification purposes. Another approach consists in combining dimension reduction, regularization techniques and regression techniques to improve the Sliced Inverse Regression method .
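The linear PCA baseline that the non-linear extensions improve upon can be sketched in a few lines via the SVD of the centred data (an illustrative sketch with synthetic data, not the team's models):

```python
import numpy as np

def pca(X, d):
    """Project X (n samples x p features) onto its d leading principal
    components, obtained from the SVD of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T, Vt[:d]

# toy data lying close to a 2-D linear subspace of a 10-D space
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(500, 10))
scores, components = pca(X, d=2)
```

On such near-linear data, two components reconstruct the centred data almost perfectly; it is precisely when the data manifold is curved (as with image distortions) that this linear projection loses information.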

As regards applications, several areas of image analysis can be covered using the tools developed in the team. More specifically, in collaboration with team Perception, we address various issues in computer vision involving Bayesian modelling and probabilistic clustering techniques. Other applications in medical imaging are natural. We work more specifically on MRI data, in collaboration with the Grenoble Institute of Neuroscience (GIN) and the NeuroSpin center of CEA Saclay. We also consider other statistical 2D fields coming from other domains such as remote sensing, in collaboration with the Laboratoire de Planétologie de Grenoble. In the context of the ANR MDCO project Vahine, we worked on hyperspectral multi-angle images. In the context of the "pôle de compétitivité" project I-VP, we worked on images of printed circuit boards.

A second domain of applications concerns biology and medicine. We consider the use of missing data models in epidemiology. We also investigated statistical tools for the analysis of bacterial genomes beyond gene detection. Applications in population genetics and neurosciences are also considered. Finally, in the context of the ANR VMC project Medup, we studied uncertainties in forecasting and climate projections for Mediterranean high-impact weather events.

**Joint work with:** Michel Dojat from Grenoble Institute of Neuroscience and Benoit Scherrer from Harvard Medical School, Boston, MA, USA.

From brain MR images, neuroradiologists are able to delineate
tissues such as grey matter, structures such as the thalamus, and
damaged regions. This delineation is a routine task for an expert,
but unsupervised segmentation is difficult due to a number of
artefacts. The LOCUS software (http://

The LOCUS software has been developed in the context of a collaboration between Mistis, a computer science team (Magma, LIG) and a neuroscience methodology team (the Neuroimaging team of the Grenoble Institute of Neuroscience, INSERM). Over the period 2006-2008, this collaboration resulted in the PhD thesis of B. Scherrer (advised by C. Garbay and M. Dojat) and in a number of publications. In particular, B. Scherrer received a "Young Investigator Award" at the 2008 MICCAI conference.

The originality of this work comes from the successful combination of the teams' respective strengths, i.e. expertise in distributed computing, in neuroimaging data processing and in statistical methods.

**Joint work with:** Michel Dojat.

The LOCUS software was extended to address the delineation of lesions in pathological brains. Its extension
P-LOCUS (http://

- it is fully automatic: no external user interaction and no training data are required;

- it can combine information from several images (MR sequences);

- a statistical Bayesian framework provides robustness to image artefacts and allows a priori knowledge to be incorporated;

- a voxel-based clustering technique using Markov random fields (MRF) incorporates information about neighboring voxels, for spatial consistency and robustness to imperfect image features (noise);

- relevant a priori knowledge can be selected and incorporated via different atlases, e.g. tissue and vascular territory atlases;

- preprocessing steps and lesion ROI identification are fully integrated.

The P-LOCUS software was presented at various conferences and used for the BRATS challenge on tumor segmentation, organized as a satellite challenge of the MICCAI conference in Nagoya, Japan. A paper submitted to IEEE Transactions on Medical Imaging reports the challenge results .

**Joint work with:** Philippe Ciuciu and Solveig Badillo from Parietal Team Inria and CEA NeuroSpin, Lotfi Chaari and Laurent Risser from Toulouse University.

As part of fMRI data analysis, the PyHRF package (http://

**Joint work with:**
Emma Holian (National University of Ireland, Galway)

In studies where subjects contribute more than one observation, such as longitudinal studies, linear mixed models have become one of the most widely used techniques to account for the correlation between these observations. By introducing random effects, mixed models allow the within-subject correlation and the variability of the response among the different subjects to be taken into account. However, such models are based on a normality assumption for the random effects and reflect a prior belief of homogeneity among all the subjects. To relax this strong assumption, Verbeke and Lesaffre (1996) proposed an extension of the classical linear mixed model in which the random effects are sampled from a finite mixture of normal distributions with common covariance matrix. This extension naturally arises from the prior belief of the presence of unobserved heterogeneity in the random effects population; the model is therefore called the heterogeneity linear mixed model. Note that this model does more than extend the assumption about the random effects distribution: each component of the mixture can be considered as a cluster containing a proportion of the total population, so the model is also suitable for classification purposes.

Concerning parameter estimation in the heterogeneity model, the use of the EM algorithm, which takes the incomplete structure of the data into account, has been considered in the literature. Unfortunately, the M-step of the estimation process is not available in analytic form, and a numerical maximisation procedure such as Newton-Raphson is needed. Because deriving such a procedure is a non-trivial task, Komarek et al. (2002) proposed an approximate optimization, but this procedure proved to be very slow and limited to small samples, as it requires the manipulation of very large matrices and prohibitive computation.

To overcome this problem, we have proposed in , an alternative approach which consists of directly fitting an equivalent mixture of linear mixed models. Contrary to the heterogeneity model, the M-step of the EM algorithm is analytically tractable in this case. Then, from the obtained parameter estimates, we can easily recover the parameter estimates of the heterogeneity model.

**Joint work with:** C. Bouveyron (Univ. Paris 1), M. Fauvel (ENSAT Toulouse)
and J. Chanussot (Gipsa-lab and Grenoble-INP)

In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the Inria LEAR team) , we proposed new Gaussian models of high-dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimension. Two different strategies arise:

- the introduction in the model of a dimension reduction constraint for each group;

- the use of parsimonious models obtained by imposing that different groups share the same values of some parameters.

This modelling yields a new supervised classification method called High Dimensional Discriminant Analysis (HDDA) . Some versions of this method have been tested on the supervised classification of objects in images. The approach has been adapted to the unsupervised classification framework, and the related method is named High Dimensional Data Clustering (HDDC) . Our recent work consists in adding a kernel to the previous methods in order to deal with nonlinear data classification.
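The subspace idea underlying these models can be illustrated with a deliberately simplified stand-in: model each class by its mean and a low-dimensional principal subspace, and classify by reconstruction error (this is not HDDA itself, whose Gaussian parameterization is richer; data and dimensions are made up):

```python
import numpy as np

def fit_class_subspaces(X, y, d):
    """Fit, per class, a mean and a d-dimensional principal subspace
    (a simplified stand-in for class-specific low-dimensional
    Gaussian models, not the actual HDDA method)."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        models[c] = (mu, Vt[:d])
    return models

def predict(models, X):
    """Assign each point to the class whose subspace reconstructs it best."""
    classes = sorted(models)
    errs = []
    for c in classes:
        mu, V = models[c]
        Z = (X - mu) @ V.T                 # coordinates in the class subspace
        errs.append(np.linalg.norm((X - mu) - Z @ V, axis=1))
    return np.array(classes)[np.argmin(np.array(errs), axis=0)]

# toy data: two classes concentrated near different 1-D subspaces of R^5
rng = np.random.default_rng(0)
n = 200
A = np.zeros((n, 5)); A[:, 0] = rng.normal(0.0, 3.0, n)
B = np.zeros((n, 5)); B[:, 1] = rng.normal(0.0, 3.0, n)
X = np.vstack([A, B]) + 0.05 * rng.normal(size=(2 * n, 5))
y = np.array([0] * n + [1] * n)
models = fit_class_subspaces(X, y, d=1)
acc = (predict(models, X) == y).mean()
```

Each class needs only d directions plus a mean, which is the parsimony argument made above: the number of parameters grows with the intrinsic dimension of each group rather than with the ambient dimension.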

Clustering concerns the assignment of each of a collection of observations to one of a number of clusters or groups.

**Joint work with:** Antoine Deleforge and Radu Horaud from the
Inria Perception team.

In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic mapping model that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.

**Joint work with:** Antoine Deleforge and Radu Horaud from the
Inria Perception team.

In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the *binaural manifold* paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a *probabilistic piecewise affine mapping* model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as *variational EM for source separation and localization* (VESSL) yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.

**Joint work with:** Philippe Ciuciu from Team Parietal and
Neurospin, CEA in Saclay.

Standard detection of evoked brain activity in functional MRI (fMRI) relies on a fixed and known shape of the impulse response of the neurovascular coupling, namely the hemodynamic response function (HRF). To cope with this issue, the joint detection-estimation (JDE) framework has been proposed. This formalism makes it possible to estimate one HRF per region, but to do so it assumes a prior brain partition (or parcellation) into hemodynamic territories (e.g. ). This partition has to be accurate enough to recover accurate HRF shapes, but it also has to overcome the detection-estimation issue: the lack of hemodynamic information at non-active positions. During the internship of A. Frau Pascual at Neurospin, we proposed a hemodynamically-based parcellation, consisting of a feature extraction step followed by a Gaussian mixture-based parcellation which injects the activation levels into the parcellation process, in order to overcome the detection-estimation issue and recover the underlying hemodynamics. This work has been submitted to the ICASSP conference in 2014.

**Joint work with:** Michel Dojat (Grenoble Institute of
Neuroscience) and Philippe Ciuciu from Neurospin, CEA in Saclay.

Brain functional exploration investigates the nature of neural processing following cognitive or sensory stimulation. This goal is not fully accounted for in most functional Magnetic Resonance Imaging (fMRI) analyses, which usually assume that all delivered stimuli possibly generate a BOLD response everywhere in the brain, although activation is likely to be induced by only some of them in specific brain regions. Generally, no criteria are available to select the relevant conditions or stimulus types (e.g. visual, auditory, etc.) prior to activation detection, and the inclusion of irrelevant events may degrade the results, particularly when the Hemodynamic Response Function (HRF) is jointly estimated, as in the JDE framework mentioned in the previous section. To face this issue, we propose an efficient variational procedure that automatically selects the conditions according to the brain activity they elicit. This yields improved activation detection and local HRF estimation, which we illustrate on synthetic and real fMRI data. This approach is an alternative to our previous approach based on Markov Chain Monte Carlo (MCMC) inference . Corresponding papers: , . A synthesis can also be found in the PhD manuscript of C. Bakhous (Grenoble University, December 2013) .

**In the context of ARC AINSI project, joint work with:**
Philippe Ciuciu from Neurospin, CEA in Saclay.

Functional MRI (fMRI) is the method of choice to non-invasively probe cerebral activity evoked by a set of controlled experimental conditions. An emerging fMRI modality is Arterial Spin Labeling (ASL), which makes it possible to quantify cerebral perfusion, namely the cerebral blood flow (CBF), and emerges as a more direct biomarker of neuronal activity than standard BOLD (Blood Oxygen Level Dependent) fMRI.

Although the study of cerebral vasoreactivity using fMRI is mainly conducted through the BOLD fMRI modality (see the two previous sections), owing to its relatively high signal-to-noise ratio (SNR), ASL fMRI provides a more interpretable measure of cerebral vasoreactivity than BOLD fMRI. Still, ASL suffers from a low SNR and is hampered by a large amount of physiological noise. Our contribution, described in , aims at improving the recovery of the vasoreactive component of the ASL signal. To this end, a Bayesian hierarchical model is proposed, enabling the recovery of perfusion levels as well as the fitting of their dynamics. On a single-subject real ASL data set involving perfusion changes induced by hypercapnia, the approach is compared with a classical GLM-based analysis. A better goodness-of-fit is achieved, especially in the transitions between baseline and hypercapnia periods. Perfusion levels are also recovered with higher sensitivity and show a better contrast between gray and white matter.

**In the context of ARC AINSI project, joint work with:** Philippe Ciuciu
from Neurospin, CEA in Saclay.

The ASL modality is most commonly used as a static measure, where the average perfusion is computed over a volume sequence lasting several minutes. Recently, ASL has been used in functional activation protocols and hence gives access to a dynamic measure of perfusion, namely the variations of CBF elicited by specific tasks. ASL MRI mainly consists of acquiring pairs of control and label images and looking at the average control-label difference. The signal-to-noise ratio (SNR) of this difference is very low, so that several hundred image pairs need to be acquired, thus significantly increasing the time spent by the subject in the scanner and making the acquisition very sensitive to the patient's movements. In addition, this averaging requires the perfusion signal to be at a steady state, limiting the scope of fMRI task experiments to baseline perfusion measurements or long block designs. In contrast, it is highly desirable to measure changes in perfusion due to an effect of interest in activation paradigms with event-related designs. It is technically possible to collect event-related ASL data, but most approaches to functional ASL data analysis use a standard linear model (GLM-based) formulation with regressors encoding differences in control/tag scans, and with both ASL and BOLD activation signals associated with the same canonical response function. The canonical hemodynamic response function (HRF) is generally used, although it has been calibrated on BOLD experiments only, thus reflecting simultaneous variations of CBF, cerebral blood volume (CBV) and cerebral oxygen consumption (CMRO2). In contrast, the perfusion signal only reflects variations in CBF, so that the associated response, the perfusion response function (PRF), is likely to differ from the HRF.
In the internship proposal of Jennifer Sloboda, we proposed to recover both a hemodynamic (BRF, for BOLD response function) and a perfusion (PRF) response function from event-related functional ASL data. To do so, a joint detection-estimation (JDE) formalism was used. In the BOLD context, the JDE framework has proven to successfully extract the HRF while also performing activation detection. We had recently extended this formalism (see Section and , ) to model an additional perfusion component linked to the BOLD one through a common activation detection. The main issue addressed then was to characterize the link between the BOLD and perfusion components. To establish this link, we proposed a methodological axis which consists of developing a physiologically-inspired approach: dynamical non-linear equations available in physiological models were linearized and approximated in a parsimonious way, so as to establish prior relations between the perfusion and BOLD responses which can be injected into our Bayesian setting. Inference in the initial model is currently done through a Markov Chain Monte Carlo approach, but a variational expectation-maximization implementation is also conceivable. As such, the tasks were two-fold: (1) investigate the physiological model and (2) inject it into the JDE setting. Investigation of the physiological model allows for: (1) the creation of artificial fMRI data, (2) the investigation of the relationship between physiological changes and the resulting simulated BOLD or ASL signal, and (3) the characterization of the link between BOLD and perfusion responses.
The aims of injecting the physiologically-inspired prior into the JDE model are to (1) improve perfusion response recovery and (2) attach physiologically quantified units to the values recovered by JDE. This work will serve as a preliminary investigation into the incorporation of physiological information in the Bayesian JDE setting, from which the trajectory of future model developments can be determined.

**This is joint work with:** Eric Coissac and Pierre Taberlet from LECA
(Laboratoire d'Ecologie Alpine) and Alain Viari from Inria team Bamboo.

This work considers a statistical modelling approach to investigate spatial cross-correlations between species in an ecosystem. A special feature is the origin of the data: high-throughput environmental DNA sequencing of soil samples. Here we use data collected at the Nouragues CNRS field station in French Guiana. We describe bivariate spatial relationships in these data by a separable linear model of coregionalisation and estimate a cross-correlation parameter. Based on this estimate, we visualise plant taxa co-occurrence patterns in the form of `interaction graphs', which can be interpreted in terms of ecological interactions. Limitations of this approach are discussed, along with possible alternatives, in .

**Joint work with:**
Pierre Fernique (Montpellier 2 University, CIRAD
and Inria Virtual Plants), Yann Guédon
(CIRAD and Inria Virtual Plants) and
Iragaël Joly (INRA-GAEL and Grenoble INP).

Multivariate count data are defined as numbers of items of different
categories obtained by sampling within a population whose individuals
are grouped into categories. The analysis of multivariate count data
is a recurrent and crucial issue in numerous modelling problems,
particularly in the fields of biology and ecology (where the data can
represent, for example, children counts associated with multitype
branching processes), sociology and econometrics.

Our context of application was characterised by zero-inflated, often
right-skewed marginal distributions. Thus, Gaussian and Poisson
distributions were not *a priori* appropriate. Moreover, the
multivariate histograms typically had many cells, most of which
were empty. Consequently, nonparametric estimation was not efficient.

To achieve these goals, we proposed an approach based on probabilistic
graphical models (Koller & Friedman, 2009 ) to
represent the conditional independence relationships in the joint
distribution of the counts.
Graph search was achieved by a stepwise approach, obtained by unifying previous algorithms presented in Koller & Friedman (2009) for DAGs: hill climbing, greedy search, first ascent and simulated annealing. The search algorithm was made faster by taking advantage of our parametric distribution assumptions, which allowed the scores of overlapping graphs to be cached at each step. Graph search algorithms for DAGs were also adapted to PDAGs by defining new operators specific to PDAGs.
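As a minimal illustration of this search strategy, the sketch below implements greedy hill climbing over DAGs with caching of family scores. It uses a Gaussian BIC score as a simple stand-in for the parametric count distributions of the study, and considers edge additions only (removals, reversals and the PDAG operators are omitted):

```python
import itertools
import numpy as np

def family_score(data, child, parents, cache):
    """BIC score of one node given its parent set (Gaussian stand-in for
    the study's count likelihoods). Scores of (child, parents) families
    are cached: a move only changes one family, so all others are reused."""
    key = (child, tuple(sorted(parents)))
    if key not in cache:
        y = data[:, child]
        n = len(y)
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = max(float(resid @ resid) / n, 1e-12)
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
        cache[key] = loglik - 0.5 * np.log(n) * (len(parents) + 2)
    return cache[key]

def _has_path(parents, start, target):
    """True if a directed path start -> ... -> target exists (edges go
    from parent to child); used to reject cycle-creating additions."""
    d = len(parents)
    children = {k: [c for c in range(d) if k in parents[c]] for k in range(d)}
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return False

def hill_climb(data, max_parents=2):
    """Greedy hill climbing over DAGs, restricted to edge additions."""
    d = data.shape[1]
    parents = {j: set() for j in range(d)}
    cache = {}
    while True:
        best_gain, best_move = 1e-9, None
        for i, j in itertools.permutations(range(d), 2):
            if i in parents[j] or len(parents[j]) >= max_parents:
                continue
            if _has_path(parents, j, i):  # adding i -> j would close a cycle
                continue
            gain = (family_score(data, j, parents[j] | {i}, cache)
                    - family_score(data, j, parents[j], cache))
            if gain > best_gain:
                best_gain, best_move = gain, (i, j)
        if best_move is None:
            return parents
        parents[best_move[1]].add(best_move[0])
```

Because only the family of the modified node changes score, the cache makes each subsequent sweep over candidate moves almost free, which is the speed-up measured in the comparisons below.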

Comparisons between different algorithms in the literature for directed and undirected graphical models were performed on simulated datasets to: (i) assess the gain in speed induced by caching; (ii) compare the graphs obtained under parametric and nonparametric distribution assumptions; (iii) compare different strategies for graph initialisation. Strategies based on several random graphs were compared to those based on a fast estimation of an undirected graph, assumed to be the moral graph.

First results were obtained in modelling individual daily activity programs and the interactions between flowering and vegetative growth in plants (see the sections below).

**Joint work with:**
Pierre Fernique (Montpellier 2 University and CIRAD) and Yann Guédon
(CIRAD), Inria Virtual Plants.

The quantity and quality of yields in fruit trees are closely related
to the growth and branching processes, which ultimately determine the
regularity of flowering and the position of flowers. Flowering and
fruiting patterns are explained by statistical dependence between
the nature of a parent shoot (*e.g.* flowering or not) and the
quantity and nature of its child shoots – with potential
effects of covariates. A better characterization of these patterns and
dependencies is thus expected to lead to strategies for controlling the
demographic properties of the shoots (through varietal selection or crop
management policies), and thus to bring substantial improvements in
the quantity and quality of yields.

Since the connections between shoots can be represented by mathematical trees, statistical models based on multitype branching processes and Markov trees appear as natural tools to model the dependencies of interest. Formally, the properties of a vertex are summed up in the notion of vertex state. In such models, the numbers of children in each state given the parent state are modeled through discrete multivariate distributions. Model selection procedures are necessary to specify parsimonious distributions. We developed an approach based on probabilistic graphical models (see the previous section) to identify and exploit properties of conditional independence between the numbers of children in different states, so as to simplify the specification of their joint distribution.

This work was carried out in the context of the first year of Pierre Fernique's PhD (Montpellier 2 University and CIRAD). It was applied to model dependencies between short or long, vegetative or flowering shoots in apple trees. The results highlighted contrasted patterns related to the parent shoot state, with an interpretation in terms of alternation of flowering (see below). It was also applied to the analysis of the connections between cyclic growth and flowering of mango trees. This work will be continued during Pierre Fernique's PhD thesis, with extensions to other fruit tree species and other parametric families of discrete multivariate distributions, including covariates and mixed effects.

**Joint work with:**
Jean Peyhardi and Yann Guédon (Mixed Research Unit DAP, Virtual Plants
team), Baptiste Guitton, Yan Holtz and Evelyne Costes (DAP, AFEF
team), Catherine Trottier (Montpellier University)

A first study was performed to characterize the genetic determinants of the alternation of flowering in apple tree progenies. Data were collected at two scales: the whole tree (with an annual time step) and a local scale (the annual shoot or AS, the portion of stem grown during a single year). Two replications of each genotype were available.

Indices were proposed to characterize alternation at tree scale. The difficulty lies in the early detection of alternating genotypes, in a context where alternation is often concealed by a substantial increase in the number of flowers over consecutive years. To correctly separate the increase in the number of flowers due to the aging of young trees from alternation in flowering, our model relied on a parametric hypothesis for the trend (fixed slopes specific to genotypes and random slopes specific to replications), which translated into mixed-effects modelling. Different indices of alternation were then computed on the residuals, and clusters of individuals with contrasted bearing habits were identified.
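A toy version of this detrend-then-index idea can be sketched as follows, with an ordinary least-squares trend standing in for the mixed-effects model of the study, and minus the lag-1 autocorrelation of the residuals as one hypothetical alternation index (the study's actual indices may differ):

```python
import numpy as np

def alternation_index(flowers):
    """Illustrative alternation index for one genotype: remove a linear
    trend from the annual flower counts (an OLS stand-in for the
    mixed-effects trend), then measure alternation as minus the lag-1
    autocorrelation of the residuals. Values near +1 indicate strong
    year-on-year alternation; smooth series give negative values."""
    y = np.asarray(flowers, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)   # trend fit
    r = y - (slope * t + intercept)          # residuals (zero mean by LS)
    denom = float(np.sum(r ** 2))
    if denom < 1e-12:
        return 0.0
    return float(-np.sum(r[1:] * r[:-1]) / denom)
```

On a series combining a rising trend with year-on-year alternation, the index is close to +1 even though the raw counts increase every other year, which is exactly the concealment effect described above.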

To model the alternation of flowering at AS scale, a second-order Markov tree model was built. Its transition probabilities were modelled as generalized linear mixed models, incorporating the effects of genotype, year and memory of flowering for the Markovian part, with interactions between these components.

Asynchronism of flowering at AS scale was assessed using an entropy-based criterion. The entropy allowed for a characterisation of the respective roles of local alternation and asynchronism in the regularity of flowering at tree scale.
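For a binary flowering indicator, one natural entropy criterion of this kind can be computed as below. This is an illustration only; the exact criterion used in the study may differ:

```python
import math

def flowering_entropy(flowering):
    """Shannon entropy (in nats) of the flowering indicator over the
    annual shoots of one tree in one year: 0 when the shoots are
    perfectly synchronised (all flowering, or none), and maximal,
    log(2), when half of them flower (full asynchronism)."""
    p = sum(flowering) / len(flowering)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
```

Averaging such entropies over years gives a tree-scale measure of how much asynchronism between shoots smooths out local alternation.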

Moreover, our models highlighted significant correlations between indices of alternation at AS and individual scales.

This work was extended through the Master 2 internship of Yan Holtz, supervised by Evelyne Costes and Jean-Baptiste Durand. New progenies were considered, and a methodology based on a lighter measurement protocol was developed and assessed: the accuracy of approximating the indices computed from measurements at tree scale by the same indices computed at AS scale was evaluated. The approximations were shown to be sufficiently accurate to provide an operational strategy for apple tree selection.

As a perspective of this work, patterns in the production of child ASs (numbers of flowering and vegetative children) depending on the type of the parent AS must be analyzed using branching processes and different types of Markov trees, in the context of Pierre Fernique's PhD thesis (see above).

**Joint work with:** L. Gardes (Univ. Strasbourg) and E. Deme (Univ. Gaston Berger, Sénégal)

We are working on the estimation of the second-order parameter
arising in extreme-value statistics.

In addition to this work, we have written a review of Weibull-tail distributions.

**Joint work with:** L. Gardes (Univ. Strasbourg) and A. Daouia
(Univ. Toulouse I and Univ. Catholique de Louvain)

The goal of the PhD thesis of Alexandre Lekina was to contribute to
the development of theoretical and algorithmic models to tackle
conditional extreme-value analysis, *i.e.* the situation where
some covariate information is available.

Conditional extremes are studied in climatology, where one is interested in how climate change over the years might affect extreme temperatures or rainfall. In this case, the covariate is univariate (time). Bivariate examples include the study of extreme rainfall as a function of geographical location. The applied part of the study is joint work with the LTHE (Laboratoire d'étude des Transferts en Hydrologie et Environnement) in Grenoble.

**Joint work with:** L. Gardes and A. Guillou (Univ. Strasbourg)

One of the most popular risk measures is the Value-at-Risk (VaR), introduced in the 1990s.
In statistical terms, the VaR at a given level is an upper quantile of the loss
distribution, *i.e.* the loss that is exceeded only with a small fixed probability.
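When losses are simply observed, the empirical VaR is an upper sample quantile; a minimal sketch:

```python
import numpy as np

def value_at_risk(losses, alpha):
    """Empirical Value-at-Risk at level alpha: the (1 - alpha) quantile
    of the observed losses, i.e. the loss exceeded with probability at
    most alpha under the empirical distribution."""
    return float(np.quantile(losses, 1.0 - alpha))
```

When alpha is smaller than 1/n, no observation lies beyond the estimated quantile and the empirical estimator breaks down; this is precisely the regime where the extreme-value methods studied by the team take over.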

**Joint work with:** C. Amblard (TimB in TIMC laboratory, Univ. Grenoble I) and L. Menneteau (Univ. Montpellier II)

Copulas are a useful tool to model multivariate distributions. We first developed an extension of some particular copulas, which led to a new class of bivariate copulas defined on matrices; analogies between matrix and copula properties were established.

However, while various families of bivariate copulas exist, much less has been done in higher dimensions. To this aim, an interesting class of copulas based on products of transformed copulas has been proposed in the literature, but its use for practical high-dimensional problems remains challenging: constraints on the parameters and the product form make inference, and in particular the likelihood computation, difficult. We proposed a new class of high-dimensional copulas based on a product of transformed bivariate copulas. No constraints on the parameters restrict the applicability of the proposed class, which is well suited to high-dimensional applications. Furthermore, the analytic form of the copulas within this class allows a natural graphical structure to be associated with them, which helps to visualise the dependencies and to compute the likelihood efficiently even in high dimension. The extreme-value properties of the copulas were also derived, and an R package has been developed.

As an alternative, we also proposed a new class of copulas constructed by introducing a latent factor. Conditional independence with respect to this factor and the use of a nonparametric class of bivariate copulas lead to interesting properties such as explicitness, flexibility and parsimony. In particular, various tail behaviours are exhibited, making it possible to model various extreme situations. A pairwise moment-based inference procedure has also been proposed, and the asymptotic normality of the corresponding estimator has been established.

**Joint work with:** A. Guillou and L. Gardes (Univ. Strasbourg), G. Stupfler (Univ. Aix-Marseille)
and A. Daouia (Univ. Toulouse I and Univ. Catholique de Louvain)

The boundary of a set of points is viewed as the largest level set of the underlying distribution, so that boundary estimation becomes an extreme quantile curve estimation problem. We proposed estimators based on projection as well as on kernel regression methods applied to the extreme-value set, for particular sets of points. We also investigated the asymptotic properties of existing estimators when used in extreme situations. For instance, we established in collaboration with G. Stupfler that the so-called geometric quantiles have very counter-intuitive properties in such situations and thus should not be used to detect outliers.

In collaboration with A. Daouia, we investigated the application of such methods in econometrics: a new characterization of the partial boundaries of a free-disposal multivariate support is introduced by making use of large quantiles of a simple transformation of the underlying multivariate distribution. Pointwise empirical and smoothed estimators of the full and partial support curves are built as extreme sample and smoothed quantiles. Extreme-value theory then holds automatically for the empirical frontiers, and we show that some fundamental properties of extreme order statistics carry over to Nadaraya's estimates of upper quantile-based frontiers.

In collaboration with G. Stupfler and A. Guillou, new estimators of the boundary were introduced. The regression is performed on the whole set of points, the selection of the “highest” points being automatically performed by the introduction of high-order moments.

**Joint work with:** S. Douté from Laboratoire de
Planétologie de Grenoble, J. Chanussot (Gipsa-lab and Grenoble-INP) and J. Saracco (Univ. Bordeaux).

Visible and near-infrared imaging spectroscopy is
one of the key techniques for detecting, mapping and
characterizing mineral and volatile (*e.g.* water-ice)
species on the surface of planets. Indeed, the chemical composition,
granularity, texture, physical state, etc. of the materials
determine the existence and morphology of the absorption bands,
so the resulting spectra contain very useful information.
Current imaging spectrometers provide data organized as three-dimensional
hyperspectral images: two spatial dimensions and one
spectral dimension. Our goal is to estimate the functional
relationship between the observed spectra and the physical
parameters describing the surface.

**Joint work with**: Zaid Harchaoui from LEAR team Inria Grenoble

The change-point problem is a classical problem in statistics that arises in various applications, such as signal processing, bioinformatics and financial market analysis. The goal of change-point problems is to make inference about the moment of a change in the distribution of the observed data. We consider the problem of detecting a simultaneous change in mean in a sequence of Gaussian vectors.

The state-of-the-art approach to change-point detection/estimation is based on the assumption of a growing number of observations and a fixed signal dimension. We work in a high-dimensional setting, assuming that the vector dimension tends to infinity and that the length of the sequence grows more slowly than the dimension of the signal. Assuming that the change occurs only in a subset of the vector components of unknown cardinality, we can reduce our problem to that of testing for non-zero components in a sequence of sparse Gaussian vectors. We construct a testing procedure that is adaptive to the number of components affected by the change. It is based on a combination of two chi-squared-type test statistics, and the combined test performs optimally in both the high- and moderate-sparsity regimes. We obtain the detection boundary of the test and show its rate-optimality in the minimax sense.
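The following sketch illustrates the combination idea on a single Gaussian vector, with deliberately simple, illustrative thresholds (the calibrated critical values of the actual procedure differ):

```python
import math
import numpy as np

def combined_sparse_test(x):
    """Sketch of a two-statistic test for a non-zero sparse mean in
    x ~ N(mu, I_p). The global chi-squared statistic is powerful when
    many components move moderately; the max statistic is powerful when
    a few components move a lot. Rejecting when either one is large
    keeps power in both regimes. Thresholds are illustrative only.
    Returns True if H0: mu = 0 is rejected."""
    p = len(x)
    # centred and scaled chi-squared statistic, approx N(0, 1) under H0
    t_global = (float(np.sum(x ** 2)) - p) / math.sqrt(2.0 * p)
    # scan-type statistic compared to the universal threshold sqrt(2 log p)
    t_max = float(np.max(np.abs(x)))
    return t_global > 2.0 or t_max > math.sqrt(2.0 * math.log(p)) + 1.0
```

A very sparse but large shift is caught by the max statistic while the chi-squared statistic barely moves, and a dense moderate shift is caught by the chi-squared statistic while the max statistic stays below its threshold, which is why neither statistic alone is adaptive.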

The results of this work were presented at:

NIPS 2013, Workshop on Modern Nonparametric Methods in Machine Learning (Dec. 2013)

Conference on Structural Inference in Statistics, Potsdam, Germany (Sept. 2013)

**Joint work with**: Dominique Morche (CEA-LETI) and Alp Oguz (CEA-LETI)

The demand for high data rates in communication puts stringent requirements on components' dynamic range. However, the extreme size reduction in advanced technologies inadvertently results in increased process variability, which inherently limits performance. The redundancy approach is based on the idea of dividing an elementary component (capacitor, resistor, transistor) into several subsets and then choosing an optimal combination of these subsets, so as to produce a component with very precise characteristics. For several years, the redundancy method has been identified as complementary to digital calibration for improving performance. In practice, it is hard for a designer to select the optimal number of redundant components that achieves the desired production yield while minimizing the area occupied by the components. The usual way to solve this problem is to resort to statistical simulations, which are time-consuming and sometimes misleading. We propose a normal approximation of the yield in order to estimate the number of redundant components needed to minimize the area occupied by the components.
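Under a simple binomial model where each sub-component is within spec independently, the normal approximation of the yield and the resulting redundancy choice can be sketched as follows (the function names and the search loop are illustrative, not the team's exact procedure):

```python
import math

def approx_yield(n, k, p_good):
    """Normal approximation (with continuity correction) of the yield
    P(at least k of the n redundant sub-components are within spec),
    each sub-component being good independently with prob. p_good."""
    mu = n * p_good
    sd = math.sqrt(n * p_good * (1.0 - p_good))
    z = (mu - k + 0.5) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def min_redundancy(k, p_good, target):
    """Smallest total count n >= k whose approximate yield reaches the
    target, i.e. the least area meeting the spec (illustrative helper)."""
    n = k
    while approx_yield(n, k, p_good) < target:
        n += 1
    return n
```

Replacing Monte Carlo yield simulations by this closed-form approximation turns the redundancy choice into a quick deterministic search over n.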

mistis is involved in three regional initiatives: PEPS (funded by CNRS and the PRES of Grenoble), AGIR (funded by Université Grenoble 1 and Grenoble-INP) and the MOTU project (funded by UPMF). The first two projects focus on the modelling of extreme risk and its application in social science. The partners include the LTHE (Laboratoire d'étude des Transferts en Hydrologie et Environnement) and the 3S-R lab (Sols, Solides, Structures - Risques). The third project focuses on the use of statistical techniques for transportation data analysis and involves the GAEL laboratory (Grenoble Applied Economics Laboratory).

mistis participates in the weekly statistical seminar of Grenoble. Jean-Baptiste Durand is in charge of the organization and several lecturers have been invited in this context.

S. Girard has been the head of the probability and statistics department of the LJK since September 2012.

mistis was a partner in a three-year MINALOGIC
project (I-VP for Intuitive Vision Programming) supported by the
French Government. The project was led by VI Technology
(http://

The 2-year Inria ARC project AINSI (2011-12) coordinated by F. Forbes
(http://

Title: Humanoids with audiovisual skills in populated spaces

Type: COOPERATION (ICT)

Defi: Cognitive Systems and Robotics

Instrument: Specific Targeted Research Project (STREP)

Duration: February 2010 - January 2013

Coordinator: Inria (France)

Others partners: CTU Prague (Czech Republic), University of Bielefeld (Germany), IDIAP (Switzerland), Aldebaran Robotics (France)

See also: http://

Abstract: Humanoids expected to collaborate with people should be able to interact with them in the most natural way. This involves significant perceptual and interactive skills, operating in a coordinated fashion. Consider a social gathering scenario where a humanoid is expected to possess certain social skills. It should be able to analyze a populated space, to localize people, and to determine whether they are looking at the robot and are speaking to it. Humans appear to solve these tasks routinely by integrating the often complementary information provided by multi-sensory data processing, from 3D object positioning and sound-source localization to gesture recognition. Understanding the world from unrestricted sensorial data, recognizing people's intentions and behaving like them are extremely challenging problems. The objective of HUMAVIPS has been to endow humanoid robots with audiovisual (AV) abilities: exploration, recognition, and interaction, such that they exhibit adequate behavior when dealing with a group of people. Research and technological developments have emphasized the role played by multimodal perception within principled models of human-robot interaction and of humanoid behavior. An adequate architecture has implemented auditory and visual skills onto a fully programmable humanoid robot (the consumer robot NAO). A free and open-source software platform has been developed to foster dissemination and to ensure exploitation of the outcomes of HUMAVIPS beyond its lifetime.

The main international collaborations that we are currently trying to develop are with:

Emma Holian and John Hinde from National University of Ireland, Galway, Ireland.

K. Qin and D. Wraith from RMIT and Centre for Epidemiology and Biostatistics University in Melbourne, Australia.

E. Deme and S. Sylla from Saint Louis university and IRD in Saint Louis, Senegal.

Alexander Nazin from the Russian Academy of Sciences in Moscow, Russia.

Alexis Roche and University Hospital Lausanne/Siemens Healthcare, Advanced Clinical Imaging Technology group, Lausanne, Switzerland.

Alexander Nazin (Russian Academy of Sciences, Russia) has been an invited researcher in the mistis team, working with Stéphane Girard and Anatoli Iouditski (Université Grenoble 1).

El Hadji Deme (Université Gaston Berger, Sénégal) has been hosted by the mistis team for two months. His stay was partially funded by the Ibni Oumar Mahamat Saleh prize.

Jennifer Sloboda (Master, from May 2013 until Aug 2013)

Subject: Physiologically-inspired Bayesian analysis of BOLD and ASL fMRI data

Institution: University of Michigan, Ann Arbor (United States)

Aina Frau-Pascual (Master, from May 2013 until Aug 2013)

Subject: Hemodynamically informed parcellation of cerebral fMRI data

Institution: University Paris Sud

Pham Van Trung (Master, from Apr 2013 until Sep 2013)

Subject: Implementation and packaging of a statistical model for extreme values.

Institution: Hanoi, Vietnam.

Seydou-Nourou Sylla (PhD, from October 2013 to December 2013)

Subject: Classification for medical data

Institution: Université Gaston Berger (Sénégal)

**Editorial activities**

Stéphane Girard has been an Associate Editor of the *Statistics and Computing* journal since 2012.
He has also been an invited editor for a special issue of the *Journal de la Société Française de Statistique* dedicated to extreme-value analysis.

**Workshops and summer schools**

Florence Forbes and Stéphane Girard co-organized the summer school
“*Méthodes et applications de la régression en astrophysique*”, Annecy,
http://

Florence Forbes and Stéphane Girard co-organized the workshop
“*Géométrie Aléatoire et ses Applications*”, Grenoble,
http://

Stéphane Girard organized the workshop
“*Copulas and extremes*”, Grenoble,
http://

Marie-José Martinez, Jean-Baptiste Durand and Florence Forbes, in collaboration with Iragaël Joly (Grenoble Applied Economics Laboratory), organized the workshop "Statistics, Activities and Transportation" in Grenoble
http://

**Societies and Networks**

F. Forbes is part of an INRA
(French National Institute for Agricultural Research)
Network (AIGM, http://

F. Forbes and S. Girard were elected as members of the bureau of the “Analyse d'images, quantification, et statistique” group in the Société Française de Statistique (SFdS).

Licence (IUT): Marie-José Martinez, *Statistics*, 192 ETD, L1 to L3 levels, université Grenoble 2, France.

Master: Jean-Baptiste Durand, *Statistics and probability*, 192 ETD, M1 and M2 levels, Ensimag Grenoble INP, France.

Licence (IUT) : Gildas Mazo, Mathematics and C language, 128h, L1 level, université Grenoble 1, France.

Master: Farida Enikeeva, *Statistics*, 96 ETD, M1 level, Ensimag Grenoble INP, France.

Licence: Christine Bakhous, *Mathematics and Statistics*, 64 ETD, L1 level, université Grenoble 1, France.

Licence: Jonathan El Methni, *Mathematics and Statistics*, 64 ETD, L1 level, université Grenoble 1, France.

Master : Stéphane Girard, *Statistique Inférentielle Avancée*, 45 ETD, M1 level,
Ensimag Grenoble-INP, France.

Master : Florence Forbes, Mixture models and EM algorithm, 12h, M2 level, UFR IM2A, université Grenoble 1, France.

M.-J. Martinez is a faculty member at Univ. Pierre Mendès France, Grenoble II.

J.-B. Durand is a faculty member at Ensimag, Grenoble INP.

F. Enikeeva is on a half-time ATER position at Ensimag, Grenoble INP.

C. Bakhous and J. El Methni were both moniteurs at University Joseph Fourier.

PhD: Christine Bakhous, "*Modèles d'encodage parcimonieux de l'activité cérébrale mesurée par IRM fonctionnelle*", Université Grenoble 1, defended in December 2013.
Supervision : Florence Forbes & Michel Dojat (GIN).

PhD : Jonathan El-methni,
“*Différentes contributions à l'estimation de quantiles extrêmes*”,
Université Grenoble 1, defended in October 2013.
Supervision : Stéphane Girard & Laurent Gardes (Université de Strasbourg).

PhD : El-hadji Deme, “*Quelques contributions à la théorie univariée des
valeurs extrêmes. Estimation des mesures de risque actuariel pour des pertes à queues lourdes*”,
Université Gaston Berger, Sénégal, defended in June 2013. Supervision : Stéphane Girard & Gane Samb Lo (Université Gaston Berger, Sénégal)

PhD in progress: Aina Frau-Pascual, "*Statistical Models for the coupling of ASL and BOLD Magnetic Resonance modalities to study brain function and disease*", Université Grenoble 1, started in October 2013. Supervision : Florence Forbes & Philippe Ciuciu (Parietal, NeuroSpin).

PhD in progress : Alessandro Chiancone, “*Sequential dimension reduction*”, Université Grenoble 1, started in October 2013. Supervision : Stéphane Girard & Jocelyn Chanussot (Gipsa-lab, Grenoble INP).

PhD in progress : Seydou Nourou Sylla
“*Modélisation statistique pour l'analyse des causes de décès décrites
par autopsie verbale en milieu rural africain : cas du Sénégal*”, Université Gaston Berger, Sénégal,
started in October 2012.
Supervision : Stéphane Girard & Abdou Ka Diongue (Université Gaston Berger, Sénégal).

PhD in progress : Gildas Mazo,
“*Estimation de quantiles extrêmes spatiaux à partir de données environnementales*”, Université Grenoble 1, started in October 2011.
Supervision : Florence Forbes & Stéphane Girard.

Stéphane Girard has been involved in the following PhD committees:

Yousri Henchiri,
“*Support Vector Machine (SVM) pour l'analyse de données fonctionnelles*”, Université Montpellier 2.

François Portier
“*Réduction de la dimension en régression*”, Université Rennes 1.

Smriti Joshi,
“*Consommation statique dans les circuits numériques en CMOS 32nm: Analyse et méthodologie pour une estimation statistique au niveau porte*”, Université Grenoble.

Florence Forbes has been involved in the PhD committees of:

Xavier Alameda-Pineda, Egocentric audio-visual scene analysis: a machine learning and signal processing approach, University Grenoble 1.

Antoine Deleforge, Acoustic Space Mapping: a machine learning approach to sound source separation and localization, University Grenoble 1

Mohamad Belouni, Plans d'expérience optimaux en régression appliquée à la pharmacocinétique, University Grenoble 1.

Solveig Badillo, Etude de la variabilité hémodynamique chez l'enfant et l'adulte sains en IRMf, University Paris Sud

Virgile Fritsch, High-dimensional statistical methods for inter-subjects studies in neuroimaging, University Paris Sud

Since September 2009, F. Forbes has been head of the committee in charge of examining post-doctoral candidates at Inria Grenoble Rhône-Alpes ("Comité des Emplois Scientifiques").

Florence Forbes is a member of the INRA committee (CSS MBIA) in charge of the annual evaluation of researchers in the MBIA department of INRA.

Florence Forbes was a member of:

the AERES committee in charge of evaluating the AgroParisTech unit.

the committee for selecting a new professor at University Grenoble 1.

the LJK committee for attributing the first Jean Kuntzman award.