The context of our work is the analysis of structured stochastic models with statistical tools. The idea underlying the concept of structure is that stochastic systems that exhibit great complexity can be accounted for by combining simple local assumptions in a coherent way. This provides a key to modelling, computation, inference and interpretation. This approach appears to be useful in a number of high-impact applications including signal and image processing, neuroscience, genomics and sensor networks, while the needs of these domains can in turn generate interesting theoretical developments. However, this powerful and flexible approach can still be restricted by necessary simplifying assumptions and by several generic sources of complexity in data.

Often data exhibit complex dependence structures, having to do for example with repeated measurements on individual items, or natural grouping of individual observations due to the method of sampling, spatial or temporal association, family relationship, and so on. Other sources of complexity are connected with the measurement process, such as having multiple measuring instruments or simulations that generate high-dimensional and heterogeneous data, or data that are dropped out or missing. Such complications in data-generating processes raise a number of challenges. Our goal is to contribute to statistical modelling by offering theoretical concepts and computational tools to properly handle some of these issues, which are frequent in modern data. In doing so, we aim to develop innovative techniques for applications with high scientific, societal and economic impact, in particular through image processing and spatial data analysis in environment, biology and medicine.

The methods we focus on involve mixture models, Markov models, and more generally hidden structure models identified by stochastic algorithms on the one hand, and semi- and non-parametric methods on the other hand.

Hidden structure models are useful for taking into account heterogeneity in data. They concern many areas of statistics (finite mixture analysis, hidden Markov models, graphical models, random effect models, ...). Due to their missing data structure, they induce specific difficulties for both estimating the model parameters and assessing performance. The team focuses on research regarding both aspects. We design specific algorithms for estimating the parameters of missing structure models and we propose and study specific criteria for choosing the most relevant missing structure models in several contexts.

Semi and non-parametric methods are relevant and useful when no
appropriate parametric model exists for the data under study
either because of data complexity, or because information is
missing.
When observations are curves, they enable us to model the
data without a discretization step. These
techniques are also of great use for *dimension reduction* purposes. They enable dimension reduction of the
functional or multivariate data with no assumptions on the
observations distribution. Semi-parametric methods refer to
methods that include both parametric and non-parametric aspects.
Examples include the Sliced Inverse Regression (SIR) method which combines non-parametric regression techniques
with parametric dimension reduction aspects. This is also the case
in *extreme value analysis*, which is based
on the modelling of distribution tails
by both a functional part and a real parameter.

**Key-words:**
mixture of distributions, EM algorithm, missing data, conditional independence,
statistical pattern recognition, clustering,
unsupervised and partially supervised learning.

In a first approach, we consider statistical parametric models,

These models are interesting in that they may point out hidden
variables that are responsible for most of the observed variability and
given which the observed variables are *conditionally* independent.
Their estimation is often difficult due to the missing data. The
Expectation-Maximization (EM) algorithm is a general and now
standard approach to maximization of the likelihood in missing
data problems. It provides parameter estimation but also values
for missing data.
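
As an illustration of the E and M steps mentioned above, the following minimal sketch fits a two-component univariate Gaussian mixture. It is not the team's implementation: the function name `em_two_gaussians`, the crude initialisation and the simulated data are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """EM iterations for a two-component univariate Gaussian mixture."""
    n = len(data)
    # Illustrative initialisation: extreme observations as starting means.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    overall = sum(data) / n
    var0 = sum((x - overall) ** 2 for x in data) / n
    variances = [var0, var0]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from component 0.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            resp.append(p0 / (p0 + p1))
        # M-step: weighted updates of proportions, means and variances.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - t for t in resp]
            total = sum(r)
            weights[k] = total / n
            means[k] = sum(t * x for t, x in zip(r, data)) / total
            var_k = sum(t * (x - means[k]) ** 2 for t, x in zip(r, data)) / total
            variances[k] = max(var_k, 1e-6)  # guard against degenerate components
    return weights, means, variances, resp


# Simulated data (assumption): two well-separated groups of 200 points each.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, resp = em_two_gaussians(data)
print(weights, means, variances)
```

The list `resp` holds the posterior membership probabilities, which play the role of the "values for missing data" that EM provides alongside the parameter estimates.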

Mixture models correspond to independent

**Key-words:**
graphical models, Markov properties, hidden Markov models, clustering, missing data, mixture of distributions, EM algorithm, image analysis, Bayesian
inference.

Graphical modelling provides a diagrammatic representation of the dependency structure of a joint probability distribution, in the form of a network or graph depicting the local relations among variables. The graph can have directed or undirected links or edges between the nodes, which represent the individual variables. Associated with the graph are various Markov properties that specify how the graph encodes conditional independence assumptions.

It is the conditional independence assumptions that give graphical models their fundamental modular structure, enabling computation of globally interesting quantities from local specifications. In this way graphical models form an essential basis for our methodologies based on structures.

The graphs can be either
directed, e.g. Bayesian Networks, or undirected, e.g. Markov Random Fields.
The specificity of Markovian models is that the dependencies
between the nodes are limited to the nearest neighbor nodes. The
neighborhood definition can vary and be adapted to the problem of
interest. When some of the variables (nodes) are not observed or are missing,
we refer to these models as Hidden Markov Models (HMM).
Hidden Markov chains or hidden Markov fields correspond to cases where the

Hidden Markov models are very useful in modelling spatial dependencies, but these dependencies and the possible existence of hidden variables are also responsible for a typically large amount of computation. It follows that the statistical analysis may not be straightforward. Typical issues are related to the neighborhood structure to be chosen when not dictated by the context and to the possible high dimensionality of the observations. This also requires a good understanding of the role of each parameter and methods to tune them depending on the goal in mind. Estimation amounts to an energy minimization problem which is NP-hard and is therefore usually solved approximately. We focus on a certain type of methods based on variational approximations and propose effective algorithms which show good performance in practice and for which we also study theoretical properties. We also propose some tools for model selection. Finally, we investigate ways to extend the standard Hidden Markov Field model to increase its modelling power.

**Key-words:** dimension reduction, extreme value analysis, functional estimation.

We also consider methods which do not assume a parametric model.
The approaches are non-parametric in the sense that they do not
require the assumption of a prior model on the unknown quantities.
This property is important since, for image applications for
instance, it is very difficult to introduce sufficiently general
parametric models because of the wide variety of image contents.
Projection methods are then a way to decompose the unknown
quantity on a set of functions (*e.g.* wavelets). Kernel
methods which rely on smoothing the data using a set of kernels
(usually probability distributions) are other examples.
Relationships exist between these methods and learning techniques
using Support Vector Machines (SVM), as appears in the context
of *level-sets estimation*. Such
non-parametric methods have become the cornerstone when dealing
with functional data. This is the case, for
instance, when observations are curves. They enable us to model the
data without a discretization step. More generally, these
techniques are of great use for *dimension reduction* purposes.
They enable reduction of the dimension of the
functional or multivariate data without assumptions on the
distribution of the observations. Semi-parametric methods refer to
methods that include both parametric and non-parametric aspects.
Examples include the Sliced Inverse Regression (SIR) method,
which combines non-parametric regression techniques
with parametric dimension reduction aspects. This is also the case
in *extreme value analysis*, which is based
on the modelling of distribution tails.
It differs from traditional statistics which focuses on the central
part of distributions, *i.e.* on the most probable events.
Extreme value theory shows that distribution tails can be
modelled by both a functional part and a real parameter, the
extreme value index.
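
To make the notion of extreme value index concrete, the sketch below computes the classical Hill estimator, a standard estimator of a positive extreme value index based on the largest observations. It is given as an illustration only and is not claimed to be one of the estimators developed by the team; the function name `hill_estimator`, the choice `k=200` and the simulated Pareto sample are assumptions.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator of a positive extreme value index from the k largest values."""
    if not 0 < k < len(sample):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Average log-excess of the k largest observations over the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


# Simulated data (assumption): Pareto sample with survival function x^(-1/gamma),
# obtained by inverse transform as U^(-gamma) with U uniform on (0, 1].
rng = random.Random(1)
true_index = 0.5
sample = [(1.0 - rng.random()) ** (-true_index) for _ in range(5000)]
estimate = hill_estimator(sample, k=200)
print(estimate)
```

The number `k` of upper order statistics controls a bias-variance trade-off: a small `k` gives a noisy estimate, while a large `k` uses observations that are no longer in the tail.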

Extreme value theory is a branch of statistics dealing with the extreme
deviations from the bulk of probability distributions.
More specifically, it focuses on the limiting distributions for the
minimum or the maximum of a large collection of random observations
from the same arbitrary distribution.
Let *i.e.*

To estimate such quantiles therefore requires dedicated
methods to
extrapolate information beyond the observed values of

where both the extreme-value index *i.e.* such that

for all

More generally, the problems that we address are part of risk management theory. For instance, in reliability, the distributions of interest belong to a semi-parametric family whose tails decrease exponentially fast. These so-called Weibull-tail distributions are defined by their survival distribution function:

Gaussian, gamma, exponential and Weibull distributions, among others,
are included in this family. An important part of our work consists
in establishing links between these two models
in order to propose new estimation methods.
We also consider the case where the observations were recorded with covariate information. In this case, the
extreme-value index and the

Level sets estimation is a
recurrent problem in statistics which is linked to outlier
detection. In biology, one is interested in estimating reference
curves, that is to say curves which bound

Our work on high dimensional data requires that we face the curse of dimensionality phenomenon. Indeed, the modelling of high dimensional data requires complex models and thus the estimation of a large number of parameters compared to the sample size. In this framework, dimension reduction methods aim at replacing the original variables by a small number of linear combinations with as small a loss of information as possible. Principal Component Analysis (PCA) is the most widely used method to reduce dimension in data. However, standard linear PCA can be quite inefficient on image data, where even simple image distortions can lead to highly non-linear data. Two directions are investigated. First, non-linear PCAs can be proposed, leading to semi-parametric dimension reduction methods. Another field of investigation is to take into account the application goal in the dimension reduction step. One of our approaches is therefore to develop new Gaussian models of high dimensional data for parametric inference. Such models can then be used in a mixture or Markov framework for classification purposes. Another approach consists in combining dimension reduction, regularization techniques, and regression techniques to improve the Sliced Inverse Regression method.

As regards applications, several areas of image analysis can be covered using the tools developed in the team. More specifically, in collaboration with the Perception team, we address various issues in computer vision involving Bayesian modelling and probabilistic clustering techniques. Other applications in medical imaging are natural. We work more specifically on MRI data, in collaboration with the Grenoble Institute of Neuroscience (GIN) and the NeuroSpin center of CEA Saclay. We also consider other statistical 2D fields coming from other domains such as remote sensing, in collaboration with Laboratoire de Planétologie de Grenoble. We worked on hyperspectral images. In the context of the "pôle de compétitivité" project I-VP, we worked on images of PC boards.

A second domain of applications concerns biology and medicine. We consider the use of missing data models in epidemiology. We also investigated statistical tools for the analysis of bacterial genomes beyond gene detection. Applications in neurosciences are also considered. Finally, in the context of the ANR VMC project Medup, we studied the uncertainties in forecasting and climate projection for Mediterranean high-impact weather events.

Creation of the Pixyl startup (http://

Xerox Foundation University Affairs Committee (UAC) collaborative grant. F. Forbes was co-laureate (with R. Horaud) of this grant (90 k$) for a three year project (2014-2017) on Advanced and Scalable Graph Signal Processing Techniques. It was awarded in collaboration with Arijit Biswas and Anirban Mondal, research scientists at Xerox Research Center India (XRCI), Bangalore.

Mixtures of Multiple Scaled Student T distributions

Keywords: Health - Statistics - Brain MRI - Medical imaging - Robust clustering

Functional Description

The package implements mixtures of so-called multiple scaled Student distributions, which are generalisations of the multivariate Student t distribution allowing different tails in each dimension. Typical applications include robust clustering to analyse data with possible outliers. In this context, the model and package have been used on large data sets of brain MRI to segment and identify brain tumors.

Participants: Alexis Arnaud, Florence Forbes and Darren Wraith

Contact: Florence Forbes

Keywords: Health - Neuroimaging - Cancer - Brain MRI - Medical imaging

Functional Description

The Locus software was extended to address the delineation of lesions in pathological brains. Its extension, the P-LOCUS software, analyses a 3D MR brain scan in a few minutes and performs fully automatic brain lesion delineation using a combined dataset of various 3D MRI sequences.

Participants: Senan Doyle, Florence Forbes, Michel Dojat and Pascal Rubini

Partner: INSERM

Contact: Florence Forbes

URL: http://

Keywords: fMRI - Statistical analysis - Neurosciences - MRI - Brain - Health - Medical imaging

Functional Description

As part of fMRI data analysis, PyHRF provides a set of tools for addressing the two main issues involved in intra-subject fMRI data analysis: (i) the localization of cerebral regions that elicit evoked activity and (ii) the estimation of the activation dynamics, also referred to as the recovery of the Hemodynamic Response Function (HRF). To tackle these two problems, PyHRF implements the Joint Detection-Estimation framework (JDE), which recovers parcel-level HRFs and embeds an adaptive spatio-temporal regularization scheme of activation maps.

Participants: Thomas Vincent, Solveig Badillo, Lotfi Chaari, Christine Bakhous, Florence Forbes, Philippe Ciuciu, Laurent Risser, Thomas Perret and Aina Frau Pascual

Partners: CEA - NeuroSpin

Contact: Florence Forbes

URL: http://

**Joint work with:** C. Bouveyron (Univ. Paris 5), M. Fauvel (ENSAT Toulouse)
and J. Chanussot (Gipsa-lab and Grenoble-INP)

In the PhD work of Charles Bouveyron (co-advised by Cordelia Schmid from the Inria LEAR team), we proposed new Gaussian models of high dimensional data for classification purposes. We assume that the data live in several groups located in subspaces of lower dimensions. Two different strategies arise:

the introduction in the model of a dimension reduction constraint for each group

the use of parsimonious models obtained by requiring different groups to share the same values of some parameters

This modelling yields a supervised classification method called High Dimensional Discriminant Analysis (HDDA). Some versions of this method have been tested on the supervised classification of objects in images. This approach has been adapted to the unsupervised classification framework, and the related method is named High Dimensional Data Clustering (HDDC). Our recent work consists in adding a kernel to the previous methods to deal with nonlinear data classification and heterogeneous data. We first investigate the use of kernels derived from similarity measures on binary data. The targeted application is the analysis of verbal autopsy data (PhD thesis of N. Sylla). Indeed, health monitoring and evaluation make more and more use of data on causes of death from verbal autopsies in countries which do not keep civil status records or whose records are incomplete. The verbal autopsy method makes it possible to identify the probable cause of death, and it has become the main source of information on causes of death in these populations. Second, the kernel classification method is applied to three real hyperspectral data sets and compared with three other classifiers. The proposed models show good results in terms of classification accuracy and processing time.

**Joint work with:** L. Gardes (Univ. Strasbourg), G. Mazo
(Univ. Catholique de Louvain), J. Elmethni (Univ. Paris 5) and S. Louhichi (Univ. Grenoble 1)

The goal of the PhD theses of Alexandre Lekina and Jonathan El Methni was to contribute to
the development of theoretical and algorithmic models to tackle
conditional extreme value analysis, *i.e.* the situation where
some covariate information

Conditional extremes are studied in climatology, where one is interested in how climate change over the years might affect extreme temperatures or rainfalls. In this case, the covariate is univariate (time). Bivariate examples include the study of extreme rainfalls as a function of the geographical location. The application part of the study is joint work with the LTHE (Laboratoire d'étude des Transferts en Hydrologie et Environnement) located in Grenoble and the “département Génie urbain” of “Université Paris-Est Marne-la-Vallée”.

**Joint work with:** A. Daouia (Univ. Toulouse), E. Deme (Univ. Gaston-Berger, Sénégal), A. Guillou
(Univ. Strasbourg) and G. Stupfler (Univ. Aix-Marseille).

One of the most popular risk measures is the Value-at-Risk (VaR) introduced in the 1990s.
In statistical terms,
the VaR at level *i.e.* when

**Joint work with:** F. Durante (Univ. Bozen-Bolzano, Italy), L. Gardes (Univ. Strasbourg) and G. Mazo (Univ.
Catholique de Louvain, Belgium).

Copulas are a useful tool to model multivariate distributions.

However, while there exist various families of bivariate copulas, much less has been done when the dimension is higher. To this end, an interesting class of copulas based on products of transformed copulas has been proposed in the literature. The use of this class for practical high dimensional problems remains challenging. Constraints on the parameters and the product form make inference, and in particular the likelihood computation, difficult. We proposed a new class of high dimensional copulas based on a product of transformed bivariate copulas. No constraints on the parameters restrict the applicability of the proposed class, which is well suited for applications in high dimension. Furthermore, the analytic forms of the copulas within this class make it possible to associate a natural graphical structure, which helps to visualize the dependencies and to compute the likelihood efficiently even in high dimension. The extremal properties of the copulas are also derived and an R package has been developed.

As an alternative, we also proposed a new class of copulas constructed by introducing a latent factor. Conditional independence with respect to this factor and the use of a nonparametric class of bivariate copulas lead to interesting properties such as explicitness, flexibility and parsimony. In particular, various tail behaviours are exhibited, making it possible to model various extreme situations. A pairwise moment-based inference procedure has also been proposed and the asymptotic normality of the corresponding estimator has been established.

In collaboration with L. Gardes, we investigate the estimation of the tail copula, which is widely used to describe the amount of extremal dependence of a multivariate distribution. In some situations such as risk management, the dependence structure can be linked with some covariate. The tail copula thus depends on this covariate and is referred to as the conditional tail copula. The aim of our work is to propose a nonparametric estimator of the conditional tail copula and to establish its asymptotic normality.

**Joint work with:** G. Stupfler (Univ. Aix-Marseille)

The boundary bounding the set of points is viewed as the largest level set of the distribution of the points. This is then an extreme quantile curve estimation problem. We proposed estimators based on projection as well as on kernel regression methods applied to the set of extreme values, for particular sets of points. We also investigate the asymptotic properties of existing estimators when used in extreme situations. For instance, we have established in collaboration with G. Stupfler that the so-called geometric quantiles have very counter-intuitive properties in such situations, and thus should not be used to detect outliers.

**Joint work with:** J. Chanussot (Gipsa-lab and Grenoble-INP).

Visible and near infrared imaging spectroscopy is one of the key techniques
to detect, map and characterize mineral and volatile (e.g. water ice)
species existing at the surface of planets. Indeed, the chemical composition,
granularity, texture, physical state, etc. of the materials
determine the existence and morphology of the absorption bands.
The resulting spectra therefore contain very useful information.
Current imaging spectrometers provide data organized as three
dimensional hyperspectral images: two spatial dimensions and one
spectral dimension. Our goal is to estimate the functional
relationship

In his PhD thesis work, Alessandro Chiancone studies the extension of the SIR method to different sub-populations. The idea is to assume that the dimension reduction subspace may not be the same for different clusters of the data.

Sliced Inverse Regression (SIR) has been extensively used to reduce the dimension of the predictor space before performing regression. Recently it has been shown that this technique is, not surprisingly, sensitive to noise. Different approaches have been proposed to robustify SIR. In this work, we start by considering an inverse problem proposed by R.D. Cook and show that the framework can be extended to take into account non-Gaussian noise. Generalized Student distributions are considered and all parameters are estimated via an EM algorithm. The algorithm is outlined and tested by comparing its results with those of different approaches on simulated data. Results on a real dataset show the interest of this technique in the presence of outliers.
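
For readers unfamiliar with the method, the following minimal sketch shows standard (non-robust) SIR for a single-index model. It is restricted to two predictors so that the linear algebra stays explicit, and it does not implement the Student-based robust extension described above. The function name `sir_direction_2d`, the cubic link function and the simulated data are assumptions made for the example.

```python
import math
import random


def sir_direction_2d(xs, ys, n_slices=10, n_iter=100):
    """Leading SIR direction for two predictors (standard, non-robust SIR)."""
    n = len(xs)
    mean = [sum(x[j] for x in xs) / n for j in range(2)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / n
            for b in range(2)] for a in range(2)]
    # Slice the sample according to the sorted response values.
    order = sorted(range(n), key=lambda i: ys[i])
    between = [[0.0, 0.0], [0.0, 0.0]]
    for h in range(n_slices):
        idx = order[h * n // n_slices:(h + 1) * n // n_slices]
        if not idx:
            continue
        # Centred mean of the predictors within the slice.
        diff = [sum(xs[i][j] for i in idx) / len(idx) - mean[j] for j in range(2)]
        for a in range(2):
            for b in range(2):
                between[a][b] += len(idx) / n * diff[a] * diff[b]
    # Explicit inverse of the 2x2 covariance matrix.
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    target = [[sum(inv[a][k] * between[k][b] for k in range(2)) for b in range(2)]
              for a in range(2)]
    # Power iteration for the leading eigenvector of cov^{-1} * between.
    v = [1.0, 1.0]
    for _ in range(n_iter):
        w = [target[a][0] * v[0] + target[a][1] * v[1] for a in range(2)]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v


# Simulated single-index data (assumption): y = (beta . x)^3 + small Gaussian noise.
rng = random.Random(2)
beta = [1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0)]
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
ys = [(beta[0] * x[0] + beta[1] * x[1]) ** 3 + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
direction = sir_direction_2d(xs, ys)
print(direction)
```

The estimated direction is defined up to sign and scale, so it is best compared with the true direction through the absolute value of their cosine.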

**Joint work with:**
Pierre Fernique (Inria) and Yann Guédon
(CIRAD), Inria Virtual Plants.

In the context of plant growth modelling, methods to identify subtrees of a tree or forest with similar attributes have been developed. They rely either on hidden Markov modelling or multiple change-point approaches. The latter are well-developed in the context of sequence analysis, but their extensions to tree-structured data are not straightforward. Their advantage over hidden Markov models is to relax the strong constraints regarding dependencies induced by parametric distributions and local parent-children dependencies. Heuristic approaches for change-point detection in trees were proposed and applied to the analysis of patchiness patterns (consisting of canopies made of clumps of either vegetative or flowering botanical units) in mango trees.

This research theme is supported by a LabEx PERSYVAL-Lab project-team grant.

**Joint work with:**
Marianne Clausel (LJK)
Anne Guérin-Dugué (GIPSA-lab)
and Benoit Lemaire (Laboratoire de Psychologie et Neurocognition)

In recent years, GIPSA-lab has developed computational models of information search in web-like materials,
using data from both eye-tracking and electroencephalograms (EEGs). These data were obtained from experiments
in which subjects had to carry out a kind of press review. In such tasks, the reading process and decision making
are closely related. Statistical analysis of such data aims at deciphering underlying dependency structures
in these processes. Hidden Markov models (HMMs) have been used on eye movement series to infer phases
in the reading process that can be interpreted as steps in the cognitive processes leading to decision.
In HMMs, each phase is associated with a state of the Markov chain. The states are observed indirectly
through eye-movements. Our approach was inspired by Simola *et al.* (2008),
but we used hidden semi-Markov models for better characterization of phase length distributions.
The estimated HMM highlighted contrasting reading strategies (i.e., state transitions), with both
individual and document-related variability.
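
The way hidden states are "observed indirectly" can be illustrated with the forward recursion, which computes the likelihood of an observation sequence by summing over all hidden state paths. The sketch below uses a plain discrete HMM, not the hidden semi-Markov models used in this work. The two states, the two observation symbols and all probabilities are hypothetical values chosen for the example.

```python
from itertools import product


def forward_likelihood(obs, init, trans, emit):
    """Likelihood of an observation sequence under a discrete HMM (forward recursion)."""
    n_states = len(init)
    # alpha[s] = probability of the observations so far, ending in hidden state s.
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
                 for s in range(n_states)]
    return sum(alpha)


# Hypothetical model: two hidden phases, two observable symbols
# (e.g. 0 = short eye movement, 1 = long eye movement).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]

likelihood = forward_likelihood([0, 0, 1], init, trans, emit)
# Sanity check: the likelihoods of all sequences of length 3 sum to one.
total = sum(forward_likelihood(list(seq), init, trans, emit)
            for seq in product([0, 1], repeat=3))
print(likelihood, total)
```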

However, the characteristics of eye movements within each phase tended to be poorly discriminated. As a result, high uncertainty in the phase changes arose, and it could be difficult to relate phases to known patterns in EEGs.

This is why, as part of Brice Olivier's PhD thesis, we are developing integrated models coupling EEG and eye movements within one single HMM for better identification of the phases. Here, the coupling should incorporate some delay between the transitions in both (EEG and eye-movement) chains, since EEG patterns associated with cognitive processes occur with some delay relative to eye-movement phases. Moreover, EEGs and scanpaths were recorded with different time resolutions, so that some resampling scheme must be added to the model in order to synchronize both processes.

**Joint work with:**
Christophe Godin (Inria, Virtual Plants)
and Romain Azais (Inria BIGS)

In previous work, a method to compress tree structures and to quantify their degree of self-nestedness was developed. This method is based on the detection of isomorphic subtrees in a given tree and on the construction of a DAG (Directed Acyclic Graph), equivalent to the original tree, where a given subtree class is represented only once (compression is based on the suppression of structural redundancies in the original tree). In the lossless compressed graph, every node representing a particular subtree in the original tree has exactly the same height as its corresponding node in the original tree. A lossy version of the algorithm consists in coding the nearest self-nested tree embedded in the initial tree. Indeed, finding the nearest self-nested tree of a structure without further assumptions is conjectured to be an NP-complete or NP-hard problem. We improved this lossy compression method by computing a self-nested reduction of a tree that better approximates the initial tree. The algorithm has polynomial time complexity for trees with bounded outdegree. This approximation relies on an indel edit distance that allows (recursive) insertion and deletion of leaf vertices only. In a conference paper accepted at DCC'2016, we showed on a simulated dataset that the error rate of this lossy compression method is always better than the loss based on the nearest embedded self-nested tree, while the compression rates are equivalent. This procedure is also a keystone in our new topological clustering algorithm for trees. In addition, we obtained new theoretical results on the combinatorics of self-nested structures. An article is currently being written.
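
The lossless step described above, in which each class of isomorphic subtrees is represented only once, can be sketched as follows for unordered trees. This covers only the DAG construction, not the self-nested approximation or the edit distance. The representation of trees as nested tuples and the function names `compress_tree` and `count_nodes` are assumptions made for the example.

```python
def compress_tree(tree):
    """Compress an unordered tree into a DAG with one node per isomorphism class.

    A tree is a tuple of child trees and a leaf is the empty tuple.
    Returns (root_class, dag), where dag maps each class identifier to the
    sorted tuple of the class identifiers of its children.
    """
    class_of = {}  # canonical signature -> class identifier
    dag = {}

    def visit(node):
        # Two unordered subtrees are isomorphic exactly when the multisets of
        # their children's classes coincide, hence the sorted signature.
        signature = tuple(sorted(visit(child) for child in node))
        if signature not in class_of:
            class_of[signature] = len(class_of)
            dag[class_of[signature]] = signature
        return class_of[signature]

    return visit(tree), dag


def count_nodes(tree):
    """Number of vertices of a tree given as nested tuples."""
    return 1 + sum(count_nodes(child) for child in tree)


# A perfect binary tree of height 3 is highly redundant.
leaf = ()
h1 = (leaf, leaf)
h2 = (h1, h1)
h3 = (h2, h2)
root, dag = compress_tree(h3)
print(count_nodes(h3), len(dag))  # 15 vertices in the tree, 4 nodes in the DAG
```

In this example the DAG is a chain with one node per height, which is the signature of a self-nested tree.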

**Joint work with:** Philippe Ciuciu from Team Parietal and
Neurospin, CEA Saclay.

Functional Arterial Spin Labeling (fASL) MRI can provide a quantitative measurement of changes of cerebral blood flow induced by stimulation or task performance. fASL data is commonly analysed using a general linear model (GLM) with regressors based on the canonical hemodynamic response function. In this work, we consider instead a joint detection-estimation (JDE) framework which has the advantage of allowing the extraction of both task-related perfusion and hemodynamic responses not restricted to canonical shapes. Previous JDE attempts for ASL have been based on computationally intensive sampling (MCMC) methods. Our contribution is to provide a comparison with an alternative variational expectation-maximization (VEM) algorithm on synthetic and real data. Other investigations were related to the use of appropriate physiological information and priors.

**Joint work with:** Jan Warnking from Grenoble Institute of Neuroscience.

Physiological and biophysical models have been proposed to link neuronal activity to the Blood Oxygen Level-Dependent (BOLD) signal in functional MRI (fMRI). Those models rely on a set of parameter values that cannot always be extracted from the literature. In some applications, interesting insight into brain physiology or physiopathology can be gained from an estimation of the model parameters from measured BOLD signals. This estimation is challenging because there are more than 10 potentially interesting parameters, which are involved in nonlinear equations and whose interactions may result in identifiability issues. However, the availability of statistical prior knowledge about these parameters can greatly simplify the estimation task. In this work we focus on the extended Balloon model and propose the estimation of 15 parameters using two stochastic approaches: an Evolutionary Computation global search method called Differential Evolution (DE) and a Markov Chain Monte Carlo version of DE. To combine the ability to escape local optima with the incorporation of prior knowledge, we derive the target function from Bayesian modeling. The general behavior of these algorithms is analyzed and compared with the *de facto* standard Expectation Maximization Gauss-Newton (EM/GN) approach, providing very promising results on challenging real and synthetic fMRI data sets involving rats with epileptic activity. These stochastic optimizers performed better than EM/GN in terms of distance to the ground truth in 4 out of 6 synthetic data sets and gave a better signal fit in 12 out of 12 real data sets. Non-parametric statistical tests showed statistically significant differences between the real data results obtained by DE and EM/GN. Finally, the estimates obtained from DE for these parameters seem both more realistic and more stable, or at least as stable, across sessions as the estimates from EM/GN. This work is due to appear.
A preliminary version has also been accepted at the MICCAI 2015 conference.
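
As a reminder of how Differential Evolution explores a parameter space, the sketch below implements the classical DE/rand/1/bin scheme on a toy two-parameter objective. It is neither the Bayesian target function nor the MCMC variant used in this work, and the Balloon model is not involved. The function names, the control parameters (`f=0.8`, `cr=0.9`) and the quadratic objective are assumptions made for the example.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, n_gen=200,
                           f=0.8, cr=0.9, seed=0):
    """Minimise `objective` over a box with the classical DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            # Three distinct individuals, all different from the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # keep the trial inside the box
                else:
                    value = pop[i][d]
                trial.append(value)
            score = objective(trial)
            # Greedy selection: the trial replaces its parent if it is no worse.
            if score <= scores[i]:
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]


def toy_objective(theta):
    """Hypothetical quadratic objective with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


best_theta, best_score = differential_evolution(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_theta, best_score)
```

In a Bayesian setting such as the one described above, the objective would typically be a negative log-posterior, so that prior knowledge enters through the target function rather than through the search mechanism.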

**Joint work with:** Lotfi Chaari, Mohanad Albughdadi, Jean-Yves Tourneret from IRIT-ENSEEIHT in Toulouse and Philippe Ciuciu from Neurospin, CEA Saclay.

fMRI experiments are usually conducted over a population of interest for investigating brain activity across different regions, stimuli and subjects. Multi-subject analysis usually proceeds in two steps: an intra-subject analysis is performed sequentially on each individual and then a group-level analysis is carried out to report significant results at the population level. This work considers an existing Joint Parcellation Detection Estimation (JPDE) model which performs joint hemodynamic parcellation, brain dynamics estimation and evoked activity detection. The hierarchy of the JPDE model is extended for multi-subject analysis in order to perform group-level parcellation. Then, the corresponding underlying dynamics is estimated in each parcel while the detection and estimation steps are iterated over each individual. Validation on synthetic and real fMRI data shows its robustness in inferring group-level parcellation and the corresponding hemodynamic profiles. This work has been accepted at ISBI 2016.

**Joint work with:** Emmanuel Barbier and Benjamin Lemasson from Grenoble Institute of Neuroscience.

Advanced statistical clustering approaches are promising tools to better exploit the wealth of MRI information, especially on large cohorts and multi-center studies. In neuro-oncology, the use of multiparametric MRI may better characterize brain tumor heterogeneity. To fully exploit multiparametric MRI (e.g. for tumor classification), appropriate analysis methods are yet to be developed. Such methods offer improved data quality control by allowing automatic outlier detection, and improved analysis by identifying discriminative tumor signatures with measurable predictive power. In this work, we show on small-animal data that advanced statistical learning approaches can help 1) in organizing existing data by detecting and excluding outliers and 2) in building a dictionary of tumor fingerprints from a clustering analysis of their microvascular features. The work now also includes the integration, in a joint statistical model, of both automatic ROI delineation and clustering for whole brain data analysis. A preliminary version of this work has been accepted at the ISMRM 2015 and SFMRMB 2015 conferences.

**Joint work with:** Michel Dojat from Grenoble Institute of Neuroscience and Senan Doyle from Pixyl.

The goal of P. Previtero's internship was to help with a number of software engineering tasks and communications actions around the P-Locus software and the Pixyl start-up. The internship resulted in particular in a new web site for Pixyl.

F. Forbes is the principal investigator for MISTIS of the 2 year project WIFUZ on *WIreless multi sensors FUSion*. The project is supported by DGA and led by the ACOEM company
http://

F. Forbes and S. Girard are the advisors of a starting CIFRE PhD (T. Rahier) with Schneider Electric. The other advisor is S. Marié from Schneider Electric. The goal is to develop specific data mining techniques able to merge and take advantage of both structured and unstructured (meta)data collected by a wide variety of Schneider Electric sensors, in order to improve the quality of the insights that can be produced. The total financial support for MISTIS will be 165 keuros.

S. Girard is the advisor of a starting PhD (C. Albert) with EDF. The goal is to investigate sensitivity analysis and extrapolation limits in extreme value theory, with application to river flow analysis.

**UAC XEROX INDIA (2014-2017).** F. Forbes is co-principal investigator with R. Horaud (PERCEPTION) of a Xerox Foundation University Affairs Committee (UAC) collaborative grant *Advanced and Scalable Graph Signal Processing Techniques*, in collaboration with Arijit Biswas and Anirban Mondal, research scientists at Xerox Research Center India (XRCI) Bangalore. This collaboration is an opportunity to launch a joint research program with a Xerox Indian team. We plan to investigate robust mixture models and techniques to deal with graphical data.
Xerox Foundation funding: 80 keuros.

**PERSYVACT projects.**

mistis is involved in the 3-year project-team Oculo Nimbus, funded (250 keuros for the whole project) by the PERSYVAL labex (https://

mistis is also involved in Persyvact2, a recently granted action (2015-2018) supported by the Persyval Labex for 3.5 years. This project is a follow-up of the Persyvact Exploratory labex project. Persyvact2 consists of about 20 researchers from different laboratories (GIPSA-lab, LJK and TIMC-IMAG) and different fields related to data science (statistics, machine learning, image and signal processing). Our contribution and involvement will lie essentially in a Graph signal processing work package with application in neuroscience, for which we are planning to hire a PhD student with S. Achard (GIPSA-Lab). Persyvact2 also intends to organize scientific events and an international workshop during its lifetime. Persyvact2 will contribute, with other teams of Persyval, to enhancing the international visibility of data science in Grenoble. The financial support for the consortium is 250 keuros.

**Grenoble Pole Cognition (2013-15).** We received in 2015 2.5 keuros from the Grenoble Pole Cognition, http://

mistis participates in the weekly statistical seminar of Grenoble. Jean-Baptiste Durand is in charge of the organization and several lecturers have been invited in this context.

**Defi Imag'IN MultiPlanNet (2015-2016).** This is a 2-year project to build a network for the analysis and fusion of multimodal data from planetology. There are 8 partners: IRCCYN Nantes, GIPSA-lab Grenoble, IPAG Grenoble, CEA Saclay,
UPS Toulouse, LGL Lyon1, GEOPS University Orsay and Inria Mistis. F. Forbes is in charge of one work package entitled *Massive inversion of multimodal data*. Our contribution will be based on our previous work in the VAHINE project on hyperspectral images and on recent developments on inverse regression methods made in the HUMAVIPS project. The CNRS support for the network is 20 keuros.

**Apprentissage, opTimisation à Large-échelle et cAlcul diStribué (ATLAS).** Mistis is participating in this action supported by the GDR in 2016 (3 keuros).

**MSTGA and AIGM INRA (French National Institute for Agricultural Research) networks:** F. Forbes has been a member of the INRA network AIGM (formerly MSTGA) since 2006, http://

**European H2020 RESSTORE (2015-2018).** F. Forbes is involved in this multi-center Stroke European H2020 project including 20 partners.
F. Forbes will contribute through the Pixyl startup which will receive 70 keuros as a subcontractor.
RESSTORE stands for REgenerative Stem cell therapy for STroke in Europe. It is part of the Clinical research on regenerative medicine program.
It will involve a phase 2 trial with 300 patients imaged at 4 time points over a 3 year timeframe. Pixyl will provide automatic stroke lesion segmentations.

**LIRIMA**

Associate Team involved in the International Lab:

Title: Statistical Inference for the Management of Extreme Risks and Global Epidemiology

International Partner (Institution - Laboratory - Researcher):

UGB (Senegal) - LERSTAD - Abdou Kâ Diongue

Start year: 2015

See also: http://

The objective of the associate team is to federate some researchers from LERSTAD (Laboratoire d’Etudes et de Recherches en Statistiques et Développement, Université Gaston Berger) and Mistis (Inria Grenoble Rhône-Alpes). The associate team will consolidate the existing collaborations between these two laboratories. Since 2010, the collaborations have been achieved through the co-advising of two PhD theses. They have led to three publications in international journals. The associate team will also involve statisticians from EQUIPPE laboratory (Economie QUantitative Intégration Politiques Publiques Econométrie, Université de Lille) and associated members of Modal (Inria Lille Nord-Europe) as well as an epidemiologist from IRD (Institut de Recherche pour le Développement) at Dakar. We aim at developing two research themes: 1) Spatial extremes with application to management of extreme risks and 2) Classification with application to global epidemiology.

The context of our research is also the collaboration
between mistis and a number of international partners such
as the Statistics Department of University of Washington in
Seattle, the Russian Academy of Science in Moscow, and more recent partners like IDIAP involved in the past HUMAVIPS project, Université Gaston Berger in Senegal and Universities of Melbourne and Brisbane in Australia.
We also work at turning other current European contacts, *e.g.* at EPFL (A. Roche at University Hospital Lausanne and Siemens Healthcare), into more formal partnerships.

The main international collaborations that we are currently trying to develop are with:

Fabrizio Durante, Free University of Bozen-Bolzano, Italy.

K. Qin and D. Wraith from RMIT in Melbourne, Australia and Queensland University of Technology in Brisbane, Australia.

E. Deme and S. Sylla from Gaston Berger university and IRD in Senegal.

Alexandre Nazin and Russian Academy of Science in Moscow, Russia.

Alexis Roche and University Hospital Lausanne/Siemens Healthcare, Advanced Clinical Imaging Technology group, Lausanne, Switzerland.

Seydou Nourou Sylla (Université Gaston Berger, Sénégal) has been hosted by the mistis team for four months.

El Hadji Deme has been hosted by the mistis team for 3 weeks.

Abdelhakim Necir (University Biskra, Algeria) has been hosted for 2 weeks.

Sebastian Torres Leiva (Master, from Feb 2015 until June 2015)

Subject: Extreme value modelling of some glacial processes in Chilean Andes.

Institution: UTFSM - Universidad Tecnica Federico Santa Maria, Valparaiso, Chile

**IEEE DSAA 2015, IEEE international conference on Data Science and Advanced Analytics**, http://

Stéphane Girard was co-chair of the Astrostatistics summer school dedicated to “Classification & Clustering” held in Les Houches,
http://

**2nd conference of the SFRMBM society (Société Française de
Résonance Magnétique en Biologie et Médecine)**, http://

Stéphane Girard was a member of the organizing committees of
“4èmes rencontres R”
http://

**INRA AIGM network day** in Grenoble, June 30, 2015. F. Forbes was program co-chair with N. Peyrard. 20 participants. Approximate funding: 1 keuro. AIGM events web site:
https://

Stéphane Girard was a member of the program committee of the "Mathematical Finance and Actuarial Sciences" conference organized by the AIMS (African Institute for Mathematical Sciences), Mbour, Sénégal.

In 2015, F. Forbes was a reviewer for NIPS 2015 and GRETSI 2015.

Stéphane Girard has been Associate Editor of the *Statistics and Computing* journal since 2012.
He has also been a member of the Advisory Board of the *Dependence Modelling* journal since December 2014.

F. Forbes has been Associate Editor of the journal Frontiers in ICT: Computer Image Analysis since its creation in Sept. 2014. Computer Image Analysis is a new specialty section in the community-run open-access journal Frontiers in ICT. This section is led by Specialty Chief Editors Drs Christian Barillot and Patrick Bouthemy.

In 2015, S. Girard was a reviewer for *Scandinavian Journal of Statistics*, *Extremes* and *Journal of Statistical Software*.

In 2015, F. Forbes was a reviewer for the journals *Statistics and Computing*, *Computational Statistics and Data Analysis*, *IEEE Transactions on Signal Processing* and *IEEE Transactions on Image Processing*.

Stéphane Girard was invited to give a talk at the SMAI Conference and at the Extreme Value Analysis conference.

F. Forbes was invited to give a talk at the Working Group on Model-Based Clustering Summer Session, in Seattle, USA, July 19-25, 2015, http://

F. Forbes was invited to give a talk at the CHUV/SIEMENS workshop on Quantitative magnetic resonance imaging for neuroradiology, in Lausanne, Switzerland, June 29, 2015. Title: Automatic brain lesion segmentation: methodological challenges. 20 participants.

F. Forbes and S. Girard gave a tutorial at the Astrostatistics School in Les Houches in Oct. 2015, http://

Stéphane Girard is at the head of the associate team (*Statistical Inference for the Management of Extreme
Risks and Global Epidemiology*) created in 2015 between Mistis
and LERSTAD (Université Gaston Berger, Saint-Louis, Sénégal). The team is part of the LIRIMA
(Laboratoire International de Recherche en Informatique et Mathématiques Appliquées),
http://

Stéphane Girard was in charge of evaluating PEPS research projects (projets exploratoires premier soutien) for the CNRS and MITACS projects from Québec, Canada.

F. Forbes is a member of the ERCIM working group on Mixture models.

Stéphane Girard has been head of the Probability and Statistics department of the LJK (Laboratoire Jean Kuntzmann) since September 2012.

Grenoble Pole Cognition. F. Forbes is representing Inria and LJK in the pole.

PRIMES Labex, Lyon. F. Forbes is a member of the strategic committee. F. Forbes is representing Inria.

F. Forbes was elected in 2010 and has since been a member of the bureau of
the “Statistics and Images” group in
the Société Française de Statistique (SFdS), http://

Licence: Alexis Arnaud, *Probability and statistics*, 56 ETD, L2 level, IUT2 Grenoble, Université Pierre Mendès France.

Master: Stéphane Girard, *Statistique Inférentielle Avancée*, 45 ETD, M1 level, Ensimag, Grenoble INP, France.

Master: Jean-Baptiste Durand, *Statistics and probability*, 192 ETD, M1 and M2 levels, Ensimag Grenoble INP, France.

J.-B. Durand is a faculty member at Ensimag, Grenoble INP.

PhD in progress: Aina Frau-Pascual, “*Statistical Models for the coupling of ASL and BOLD Magnetic Resonance modalities to study brain function and disease*”, October 2013, Florence Forbes and Philippe Ciuciu.

PhD in progress: Alexis Arnaud, “*Multiparametric MRI statistical analysis for the identification and follow-up of brain tumors*”, October 2014, Florence Forbes and Emmanuel Barbier.

PhD in progress: Pierre-Antoine Rodesch, “*Spectral tomography and tomographic reconstruction algorithms*”, October 2015, Florence Forbes.

PhD in progress: Thibaud Rahier, “*Data-mining pour la fusion de données structurées et
non-structurées*”, November 2015, Florence Forbes and Stéphane Girard.

PhD in progress: Clément Albert, “*Limites de crédibilité d'extrapolation des lois de
valeurs extrêmes*”, October 2015, Stéphane Girard.

PhD in progress: Maïlys Lopes, “*Télédétection en écologie du paysage : statistiques en grande
dimension pour la multirésolution spatiale et la haute résolution temporelle*”, November 2014, Stéphane Girard
and Mathieu Fauvel (INRA Toulouse).

PhD in progress: Alessandro Chiancone, “*Sequential dimension reduction*”, November 2013, Stéphane Girard
and Jocelyn Chanussot (Grenoble INP).

PhD in progress: Seydou Nourou Sylla, “*Modélisation statistique pour l'analyse des causes de décès
décrites par autopsie verbale en milieu rural africain : cas du Sénégal*”, October 2012, Stéphane
Girard and Abdou Diongue (Université Gaston Berger, Sénégal).

PhD in progress: Brice Olivier, “*Joint analysis of eye-movements and EEGs using coupled hidden Markov and topic models*”, October 2015, Jean-Baptiste Durand, Marianne Clausel and Anne Guérin-Dugué (Université Grenoble Alpes).

F. Forbes was a reviewer for four PhD theses in 2015:

- Theodosios Gkamas, University of Strasbourg, Sept. 29, 2015.

- Brice Ozenne, University of Lyon 1, Sept. 2015.

- Julien Stoehr, University Montpellier 2, Oct. 2015.

- Hajer Braham, Telecom ParisTech and Orange, December 2015.

F. Forbes was also president of the PhD committee of Haithem Boussaid, "Efficient Inference and learning in Graphical models for multi-organ shape segmentation", Ecole Centrale Paris, January 8, 2015.

Stéphane Girard was a reviewer of two PhD theses:

- “*Modélisation de la dépendance et estimation du risque agrégé*”, by Andrés Cuberos, Univ. Claude
Bernard - Lyon and

- “*Analyse de données de cytométrie de flux pour un grand nombre d'échantillons*” by Xiaoyi Chen, Univ.
Cergy-Pontoise.

S. Girard was also a member of the PhD committee of Jonathan Jalbert
“*Développement d'un modèle statistique non stationnaire et régional pour les précipitations
extrêmes simulées par un modèle numérique de climat*”, Univ. Laval (Québec, Canada).

F. Forbes was also a member of the HDR committee of Estelle Kuhn, University Paris Orsay, November 2015.

F. Forbes is a member of the Committee for technological project and engineer candidate selection at Inria Grenoble Rhône-Alpes ("Commission du développement technologique").

F. Forbes was a member of the committee for Inria research scientist candidate (CR) selection at Inria Rennes in 2015.

F. Forbes was a member of the committee for attributing the bi-annual Jean Kuntzmann award.

Stéphane Girard has been a member of the "Comité des Emplois Scientifiques" and “Comité de Centre” at Inria Grenoble Rhône-Alpes since 2015.

Since 2015, Stéphane Girard has been a member of the INRA committee (CSS MBIA) in charge of evaluating INRA researchers once a year in the MBIA department of INRA.

F. Forbes gave several large audience presentations related to the Pixyl startup and more recently interviews for journals such as Presence and INSERM Science et Santé Magazines.