Team select

Section: New Results

Selection of high dimensional graphical models

Participants : Pascal Massart, Nicolas Verzelen.

The last decade has witnessed the emergence of applied problems characterized by very high-dimensional variables (in marketing databases or gene expression studies, for instance). Graphical models provide concise representations of the association relations between variables. If the graph is known, the parameters of the model are easily estimated. Selecting the most appropriate graph for a given data set, however, is a challenging problem.
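As a small illustration of how a graph encodes associations in the Gaussian case: the edges of a Gaussian graphical model are the nonzero off-diagonal entries of the precision (inverse covariance) matrix. The sketch below uses NumPy on a hypothetical 4-variable chain graph. The matrix `K` and the helper names are assumptions made for illustration and do not come from the report.

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlation matrix computed from a precision matrix."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def edges_from_precision(precision, tol=1e-10):
    """Edges of the graph: pairs (i, j), i < j, with a nonzero precision entry."""
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]

# Hypothetical precision matrix of the chain graph 0 - 1 - 2 - 3.
K = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])

Sigma = np.linalg.inv(K)          # covariance matrix implied by K
edges = edges_from_precision(K)   # graph read off the zero pattern of K
pcor = partial_correlations(K)
```

Here variables 0 and 3 are marginally correlated (`Sigma[0, 3]` is nonzero) although they are not joined by an edge: the graph describes conditional dependence, not marginal dependence.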

Sylvie Huet (INRA), Pascal Massart, Nicolas Verzelen, and Fanny Villers (Université Paris 6) [24] defined a goodness-of-fit test of linear hypotheses for Gaussian regression with Gaussian covariates. From it they derived a test for Gaussian graphical models that applies in a high-dimensional setting. This test is also shown to be minimax against various alternatives. They have carried out numerical experiments with microarray genetic data and have assessed the graph of genetic networks [25].

Graph selection in Gaussian graphical models is closely related to estimation in the linear regression model with Gaussian covariates. In this setting, Nicolas Verzelen [23] has introduced a novel estimation method based on penalization ideas. This procedure is proved to satisfy a non-asymptotic oracle inequality and adaptation properties. Unlike those of other methods such as the lasso, its rates of convergence do not depend on the correlation between the covariates.
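The link between graph selection and regression rests on a standard property of Gaussian vectors: in the regression of a variable X_j on all the others, the coefficient of X_k equals -K_jk / K_jj, where K is the precision matrix, so a zero coefficient corresponds to a missing edge. The following NumPy sketch checks this identity at the population level on a hypothetical chain graph. The matrix `K` and the function name are illustrative assumptions, and this is not the penalized procedure of [23].

```python
import numpy as np

def regression_coefficients(sigma, j):
    """Population coefficients of the regression of X_j on all other variables."""
    others = [k for k in range(sigma.shape[0]) if k != j]
    # Normal equations: Sigma[others, others] @ beta = Sigma[others, j]
    return np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, j])

# Hypothetical precision matrix of the chain graph 0 - 1 - 2 - 3.
K = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(K)

j = 1
others = [k for k in range(K.shape[0]) if k != j]
beta = regression_coefficients(Sigma, j)   # coefficients on variables 0, 2, 3
expected = -K[others, j] / K[j, j]         # same coefficients read off K
```

In this example the coefficient on variable 3 vanishes, which matches the absence of an edge between nodes 1 and 3.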

Verzelen's procedure [23] makes it possible to tackle graph selection. However, its computational cost becomes prohibitive as the size of the graph increases. To overcome this drawback, Christophe Giraud (École Polytechnique), Sylvie Huet (INRA), and Nicolas Verzelen propose a two-stage procedure which first builds a family of candidate graphs from the data and then selects one graph from this family according to a dedicated criterion [65]. This estimation procedure is shown to be consistent in a high-dimensional setting, and its risk is controlled by a non-asymptotic oracle-like inequality. Good behavior in numerical experiments corroborates these theoretical results. The procedure is implemented in the R package GGMselect, available on CRAN.
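The two-stage idea (build a data-driven family of candidate graphs, then pick one with a criterion) can be sketched as follows. This is only a toy illustration under strong assumptions: the candidate family is obtained by ranking empirical partial correlations, which requires more observations than variables, and the selection uses a generic BIC-type score on node-wise regressions. Neither is the family construction or the dedicated criterion of [65], and none of the function names below belong to GGMselect.

```python
import numpy as np

def candidate_graphs(X, max_edges):
    """Stage 1: nested family of graphs, edges ranked by |empirical partial correlation|."""
    p = X.shape[1]
    K_hat = np.linalg.inv(np.cov(X, rowvar=False))  # needs n > p (toy setting)
    d = np.sqrt(np.diag(K_hat))
    pcor = -K_hat / np.outer(d, d)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    pairs.sort(key=lambda e: -abs(pcor[e]))
    return [pairs[:m] for m in range(max_edges + 1)]

def bic_score(X, edges):
    """Generic BIC-type score: sum of node-wise regression fits plus a penalty."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    score = 0.0
    for j in range(p):
        nb = [b if a == j else a for a, b in edges if j in (a, b)]
        resid = Xc[:, j]
        if nb:
            coef, *_ = np.linalg.lstsq(Xc[:, nb], Xc[:, j], rcond=None)
            resid = Xc[:, j] - Xc[:, nb] @ coef
        score += n * np.log(resid @ resid / n) + np.log(n) * len(nb)
    return score

def select_graph(X, max_edges):
    """Stage 2: keep the candidate graph with the smallest score."""
    family = candidate_graphs(X, max_edges)
    return min(family, key=lambda g: bic_score(X, g))

# Simulated data from a hypothetical chain graph 0 - 1 - 2 - 3.
K = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(K), size=2000)
selected = select_graph(X, max_edges=6)
```

With a strong signal and a large sample, as in this simulation, the selected graph should contain the three chain edges. The point of the two-stage design is that the criterion is evaluated only on a small family of candidates rather than on all possible graphs.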

