Team classic


Section: New Results

Sparse regression estimation

Participants : Gérard Biau, Olivier Catoni, Sébastien Gerchinovitz, Vincent Rivoirard, Gilles Stoltz.

The paper [24] by Sébastien Gerchinovitz imports the notion of sparse oracle inequalities into the theory of individual sequences. It builds on a forecaster, SEW, that Dalalyan and Tsybakov introduced in a series of articles (the first presented at COLT'07) and studied in a stochastic (i.i.d.) setting. That forecaster relies on tuning parameters, and the question of their adaptive calibration, in particular with respect to the variance, was left open. The individual-sequence bounds proved for its extension naturally imply stochastic bounds; and since the individual-sequence version of SEW is perfectly calibrated online, it solves the question left open therein. The mathematical techniques used to prove the extension include a PAC-Bayesian inequality developed by Olivier Catoni and an adaptive exponentially weighted average scheme exhibited by Gilles Stoltz and his co-authors. Sébastien Gerchinovitz also benefited from the background and advice of Vincent Rivoirard concerning the stochastic scenario.
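To fix ideas, the classical exponentially weighted average forecaster is the basic individual-sequence aggregation scheme that underlies this line of work. The sketch below is a minimal illustration of that classical scheme (with a fixed learning rate `eta`), not the SEW forecaster of [24] itself; indeed, the whole point of the paper is the online calibration of such tuning parameters.

```python
import numpy as np

def exponentially_weighted_average(expert_preds, outcomes, eta=1.0):
    """Classical exponentially weighted average forecaster under square loss.

    expert_preds: array of shape (T, N) -- predictions of N experts
    outcomes:     array of shape (T,)   -- the individual sequence observed
    eta:          learning rate, assumed fixed here (adaptive calibration
                  is precisely what the paper addresses)
    """
    T, N = expert_preds.shape
    log_w = np.zeros(N)             # log-weights, initially uniform
    forecasts = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())   # normalize in log-space for stability
        w /= w.sum()
        forecasts[t] = w @ expert_preds[t]            # convex aggregation
        losses = (expert_preds[t] - outcomes[t]) ** 2  # square loss of each expert
        log_w -= eta * losses                          # exponential weight update
    return forecasts
```

No stochastic assumption is made on `outcomes`: the guarantee of such forecasters (a regret bound against the best expert) holds for every individual sequence.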

Another line of research in this context was pursued by Gérard Biau in [20], [13] and concerns random forests. These form a scheme, proposed by Leo Breiman, for building an ensemble predictor from a set of decision trees grown in randomly selected subspaces of the data. Despite growing interest and practical use, the statistical properties of random forests had been little explored, and little is known about the mathematical forces driving the algorithm. In this respect, Gérard Biau shows in particular that a variant (proposed by Breiman and his co-authors) of the base procedure of random forests is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on the number of noise variables present.
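The "random subspace" idea can be sketched as follows: each tree is grown with purely random splits restricted to a small random subset of the features, and the forest averages the trees' predictions. This is a toy illustration of the general mechanism, not the exact variant analyzed in [20], [13]; all names and parameters below are illustrative.

```python
import numpy as np

def fit_random_tree(X, y, features, depth, rng):
    """Grow one tree by recursively splitting a randomly chosen feature
    (among the selected subspace) at a random threshold; leaves predict
    the mean of the responses they contain."""
    if depth == 0 or len(y) <= 1:
        pred = y.mean() if len(y) else 0.0
        return lambda x: pred
    j = rng.choice(features)
    s = rng.uniform(X[:, j].min(), X[:, j].max())
    left = X[:, j] <= s
    f_left = fit_random_tree(X[left], y[left], features, depth - 1, rng)
    f_right = fit_random_tree(X[~left], y[~left], features, depth - 1, rng)
    return lambda x: f_left(x) if x[j] <= s else f_right(x)

def random_forest_predict(X, y, x0, n_trees=50, depth=6, k=2, seed=0):
    """Average n_trees trees, each grown in a random k-dimensional
    subspace of the features, to predict at the point x0."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    preds = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=k, replace=False)  # random subspace
        tree = fit_random_tree(X, y, feats, depth, rng)
        preds.append(tree(x0))
    return float(np.mean(preds))
```

The sparsity-adaptation result cited above concerns such procedures: when only a few features are "strong", the trees that split on them drive the rate of convergence, regardless of how many noise variables are present.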

We also mention a current line of research: spherical deconvolution using Lasso-type methods (where we recall that the Lasso is the “canonical” sparse forecaster in the stochastic setting).
