Team classic


Section: New Results

Supervised statistical inference: regression and classification

Participants: Gérard Biau, Olivier Catoni.

Least squares regression with random design is a central issue in supervised statistical inference. In collaboration with the Willow team, we showed in [18] that, on the one hand, the ordinary least squares estimator achieves an asymptotically rate-optimal behaviour proportional to d/n, where d is the dimension and n the sample size, under very weak assumptions (a second moment for the noise and a fourth moment for the design, with no assumption on the conditioning of the Gram matrix). Moreover, this result extends to ridge regression, the dimension being replaced by a lower effective ridge dimension. However, under such weak hypotheses, this asymptotic regime can be reached arbitrarily slowly. To obtain non-asymptotic bounds, the estimator itself has to be made more robust. This is achieved through a min-max truncation scheme, for which a non-asymptotic convergence rate can be established that depends only on the kurtosis of a few quantities. The min-max scheme is feasible in practice: in our experiments its computational load is about 50 times that of the ordinary least squares estimator. Experiments also show improved performance over ordinary least squares when the noise is heavy-tailed, and identical performance otherwise (in which case the two estimators compute the same solution).
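
As an illustration of the d/n behaviour mentioned above, the following short Python sketch (a toy simulation of ours, not the estimator studied in [18]) fits ordinary least squares on a standard Gaussian random design with heavier-tailed noise and averages the excess risk over repeated draws; the excess risk roughly halves each time n doubles, in agreement with the d/n rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_excess_risk(n, d):
    """One draw of the excess quadratic risk of OLS under a standard
    Gaussian random design (here the excess risk is ||beta_hat - beta||^2)."""
    beta = rng.normal(size=d)                        # arbitrary true parameter
    X = rng.normal(size=(n, d))                      # random design
    y = X @ beta + rng.standard_t(df=5, size=n)      # noise with a finite second moment
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((beta_hat - beta) ** 2))

d = 10
for n in (100, 200, 400, 800):
    risks = [ols_excess_risk(n, d) for _ in range(200)]
    print(f"n={n:4d}   mean excess risk = {np.mean(risks):.4f}   d/n = {d/n:.4f}")
```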

In order to use PAC-Bayes inequalities, it is necessary to consider a perturbation of the parameter, in the form of a posterior distribution. For this reason, the theory yields sharper results [19] for randomized, and somewhat more involved, estimators defined by posterior distributions. With estimators of this kind, it is possible to prove non-asymptotic rate-optimal bounds for general loss functions under even milder dimension and margin assumptions (generalizing the notion of margin introduced by Mammen and Tsybakov).
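
Below is a minimal sketch of a randomized estimator defined by a posterior distribution, in the Gibbs / exponential-weights spirit, on a toy one-dimensional regression problem. The finite parameter grid, the uniform prior and the inverse temperature lam are illustrative assumptions, not the constructions analysed in [19].

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one-dimensional linear model y = 1.5 x + noise.
n = 200
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

# Finite grid of candidate parameters; the prior is uniform on the grid.
thetas = np.linspace(-3.0, 3.0, 241)
emp_risk = np.array([np.mean((y - t * x) ** 2) for t in thetas])

# Gibbs (exponential-weights) posterior: density proportional to
# prior * exp(-lam * n * empirical risk).  PAC-Bayes bounds indicate how to
# calibrate the inverse temperature; the value below is purely illustrative.
lam = 0.05
log_w = -lam * n * emp_risk
post = np.exp(log_w - log_w.max())
post /= post.sum()

theta_random = rng.choice(thetas, p=post)   # randomized (posterior-drawn) estimator
theta_mean = np.sum(post * thetas)          # posterior-mean aggregate

print(f"posterior draw: {theta_random:.3f}, posterior mean: {theta_mean:.3f}")
```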

On the other hand, the min-max truncation scheme proposed for least squares estimation can be simplified in the case of mean estimation [22], leading to a mean estimator with better deviation properties than the empirical mean for heavy-tailed distributions (such as a mixture of two Gaussian measures with different standard deviations).
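
The following is a hedged sketch of a soft-truncation mean estimator in the spirit of [22]; the influence function psi and the calibration of the scale alpha are indicative choices, not necessarily those of the paper. The small experiment compares the 99% deviation of the empirical mean and of the truncated mean on a heavy-tailed mixture of two centred Gaussians.

```python
import numpy as np

def soft_truncated_mean(x, delta=0.01):
    """Mean estimate solving  sum_i psi(alpha * (x_i - theta)) = 0  for the
    soft truncation psi(u) = sign(u) * log(1 + |u| + u^2 / 2).  The scale
    alpha is set to the order sqrt(2 log(1/delta) / (n * variance)) with a
    plug-in variance estimate; this calibration is only indicative."""
    x = np.asarray(x, dtype=float)
    n, v = len(x), np.var(x) + 1e-12
    alpha = np.sqrt(2.0 * np.log(1.0 / delta) / (n * v))

    def psi(u):
        return np.sign(u) * np.log1p(np.abs(u) + 0.5 * u * u)

    lo, hi = x.min(), x.max()
    for _ in range(100):              # the score is decreasing in theta: bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if psi(alpha * (x - mid)).sum() > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
n, devs_emp, devs_rob = 60, [], []
for _ in range(2000):
    # heavy-tailed sample: mixture of two centred Gaussians with very different scales
    z = np.where(rng.random(n) < 0.9, rng.normal(0.0, 1.0, n), rng.normal(0.0, 20.0, n))
    devs_emp.append(abs(z.mean()))
    devs_rob.append(abs(soft_truncated_mean(z)))

print("99% deviation of the empirical mean:", np.quantile(devs_emp, 0.99))
print("99% deviation of the truncated mean:", np.quantile(devs_rob, 0.99))
```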

Another direction of research that turns statistical regression into a learning tool is to find efficient ways of dealing with high-dimensional inputs. Various aggregation and dimension reduction methods have been studied within the team, among which random forests, which we discuss below, and PCA-kernel estimation, which we discuss now. Indeed, many statistical estimation techniques for high-dimensional or functional data are based on a preliminary dimension reduction step, which consists in projecting the sample $\mathbf{X}_1,\dots,\mathbf{X}_n$ onto the first D eigenvectors of the Principal Component Analysis (PCA) associated with the empirical projector $\hat{\Pi}_D$. Classical nonparametric inference methods, such as kernel density estimation or kernel regression analysis, are then performed in the (usually small) D-dimensional space. However, the mathematical analysis of this data-driven dimension reduction scheme raises technical problems, due to the fact that the random variables of the projected sample $(\hat{\Pi}_D\mathbf{X}_1,\dots,\hat{\Pi}_D\mathbf{X}_n)$ are no longer independent. As a reference for further studies, the paper [21] offers several results showing the asymptotic equivalence between important kernel-related quantities based on the empirical projector and their counterparts based on the theoretical projector.
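
The following sketch, under simplified assumptions (a Gaussian latent factor model and a Nadaraya-Watson smoother rather than the exact setting of [21]), illustrates the PCA-kernel pipeline: project the sample onto the first D empirical eigenvectors, then run kernel regression in the projected D-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(3)

# High-dimensional sample whose covariance is dominated by D latent directions.
n, p, D = 300, 50, 2
latent = rng.normal(size=(n, D))                 # low-dimensional signal
directions = rng.normal(size=(p, D))
X = latent @ directions.T + 0.1 * rng.normal(size=(n, p))
y = np.sin(latent[:, 0]) + 0.1 * rng.normal(size=n)

# Empirical projector on the first D eigenvectors of the empirical covariance (PCA step).
Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(Xc.T @ Xc / n)
V = eigvecs[:, ::-1][:, :D]                      # top-D eigenvectors
Z = Xc @ V                                       # coordinates of the projected sample

# Nadaraya-Watson kernel regression carried out in the (small) D-dimensional space.
def kernel_regress(z0, Z, y, h=None):
    if h is None:                                # rough plug-in bandwidth
        h = Z.std(axis=0).mean() * len(Z) ** (-1.0 / (Z.shape[1] + 4))
    w = np.exp(-np.sum((Z - z0) ** 2, axis=1) / (2.0 * h ** 2))
    return np.sum(w * y) / (np.sum(w) + 1e-12)

print("prediction at the projection of X_1:", kernel_regress(Z[0], Z, y))
print("observed y_1:                       ", y[0])
```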

