## Section: New Results

### Supervised statistical inference: regression and classification

Participants: Gérard Biau, Olivier Catoni.

Least squares regression with random design is a central issue in
supervised statistical inference. The team, in collaboration
with Willow, showed in [18]
that the ordinary least squares estimator achieves an
asymptotically optimal rate proportional to d/n,
where d is the dimension and n the sample size, under very
weak assumptions (existence of a quadratic moment for the noise
and of a fourth moment for the design, without any assumption on
the conditioning of the Gram matrix). Moreover, this result
extends to ridge regression, with the dimension replaced by
some lower *effective ridge dimension*. However, under such
weak hypotheses, this asymptotic regime can be reached arbitrarily
slowly. To obtain non-asymptotic bounds, the estimator itself
must be made more robust. This is possible
through a min-max truncation scheme, for which a
non-asymptotic convergence rate can be given that depends
only on the kurtosis of a few quantities. This min-max scheme
is feasible in practice, requiring in experiments roughly
50 times the computation needed for the ordinary
least squares estimator. Experiments also show improved performance
over the ordinary least squares estimator when
the noise is heavy tailed, and identical performance otherwise
(the two estimators then compute the same solution).
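The min-max truncation scheme of [18] is too involved for a short snippet, but the qualitative behaviour it targets can be illustrated with a much simpler robust stand-in: a Huber-type iteratively reweighted least squares fit, compared with ordinary least squares under heavy-tailed (Student-t) noise. Everything below — the data, the `huber_irls` helper, and its parameters — is an illustrative sketch, not the estimator of [18].

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
theta_true = np.ones(d)
# Heavy-tailed noise: Student t with 3 degrees of freedom
# (finite variance, infinite kurtosis)
y = X @ theta_true + rng.standard_t(df=3, size=n)

# Ordinary least squares
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber-loss regression via iteratively reweighted least
    squares -- a simple robust stand-in, NOT the min-max
    truncation scheme of [18]."""
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ theta
        # Downweight observations with large residuals
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        WX = X * w[:, None]
        theta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return theta

theta_rob = huber_irls(X, y)
```

On light-tailed data the two fits essentially coincide, mirroring the "preserved performance" observation above; the gap appears as the noise tails get heavier.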

In order to use PAC-Bayes inequalities, it is necessary to perturb the parameter, in the form of a posterior distribution. For this reason, the theory gives sharper results [19] for randomized, and quite involved, estimators defined by posterior distributions. Using this kind of estimator, it is possible to show non-asymptotic rate-optimal bounds for general loss functions under even milder dimension and margin assumptions (generalizing the notion of margin introduced by Mammen and Tsybakov).
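To make the idea of "an estimator defined by a posterior distribution" concrete, here is a minimal sketch of sampling from a Gibbs posterior proportional to exp(-λ·n·R_n(θ)) for the empirical squared loss R_n, using random-walk Metropolis on a one-dimensional location parameter. This is a generic textbook construction, assumed for illustration only; it is not the estimator analysed in [19].

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=200)

def gibbs_posterior_draws(x, lam=1.0, n_steps=4000, step=0.2):
    """Random-walk Metropolis draws from a Gibbs posterior
    proportional to exp(-lam * n * R_n(theta)) under a flat prior,
    where R_n(theta) is the empirical squared loss."""
    mc = np.random.default_rng(2)
    n = len(x)
    risk = lambda t: float(np.mean((x - t) ** 2))
    theta = float(x.mean())          # start at the empirical mean
    log_p = -lam * n * risk(theta)
    draws = []
    for _ in range(n_steps):
        prop = theta + step * mc.normal()
        log_q = -lam * n * risk(prop)
        if np.log(mc.uniform()) < log_q - log_p:  # accept/reject
            theta, log_p = prop, log_q
        draws.append(theta)
    return np.array(draws[n_steps // 2:])  # discard burn-in

draws = gibbs_posterior_draws(x)
```

A randomized estimator then simply draws θ from this posterior (or averages over it); the temperature λ controls how concentrated the perturbation is around the empirical risk minimizer.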

On the other hand, the min-max truncation scheme proposed for least squares estimation can be simplified in the case of mean estimation [22], leading to a mean estimator with better deviation properties than the empirical mean for heavy-tailed distributions (such as a mixture of two Gaussian measures with different standard deviations).
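The flavour of robust mean estimation can be illustrated with the classical median-of-means construction — a simpler cousin of the estimator of [22], used here purely as an illustration — on exactly the kind of example mentioned above: a mixture of two centred Gaussians with very different standard deviations. All parameter choices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Mixture of two centred Gaussians with standard deviations 1 and 10;
# the true mean is 0
n = 2000
scale = np.where(rng.uniform(size=n) < 0.1, 10.0, 1.0)
x = rng.normal(size=n) * scale

def median_of_means(x, n_blocks=20):
    """Shuffle the sample, split it into blocks, average each block,
    and return the median of the block means.  A classical robust
    alternative to the empirical mean (not the estimator of [22])."""
    perm = np.random.default_rng(1).permutation(x)
    blocks = np.array_split(perm, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

est = median_of_means(x)
```

The median step prevents a handful of large-scale draws from dragging the estimate away, which is exactly where the empirical mean loses its deviation guarantees.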

Another direction of research to turn statistical regression into a learning tool is to find efficient ways to deal with high-dimensional inputs. Various aggregation and dimension reduction methods have been studied within the team—among which random forests, which we discuss below, and PCA-Kernel estimation, which we discuss now.

Many statistical estimation techniques for high-dimensional or functional data are based on a preliminary dimension reduction step, which consists in projecting the sample onto the first D eigenvectors of the Principal Component Analysis (PCA) associated with the empirical projector. Classical nonparametric inference methods such as kernel density estimation or kernel regression analysis are then performed in the (usually small) D-dimensional space. However, the mathematical analysis of this data-driven dimension reduction scheme raises technical problems, due to the fact that the random variables of the projected sample are no longer independent. As a reference for further studies, we offer in the paper [21] several results showing the asymptotic equivalences between important kernel-related quantities based on the empirical projector and its theoretical counterpart.
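The two-step scheme described above — project onto the first D empirical PCA eigenvectors, then run a kernel method in the reduced space — can be sketched as follows with Nadaraya–Watson regression. The function name, the synthetic data, and the bandwidth are illustrative choices, not from [21].

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
# The first coordinate carries most of the variance, so the first
# empirical PCA direction essentially recovers it
X = rng.normal(size=(n, p)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
y = np.sin(X[:, 0] / 5.0)

def pca_kernel_regression(X, y, X_new, D=1, bandwidth=1.0):
    """Project onto the first D empirical PCA directions, then run
    Nadaraya-Watson kernel regression in the D-dimensional space."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Empirical projector: top-D right singular vectors of the
    # centred data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:D].T                              # (p, D) basis
    Z, Z_new = Xc @ P, (X_new - mu) @ P
    # Gaussian kernel weights computed in the reduced space
    d2 = ((Z_new[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (W @ y) / W.sum(axis=1)

pred = pca_kernel_regression(X, y, X, D=1, bandwidth=1.0)
```

Note that the kernel weights depend on the data twice — through the projected points and through the empirical projector itself — which is precisely the dependence that complicates the analysis in [21].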