## Section: Scientific Foundations

### Regression models of supervised learning

The most obvious contribution of statistics to machine learning is to treat the supervised learning scenario as a special case of regression estimation: given $n$ independent pairs of observations $(X_i, Y_i)$, $i = 1, \dots, n$, the aim is to "learn" the dependence of $Y_i$ on $X_i$. Classical results on statistical regression estimation therefore apply, with the caveat that the hypotheses we can reasonably make about the distribution of the pairs $(X_i, Y_i)$ are much weaker than those usually considered in statistical studies. The aim here is to assume very little, perhaps only independence of the observed sequence of input-output pairs, and to validate model and variable selection schemes. These schemes should produce the best possible approximation of the joint distribution of $(X_i, Y_i)$ within some restricted family of models. Their performance is evaluated according to some measure of discrepancy between distributions, a standard choice being the Kullback-Leibler divergence.
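To make the discrepancy measure concrete, here is a minimal sketch of the Kullback-Leibler divergence for two discrete distributions on the same finite set. The function name `kl_divergence` and the toy distributions are illustrative assumptions, not part of the team's work; in the regression setting the same quantity is computed between joint distributions of $(X_i, Y_i)$.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats.

    p and q are sequences of probabilities over the same finite set
    (hypothetical toy inputs, assumed to sum to 1).
    Terms with p(x) == 0 contribute 0; the divergence is infinite
    whenever q(x) == 0 while p(x) > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


# Toy example: a "true" distribution and a model approximation.
true_dist = [0.5, 0.5]
model_dist = [0.25, 0.75]
print(kl_divergence(true_dist, model_dist))  # 0.5 * log(4/3)
print(kl_divergence(model_dist, true_dist))  # a different value: KL is not symmetric
```

Note that the divergence is not symmetric in its arguments, so the order (true distribution first, model second, or the reverse) is itself a modelling choice.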

One of the specialties of the team in this direction is to use PAC-Bayes inequalities in combination with thresholded exponential moment inequalities. The name of this theory comes from its founder, David McAllester, and may be misleading. Indeed, its cornerstone is rather made of non-asymptotic entropy inequalities and a perturbative approach to parameter estimation. The team has made major contributions to the theory, first focused on classification [5], then on regression (see the papers [18], [19] discussed below). It has introduced the idea of combining the PAC-Bayesian approach with the use of thresholded exponential moments, in order to derive bounds under very weak assumptions on the noise.
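For orientation, one commonly quoted form of a PAC-Bayesian inequality is sketched below. It is a McAllester-type bound for a loss taking values in $[0, 1]$; constants differ between references, and it is not necessarily the form used in the team's papers. The notation is assumed for illustration: $\pi$ is a prior distribution on the parameter space chosen before seeing the data, $\rho$ is any posterior distribution, $R(\theta)$ is the expected risk, $r_n(\theta)$ is the empirical risk on the $n$ observations, and $\delta \in (0, 1)$ is a confidence level. For $n \ge 8$,

$$
\mathbb{P}\left( \forall \rho:\ \mathbb{E}_{\theta \sim \rho}\, R(\theta) \;\le\; \mathbb{E}_{\theta \sim \rho}\, r_n(\theta) + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}} \right) \;\ge\; 1 - \delta .
$$

The entropy term $\mathrm{KL}(\rho \,\|\, \pi)$ plays the role of a complexity penalty, and the bound holds for a fixed finite sample size, which illustrates the non-asymptotic entropy inequalities at the heart of the theory.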

Another line of research in regression estimation is the use of sparse models and its link with $\ell_1$-regularization. Selecting a few variables from a large set of candidates in a computationally efficient way is a major challenge of statistical learning. Another approach, which covers more general situations, is to predict outputs sequentially. If a cumulative loss is considered, this can be done under even weaker assumptions than those possible within the regression framework. These two lines are described in the next two items.
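As an illustration of why $\ell_1$-regularization produces sparse models, the following is a minimal sketch of the Lasso solved by cyclic coordinate descent in plain Python. The objective scaling $\frac{1}{2n}\|y - Xw\|^2 + \lambda \|w\|_1$, the function names, and the toy data are assumptions made for this example, not the team's method. The soft-thresholding step sets small coefficients exactly to zero, which is what performs variable selection.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of d features; y is a list of n outputs.
    (Hypothetical helper written for illustration only.)
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - X w, and w starts at zero
    # Mean squared norm of each column, used to normalise the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual computed without feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data: the first variable matters, the second barely does.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
w = lasso_coordinate_descent(X, y, lam=0.1)
print(w)  # the second coefficient is exactly 0.0
```

With `lam=0.0` the same routine reduces to ordinary least squares on this toy design; increasing `lam` removes more variables, at the cost of shrinking the remaining coefficients.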
