## Section: Scientific Foundations

### Regression models of supervised learning

The most obvious contribution of statistics to machine learning is to
consider the supervised learning scenario as a special case of regression
estimation: given n independent pairs
of observations (X_{i}, Y_{i}), i = 1, …, n, the aim is to
“learn” the dependence of Y_{i} on X_{i}. Thus, classical results
about statistical regression estimation apply, with the caveat that
the hypotheses we can reasonably assume about the distribution
of the pairs (X_{i}, Y_{i}) are much weaker than what is usually
considered in statistical studies. The aim here is to assume very
little, maybe only independence of the observed sequence of input-output
pairs, and to validate model and variable selection schemes.
These schemes should produce the best possible approximation of the
joint distribution of (X_{i}, Y_{i}) within some restricted family
of models. Their performance is evaluated according
to some measure of discrepancy between
distributions, a standard choice being to use the Kullback-Leibler
divergence.
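The Kullback-Leibler divergence mentioned above can be sketched for discrete distributions as follows; this is a minimal illustration, not code from the team's work, and the function name is ours:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    probability distributions, given as sequences of probabilities.
    Terms with p_i = 0 contribute nothing by the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # non-negative, and zero iff p == q
```

Note that KL(p ‖ q) is asymmetric in its arguments, which is why the direction of comparison between the estimated model and the true joint distribution matters in the evaluation scheme described above.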

One of the specialties of the team in this direction is to combine PAC-Bayes inequalities with thresholded exponential moment inequalities. The name of this theory comes from its founder, David McAllester, and may be misleading: its cornerstone is rather made of non-asymptotic entropy inequalities and a perturbative approach to parameter estimation. The team has made major contributions to the theory, first focused on classification [5], then on regression (see the papers [18], [19] discussed below). It has introduced the idea of combining the PAC-Bayesian approach with thresholded exponential moments, in order to derive bounds under very weak assumptions on the noise.
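To fix ideas, a representative McAllester-type PAC-Bayes bound (a standard statement from this literature, not a result quoted from [5], [18], or [19]) reads as follows. For a prior distribution π on hypotheses, with probability at least 1 − δ, simultaneously for all posterior distributions ρ:

```latex
\mathbb{E}_{h \sim \rho}\, R(h)
  \;\le\;
\mathbb{E}_{h \sim \rho}\, r(h)
  \;+\;
\sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln \frac{2\sqrt{n}}{\delta}}{2n}},
```

where R(h) is the risk, r(h) the empirical risk on n observations, and KL(ρ ‖ π) the Kullback-Leibler divergence between posterior and prior. The "perturbative approach" alluded to above is visible here: estimation proceeds through a randomized (posterior) distribution over parameters rather than a single point estimate.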

Another line of research in regression estimation is the use
of sparse models, and their link with ℓ_{1}-regularization.
Selecting a few variables from a large set of candidates
in a computationally efficient way is a major challenge
of statistical learning. Another approach, which captures more general
situations, is to predict outputs in a sequential way.
When a cumulative loss is considered, this can be done
under even weaker assumptions than what is possible within
the regression framework. These two lines are described in the next
two items.
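The sequential setting alluded to above can be sketched with the classical exponentially weighted average forecaster, which controls cumulative loss on an arbitrary individual sequence, with no probabilistic assumption at all on the data; this is a generic textbook scheme, not the team's specific algorithm, and all names below are ours:

```python
import math

def exponentially_weighted_forecast(expert_predictions, outcomes, eta=0.5):
    """Sequential prediction by exponentially weighted averaging of experts.

    expert_predictions[t][k] is expert k's prediction at round t;
    outcomes[t] is the output revealed after the forecast is made.
    Squared loss is used; returns the forecaster's cumulative loss.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts  # uniform initial weights
    cum_loss = 0.0
    for preds, y in zip(expert_predictions, outcomes):
        total = sum(weights)
        # forecast = weighted average of the experts' predictions
        forecast = sum(w * p for w, p in zip(weights, preds)) / total
        cum_loss += (forecast - y) ** 2
        # multiplicative update: experts with large loss lose weight
        weights = [w * math.exp(-eta * (p - y) ** 2)
                   for w, p in zip(weights, preds)]
    return cum_loss

# Two experts, one always right: the forecaster quickly tracks it.
preds = [[1.0, 0.0]] * 20
outcomes = [1.0] * 20
print(exponentially_weighted_forecast(preds, outcomes))
```

The point of the example is that the guarantee is stated per sequence, via the cumulative loss relative to the best expert, rather than via an assumed joint distribution of (X_{i}, Y_{i}) as in the regression framework.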