Section: New Results
Semi and non-parametric methods
Modelling extremal events
Participants : Stéphane Girard, Laurent Gardes.
Joint work with: Guillou, A. (Univ. Strasbourg), and Diebolt, J. (CNRS, Univ. Marne-la-Vallée).
Our first achievement is the introduction of a new model of tail distributions depending on a function and on an unknown parameter [40]. This model encompasses very different tail behaviours drawn from the three classical maximum domains of attraction. In the particular cases of Pareto-type tails or Weibull tails, our estimators coincide with classical ones proposed in the literature, which makes it possible to recover their asymptotic normality in a unified way. Our second achievement is the development of new estimators dedicated to Weibull-tail distributions ( 3 ): kernel estimators [19] and bias correction through exponential regression [16], [17].
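As an illustration of the kind of classical estimator referred to above for Pareto-type tails, the following minimal sketch implements the Hill estimator of the tail index. It is not the estimator of [40]. The function names, the simulated Pareto sample and the choice of k are assumptions made for the example only.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)   # largest value first
    threshold = ordered[k]                   # (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Mean log-excess of the k largest values over the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def pareto_sample(n, gamma, seed=0):
    """Hypothetical test data: survival function x**(-1/gamma) for x >= 1."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]


if __name__ == "__main__":
    data = pareto_sample(20000, gamma=0.5, seed=1)
    print(hill_estimator(data, k=2000))
```

The number k of upper order statistics governs the usual bias-variance trade-off: a small k gives a noisy estimate, while a large k uses observations that may no longer lie in the tail.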
Conditional extremal events
Participants : Stéphane Girard, Laurent Gardes, Alexandre Lekina.
Joint work with: Amblard, C. (TimB in TIMC laboratory, Univ. Grenoble 1).
The goal of the PhD thesis of Alexandre Lekina is to contribute to the development of theoretical and algorithmic models to tackle conditional extreme value analysis, i.e. the situation where some covariate information X is recorded simultaneously with a quantity of interest Y. In such a case, the tail heaviness of Y depends on X, and thus the tail index as well as the extreme quantiles are also functions of the covariate. We combine nonparametric smoothing techniques [46] with extreme-value methods in order to obtain efficient estimators of the conditional tail index [18] and conditional extreme quantiles [41]. Conditional extremes are studied in climatology, where one is interested in how climate change over the years might affect extreme temperatures or rainfall. In this case, the covariate is univariate (time). Bivariate examples include the study of extreme rainfall as a function of the geographical location. The application part of the study will be joint work with the LTHE (Laboratoire d'étude des Transferts en Hydrologie et Environnement), located in Grenoble.
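The idea of combining smoothing with extreme-value methods can be sketched as follows: a Hill-type estimate is computed using only the responses whose covariate falls in a window around the point of interest. This moving-window version is a simplified illustration and is not the estimator of [18]. The window half-width h, the number k of upper order statistics, and the simulated data with a tail index varying linearly in x are all assumptions.

```python
import math
import random


def local_hill(xs, ys, x0, h, k):
    """Hill estimate from responses whose covariate lies in [x0 - h, x0 + h]."""
    local = sorted((y for x, y in zip(xs, ys) if abs(x - x0) <= h), reverse=True)
    if not 1 <= k < len(local):
        raise ValueError("need 1 <= k < number of points in the window")
    threshold = local[k]
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    return sum(math.log(y / threshold) for y in local[:k]) / k


def tail_index(x):
    """Hypothetical conditional tail index, increasing from 0.2 to 0.8."""
    return 0.2 + 0.6 * x


def simulate(n, seed=0):
    """Hypothetical data: X uniform on [0, 1], Y given X = x Pareto with index tail_index(x)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [(1.0 - rng.random()) ** (-tail_index(x)) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(50000)
    for x0 in (0.1, 0.5, 0.9):
        print(x0, local_hill(xs, ys, x0, h=0.05, k=500))
```

A kernel with smoothly decaying weights could replace the hard window; the hard window is used here only to keep the sketch short.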
Future work will also include the study of multivariate extreme values. To this aim, research on some particular copulas [1], [11] has been initiated with Cécile Amblard, since copulas are the key tool for building multivariate distributions [52].
Level sets estimation
Participants : Stéphane Girard, Laurent Gardes.
Joint work with: Daouia, A. (Univ. Toulouse I), Jacob, P. and Menneteau, L. (Univ. Montpellier II).
The boundary of a set of points is viewed as the largest level set of the distribution of the points. Estimating it is then an extreme quantile curve estimation problem. We propose estimators based on projection as well as on kernel regression methods applied to the set of extreme values [21], for particular sets of points. Our current work is to define similar methods based on wavelet expansions in order to estimate non-smooth boundaries, and on local polynomial estimators to get rid of boundary effects. Besides, we are also working on the extension of our results to more general sets of points. To this end, we focus on the family of conditional heavy tails. An estimator of the conditional tail index has been proposed [18] and the corresponding conditional extreme quantile estimator has been derived [41]. This work was initiated in the PhD work of Laurent Gardes [48], co-directed by Pierre Jacob and Stéphane Girard, and in [22] with the consideration of star-shaped supports.
Dimension reduction
Participants : Stéphane Girard, Laurent Gardes, Caroline Bernard-Michel, Mathieu Fauvel.
To overcome the curse of dimensionality arising in high-dimensional regression problems, one option is to reduce the dimension of the problem. Sliced Inverse Regression (SIR) is an interesting solution to this end. The original method, however, requires the inversion of the covariance matrix of the predictors. When the predictors are collinear or the sample size is small compared to the dimension, this inversion is not possible and a regularization technique has to be used. We thus develop a new approach [13], [12] based on a Fisher Lecture given by R.D. Cook, where it is shown that SIR axes can be interpreted as solutions of an inverse regression problem. In this work, a Gaussian prior distribution is introduced on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations fit into our framework, which permits a global understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method.
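To make the role of the covariance inversion concrete, the sketch below computes the leading SIR axis with NumPy and optionally adds a ridge term to the covariance matrix before solving. The ridge term is a generic Tikhonov-style regularization chosen for illustration; it is not claimed to be one of the priors of [13], [12]. The function name, the number of slices and the simulated single-index data are assumptions.

```python
import numpy as np


def sir_direction(X, y, n_slices=10, ridge=0.0):
    """Leading SIR axis; `ridge` adds ridge * I to the covariance before solving."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n                      # covariance matrix of the predictors

    # Slice the sorted response into groups of (nearly) equal size.
    order = np.argsort(y)
    gamma = np.zeros((p, p))                   # covariance of the slice means
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)

    # Eigenvectors of (sigma + ridge * I)^{-1} gamma, without an explicit inverse.
    mat = np.linalg.solve(sigma + ridge * np.eye(p), gamma)
    eigvals, eigvecs = np.linalg.eig(mat)
    b = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return b / np.linalg.norm(b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5.0)
    y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
    print(abs(sir_direction(X, y) @ beta))     # close to 1 when the axis is recovered
```

With ridge=0.0 the linear solve fails or becomes unstable when the covariance matrix is singular, which is the situation described above; a positive ridge value keeps the system well posed at the price of some bias.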
This technique has been applied in particular in a collaboration with bioMerieux (see Section 7.1 ). We co-advised the internship of Lamiae Azizi who applied SIR in the context of quantitation procedures developed at bioMerieux.
Nuclear plants reliability
Participants : Laurent Gardes, Stéphane Girard.
Joint work with: Perot, N., Devictor, N. and Marquès, M. (CEA).
One of the main activities of the LCFR (Laboratoire de Conduite et Fiabilité des Réacteurs), CEA Cadarache, concerns the probabilistic analysis of some processes using reliability and statistical methods. In this context, probabilistic modelling of steel tenacity in nuclear plant tanks has been developed. The databases under consideration include hundreds of data points indexed by temperature, so that reliable probabilistic models have been obtained for the central part of the distribution. However, in this reliability problem, the key point is to investigate the behaviour of the model in the distribution tail. In particular, we are mainly interested in studying the lowest tenacities when the temperature varies (Figure 4).
This work is supported by a research contract (from December 2008 to December 2010) involving mistis and the LCFR.
Quantifying uncertainties on extreme rainfall estimations
Participants : Caroline Bernard-Michel, Laurent Gardes, Stéphane Girard.
Joint work with: Molinié, G. from Laboratoire d'Etude des Transferts en Hydrologie et Environnement (LTHE), France.
Extreme rainfalls are generally associated with two different precipitation regimes. Extreme cumulated rainfall over 24 hours results from stratiform clouds on which the relief forcing is of primary importance. Extreme rainfall rates are defined as rainfall rates with low probability of occurrence, typically with higher mean return levels than the maximum observed level. For example, Figure 5 presents the return levels for the Cévennes-Vivarais region. It is then of primary importance to study the sensitivity of the extreme rainfall estimation to the estimation method considered. Preliminary work on this topic is available in [25]. mistis obtained a Ministry grant for a related ANR project (see Section 8.2).
Retrieval of Mars surface physical properties from OMEGA hyperspectral images using Regularized Sliced Inverse Regression
Participants : Caroline Bernard-Michel, Mathieu Fauvel, Laurent Gardes, Stéphane Girard.
Joint work with: Douté, S. from Laboratoire de Planétologie de Grenoble, France, in the context of the VAHINE project (see Section 8.2).
Visible and near-infrared imaging spectroscopy is one of the key techniques to detect, map and characterize mineral and volatile (e.g. water-ice) species existing at the surface of the planets. Indeed, the chemical composition, granularity, texture, physical state, etc. of the materials determine the existence and morphology of the absorption bands. The resulting spectra therefore contain very useful information. Current imaging spectrometers provide data organized as three-dimensional hyperspectral images: two spatial dimensions and one spectral dimension. Our goal is to estimate the functional relationship F between some observed spectra and some physical parameters. To this end, a database of synthetic spectra is generated by a physical radiative transfer model and used to estimate F. The high dimension of spectra is reduced by Gaussian regularized sliced inverse regression (GRSIR) to overcome the curse of dimensionality and consequently the sensitivity of the inversion to noise (ill-conditioned problems). This method is compared with the more classical SVM approach. GRSIR has the advantage of being very fast, interpretable and accurate. Recall that SVM approximates the functional F: y = F(x) using a solution of the form $F(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b$, where $x_i$ are samples from the training set, K is a kernel function and $((\alpha_i)_{i=1}^{n}, b)$ are the parameters of F which are estimated during the training process. The kernel K is used to produce a non-linear function. The SVM training entails minimization of $\frac{1}{n}\sum_{i=1}^{n} l(F(x_i), y_i) + \lambda \|F\|^2$ with respect to $((\alpha_i)_{i=1}^{n}, b)$, with $l(F(x), y) = 0$ if $|F(x) - y| \le \varepsilon$ and $l(F(x), y) = |F(x) - y| - \varepsilon$ otherwise. Prior to running the algorithm, the following parameters need to be fitted: $\varepsilon$, which controls the resolution of the estimation, $\lambda$, which controls the smoothness of the solution, and the kernel parameters ($\gamma$ for the Gaussian kernel).
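The two ingredients of the SVM regression recalled above, a kernel expansion for F and a loss that ignores small residuals, can be sketched in a few lines. The notation is assumed for the example: alpha and b denote the expansion coefficients and offset, epsilon the half-width of the insensitive tube, and gamma the parameter of a Gaussian kernel written as exp(-gamma * ||x - z||^2). The training step itself (the minimization) is not shown.

```python
import math


def gaussian_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2); this parameterization is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support, alpha, b, gamma):
    """Evaluate F(x) = sum_i alpha_i * K(x, x_i) + b for given coefficients."""
    return sum(a * gaussian_kernel(x, xi, gamma) for a, xi in zip(alpha, support)) + b


def eps_insensitive_loss(prediction, target, epsilon):
    """0 when |F(x) - y| <= epsilon, |F(x) - y| - epsilon otherwise."""
    return max(0.0, abs(prediction - target) - epsilon)


if __name__ == "__main__":
    # Hypothetical coefficients, only to show how the pieces fit together.
    support = [[0.0], [1.0]]
    alpha = [1.0, 2.0]
    f0 = svr_predict([0.0], support, alpha, b=0.5, gamma=1.0)
    print(f0, eps_insensitive_loss(f0, target=2.0, epsilon=0.1))
```

Residuals smaller than epsilon contribute nothing to the criterion, which is why epsilon acts as the resolution of the estimate.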
Statistical analysis of hyperspectral multi-angular data from Mars
Participants : Caroline Bernard-Michel, Mathieu Fauvel, Florence Forbes, Laurent Gardes, Stéphane Girard.
Joint work with: Douté, S. from Laboratoire de Planétologie de Grenoble, France, in the context of the VAHINE project (see Section 8.2).
A new generation of imaging spectrometers is emerging with an angular dimension in addition to the three usual dimensions (two spatial and one spectral). The surface of the planets will now be observed from different viewpoints on the satellite trajectory, corresponding to about ten different angles, instead of only one, which usually corresponds to the vertical (0 degree angle) viewpoint. Multi-angle imaging spectrometers present several advantages: the influence of the atmosphere on the signal can be better identified and separated from the surface signal of interest, and the shape and size of the surface components and the surface granularity can be better characterized. However, this new generation of spectrometers also results in a significant increase in the size (several terabits expected) and complexity of the generated data. We started to investigate the use of statistical techniques to deal with these generic sources of complexity in data, beyond the traditional tools in mainstream statistical packages.
Preliminary experiments carried out by Camille Neels during her two-month internship in the team showed that, prior to any classification task or other analysis, some pre-processing of the images was required. We identified in the data a so-called spectral smile issue, which we are currently trying to correct. Spectral smile refers to an artefact commonly encountered in spectral images acquired with push-broom spectrometers. It is due to the fact that the wavelength-channel association is not constant across the spatial dimension. Regarding classification tasks, it induces artificial inhomogeneities due to sampling issues.
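One simple way to picture the effect is to resample every spatial column onto a common wavelength grid by linear interpolation. This is a generic illustration under assumptions, not the correction being developed in the team. It assumes that the actual wavelength of each channel is known for every column and increases along the channel axis; the function and argument names are hypothetical.

```python
import numpy as np


def resample_to_common_grid(spectra, wavelengths, reference):
    """Interpolate each spatial column onto the common grid `reference`.

    spectra:     (n_columns, n_channels) measured values
    wavelengths: (n_columns, n_channels) wavelength of each channel per column,
                 increasing along the channel axis
    reference:   (n_ref,) common wavelength grid
    """
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    return np.vstack([
        np.interp(reference, w, s) for w, s in zip(wavelengths, spectra)
    ])


if __name__ == "__main__":
    base = np.linspace(1.0, 2.0, 11)
    shifts = np.array([-0.02, 0.0, 0.03])          # hypothetical per-column shifts
    wavelengths = base[None, :] + shifts[:, None]
    spectra = 2.0 * wavelengths + 1.0              # same underlying signal in every column
    reference = base[1:-1]                         # grid covered by all columns
    print(resample_to_common_grid(spectra, wavelengths, reference))
```

After resampling, the columns of this toy example carry identical spectra, whereas the raw channel values differ from column to column even though the underlying signal is the same.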