## Section: New Results

### Semi and non-parametric methods

#### Modelling extremal events

Participants : Stéphane Girard, Laurent Gardes.

**Joint work with:**
Guillou, A. (Univ. Strasbourg)

We introduced a new model of tail distributions depending on two parameters $\tau \in [0, 1]$ and $\theta > 0$ [55]. This model covers very different tail behaviors, from the Fréchet and Gumbel maximum domains of attraction. In the particular cases of Pareto-type tails ($\tau = 1$) or Weibull tails ($\tau = 0$), our estimators coincide with classical ones proposed in the literature, which makes it possible to recover their asymptotic normality in a unified way.
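As an illustration of a classical estimator for Pareto-type tails, the sketch below implements the well-known Hill estimator of the tail index. It is only a minimal example under stated assumptions: the function name `hill_estimator`, the simulated data and the choice of `k` are hypothetical, and the code is not claimed to be the estimator studied in [55].

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimator of the tail index from the k largest observations.

    Assumes 1 <= k < len(sample) and a positive (k+1)-th largest value.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(sample, reverse=True)   # largest observation first
    threshold = ordered[k]                   # (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    # Mean log-excess of the k largest values over the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k

# Hypothetical check on simulated Pareto data with tail index 0.5:
# if U is uniform on (0, 1], then U ** -0.5 has survival function x ** -2.
rng = random.Random(0)
data = [(1.0 - rng.random()) ** -0.5 for _ in range(20000)]
gamma_hat = hill_estimator(data, k=2000)
```

In practice the number `k` of upper order statistics drives a bias-variance trade-off, so the estimate is usually inspected over a range of `k` values rather than at a single one.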

#### Conditional extremal events

Participants : Stéphane Girard, Laurent Gardes, Alexandre Lekina, Eugen Ursu.

**Joint work with:** Amblard, C. (TimB in TIMC laboratory,
Univ. Grenoble 1).

The goal of the PhD thesis of Alexandre Lekina
is to contribute to the development of
theoretical and algorithmic models to tackle conditional extreme
value analysis,
*i.e.* the situation where some covariate information X is
recorded simultaneously with a quantity of interest Y.
In such a case, the tail heaviness of Y depends on X,
and thus the tail index as well as the extreme quantiles are
also functions of the covariate.
We combine
nonparametric smoothing techniques [51] with extreme-value methods in
order to obtain efficient estimators of the conditional tail index [9] and conditional extreme quantiles [56].
Conditional extremes are studied in climatology, where one is
interested in how climate change over the years might affect extreme
temperatures or rainfall. In this case, the covariate is univariate
(time). Bivariate examples include the study of extreme
rainfall as a function of the geographical location.
The application part of the study is joint work with the LTHE
(Laboratoire d'étude des Transferts en Hydrologie et Environnement)
located in Grenoble. The results have been submitted for publication [54].
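To make the idea of combining smoothing with extreme-value methods concrete, here is a minimal sketch of a local Hill-type estimate of the conditional tail index: only the responses whose covariate falls in a window around the point of interest are kept, and a Hill-type average is computed on their largest values. The function name `local_hill`, the uniform window, the bandwidth `h` and the simulated model are all assumptions made for illustration; this is not presented as the estimator of [9] or [56].

```python
import math
import random

def local_hill(xs, ys, x0, h, k):
    """Hill-type estimate of the tail index of Y given X close to x0.

    Keeps responses whose covariate lies within h of x0 (uniform window),
    then averages the log-excesses of the k largest of them.
    """
    local = sorted((y for x, y in zip(xs, ys) if abs(x - x0) <= h), reverse=True)
    if not 1 <= k < len(local):
        raise ValueError("need 1 <= k < number of points in the window")
    threshold = local[k]
    return sum(math.log(y / threshold) for y in local[:k]) / k

# Hypothetical simulation: the tail index grows with the covariate,
# gamma(x) = 0.2 + 0.6 * x for x in [0, 1].
rng = random.Random(1)
xs = [rng.random() for _ in range(40000)]
ys = [(1.0 - rng.random()) ** -(0.2 + 0.6 * x) for x in xs]

gamma_low = local_hill(xs, ys, x0=0.1, h=0.05, k=400)    # true gamma(0.1) = 0.26
gamma_high = local_hill(xs, ys, x0=0.9, h=0.05, k=400)   # true gamma(0.9) = 0.74
```

The window width `h` plays the role of the smoothing parameter: a smaller window reduces the bias caused by the variation of the tail index with the covariate, at the price of fewer observations per window.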

Future work will include the study of multivariate and spatial extreme values. To this end, research on some particular copulas [1], [11] has been initiated with Cécile Amblard, since copulas are the key tool for building multivariate distributions [61].

#### Level sets estimation

Participants : Stéphane Girard, Laurent Gardes.

**Joint work with:** Daouia, A. (Univ. Toulouse I) and Jacob, P. (Univ. Montpellier II).

The boundary of a set of points is viewed as the largest level set of the distribution of the points. Estimating it is then an extreme quantile curve estimation problem. We propose estimators based on projection as well as on kernel regression methods applied to the set of extreme values, for particular sets of points. Our current work aims to define similar methods based on wavelet expansions in order to estimate non-smooth boundaries, and on local polynomial estimators [17] to get rid of boundary effects. We are also working on extending our results to more general sets of points. To this end, we focus on the family of conditional heavy tails. An estimator of the conditional tail index has been proposed [9] and the corresponding conditional extreme quantile estimator has been derived [56] in a fixed design setting. The extension to the random design framework is investigated in [49]. This work was initiated in the PhD work of Laurent Gardes [53], co-directed by Pierre Jacob and Stéphane Girard.

#### Dimension reduction

Participants : Stéphane Girard, Laurent Gardes, Mathieu Fauvel.

To overcome the curse of dimensionality arising in high-dimensional regression problems, one option is to reduce the dimension of the problem. Sliced Inverse Regression (SIR) is an interesting solution to this end. The original method, however, requires the inversion of the covariance matrix of the predictors. In case of collinearity between these predictors, or when the sample size is small compared to the dimension, the inversion is not possible and a regularization technique has to be used. We thus developed a new approach [13] based on a Fisher Lecture given by R.D. Cook, where it is shown that the SIR axes can be interpreted as solutions of an inverse regression problem. In this paper, a Gaussian prior distribution is introduced on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations fit into our framework, which permits a global understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method. Results are compared with the Support Vector Machine (SVM) approach on hyperspectral data [12].
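The role of regularization in SIR can be sketched with a simple ridge-type variant, in which a multiple of the identity is added to the predictor covariance before inversion. This is only one generic regularization, chosen here for illustration; the function name `regularized_sir`, the `ridge` parameter, the slicing scheme and the simulated data are assumptions, and the code does not reproduce the Gaussian-prior approach of [13].

```python
import numpy as np

def regularized_sir(X, y, n_slices=10, ridge=1e-3, n_axes=1):
    """Sliced Inverse Regression with a ridge-regularized covariance.

    X: (n, p) predictors, y: (n,) responses.
    Returns a (p, n_axes) array whose unit-norm columns are estimated axes.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n                        # predictor covariance
    # Between-slice covariance of the slice means of X (slices sorted by y).
    gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)
    # The ridge term keeps the covariance invertible even if sigma is singular.
    w, V = np.linalg.eigh(sigma + ridge * np.eye(p))
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    # Solve the generalized eigenproblem through a symmetric whitened matrix.
    vals, vecs = np.linalg.eigh(inv_sqrt @ gamma @ inv_sqrt)
    axes = inv_sqrt @ vecs[:, ::-1][:, :n_axes]  # largest eigenvalues first
    return axes / np.linalg.norm(axes, axis=0)

# Hypothetical single-index model: y depends on X only through X @ beta.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)

axis = regularized_sir(X, y)[:, 0]
alignment = abs(axis @ beta)   # close to 1 when the direction is recovered
```

The estimated axis is defined up to sign, which is why the alignment is measured through an absolute value.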

#### Nuclear plants reliability

Participants : Laurent Gardes, Stéphane Girard.

**Joint work with:** Perot, N.,
Devictor, N. and Marquès, M. (CEA).

One of the main activities of the LCFR (Laboratoire de Conduite et Fiabilité des Réacteurs), CEA Cadarache, concerns the probabilistic analysis of some processes using reliability and statistical methods. In this context, probabilistic modelling of steel tenacity in nuclear plant tanks has been developed. The databases under consideration include hundreds of data points indexed by temperature, so that reliable probabilistic models have been obtained for the central part of the distribution. However, in this reliability problem, the key point is to investigate the behavior of the model in the distribution tail. In particular, we are mainly interested in studying the lowest tenacities when the temperature varies (Figure 7).

This work is supported by a research contract (from December 2008 to December 2010) involving MISTIS and the LCFR.

#### Quantifying uncertainties on extreme rainfall estimations

Participants : Eugen Ursu, Laurent Gardes, Stéphane Girard.

**Joint work with:** Molinié, G. from Laboratoire
d'Etude des Transferts en Hydrologie et Environnement (LTHE), France.

Extreme rainfalls are generally associated with two different precipitation regimes. Extreme cumulated rainfall over 24 hours results from stratiform clouds on which the relief forcing is of primary importance. Extreme rainfall rates are defined as rainfall rates with low probability of occurrence, typically with higher mean return levels than the maximum observed level. For example, Figure 8 presents the return levels for the Cévennes-Vivarais region. It is therefore of primary importance to study the sensitivity of the extreme rainfall estimation to the estimation method considered. Preliminary work on this topic is available in [54]. MISTIS obtained a Ministry grant for a related ANR project (see Section 8.2).

#### Retrieval of Mars surface physical properties from OMEGA hyperspectral images

Participants : Mathieu Fauvel, Laurent Gardes, Stéphane Girard.

**Joint work with:** Douté, S. from Laboratoire de
Planétologie de Grenoble, France in the context of the VAHINE
project (see Section 8.2).

Visible and near-infrared imaging spectroscopy is
one of the key techniques
to detect, map and characterize mineral and volatile (e.g.
water ice)
species existing at
the surface of the planets. Indeed, the chemical composition,
granularity, texture, physical state, etc. of the materials
determine the existence and morphology of the absorption bands.
The resulting spectra therefore contain very useful information.
Current imaging spectrometers provide data organized as
three-dimensional hyperspectral images: two spatial dimensions and one
spectral dimension.
Our goal is to estimate the functional relationship F between some observed spectra and some physical parameters. To this end, a database of synthetic spectra
is generated by a physical radiative transfer model and used to
estimate F . The high dimension of spectra is reduced by Gaussian
regularized sliced inverse regression (GRSIR) to overcome the curse
of dimensionality and consequently the sensitivity of the inversion
to noise (ill-conditioned problems). This method is compared with the more classical SVM approach. GRSIR has the advantage of being very fast, interpretable and accurate [12] .
Recall that SVM approximates the functional $F$: $y = F(x)$ using a solution of the
form $F(x) = \sum_{i=1}^{\ell} \alpha_i K(x, x_i) + b$, where $x_i$ are
samples from the training set, $K$ is a kernel function and
$\left((\alpha_i)_{i=1}^{\ell}, b\right)$ are the parameters of $F$, which
are estimated during the training
process. The kernel $K$ is used to
produce a non-linear function. The SVM training
entails minimization of $\frac{1}{\ell}\sum_{i=1}^{\ell} l(F(x_i), y_i) + \lambda \|F\|^2$ with respect to
$\left((\alpha_i)_{i=1}^{\ell}, b\right)$, with $l(F(x), y) = 0$ if $|F(x) - y| \leq \epsilon$ and $l(F(x), y) = |F(x) - y| - \epsilon$ otherwise.
Prior to running the algorithm, the following parameters need to be
fitted: $\epsilon$, which controls the resolution of the estimation,
$\lambda$, which controls the smoothness of the solution, and the kernel
parameters ($\gamma$ for the Gaussian kernel).
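The ingredients of this SVM regression can be written down in a few lines. The sketch below assumes the usual formulation of support vector regression: a kernel expansion plus an offset, an insensitive loss that is zero inside a tube of half-width `eps`, and a penalty `lam` times the squared norm of the kernel part of F. The function names, the symbols `eps`, `lam` and `gamma`, and the exact weighting of the two terms are assumptions for illustration; the code evaluates the objective and does not perform the training itself.

```python
import math

def gaussian_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2) between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def predict(x, support, alphas, b, gamma):
    """Kernel expansion F(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * gaussian_kernel(x, xi, gamma)
               for a, xi in zip(alphas, support)) + b

def eps_insensitive_loss(prediction, target, eps):
    """Zero when |F(x) - y| <= eps, and |F(x) - y| - eps otherwise."""
    return max(0.0, abs(prediction - target) - eps)

def objective(support, targets, alphas, b, gamma, eps, lam):
    """Average insensitive loss plus lam times the squared norm of F.

    The norm is taken on the kernel part only (alpha^T K alpha), which is
    an assumption of this sketch.
    """
    n = len(support)
    risk = sum(eps_insensitive_loss(predict(x, support, alphas, b, gamma), y, eps)
               for x, y in zip(support, targets)) / n
    norm_sq = sum(alphas[i] * alphas[j]
                  * gaussian_kernel(support[i], support[j], gamma)
                  for i in range(n) for j in range(n))
    return risk + lam * norm_sq
```

A larger `eps` widens the tube in which errors are ignored, while a larger `lam` favours smoother solutions; both are typically chosen by cross-validation together with the kernel parameter.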

#### Statistical analysis of hyperspectral multi-angular data from Mars

Participants : Mathieu Fauvel, Florence Forbes, Laurent Gardes, Stéphane Girard.

**Joint work with:** Douté, S. from Laboratoire de
Planétologie de Grenoble, France in the context of the VAHINE
project (see Section 8.2).

A new generation of imaging spectrometers is emerging with an angular dimension in addition to the three usual dimensions (two spatial dimensions and one spectral dimension). The surface of the planets will now be observed from different viewpoints along the satellite trajectory, corresponding to about ten different angles, instead of only one, which usually corresponds to the vertical (0 degree angle) viewpoint. Multi-angle imaging spectrometers present several advantages: the influence of the atmosphere on the signal can be better identified and separated from the surface signal of interest, and the shape and size of the surface components and the surface granularity can be better characterized. However, this new generation of spectrometers also results in a significant increase in the size (several terabits expected) and complexity of the generated data. To investigate the use of statistical techniques to deal with these generic sources of complexity, we carried out preliminary experiments using our HDDC technique on a first set of realistic synthetic 4D spectral data provided by our collaborators from LPG. This data set turned out not to be relevant for our study, because the simulated angular information was not discriminant and did not allow us to draw useful conclusions. Further experiments on other data sets are therefore necessary.