Section: New Results
Quantifying Uncertainty
Propagation of uncertainties
Participants : François-Xavier Le Dimet, Victor Shutyaev.
Geophysical models suffer from two types of errors:

errors in the model itself, due to approximations of physical processes and their subgrid parametrization, and errors linked to the necessary numerical discretization;

errors in the observations, due to measurement errors and to sampling. For instance, many remote sensing instruments observe only radiances, which are transformed into state variables through complex processes such as the solution of an inverse problem. This is, of course, a source of errors.
Estimating the propagation of errors is an important and costly (in terms of computing resources) task for two reasons:

the quality of the forecast must be estimated;

the estimation of the error statistics has to be included in the analysis in order to have an adequate norm, based on these statistics, on the forecast and on the observation.
In the variational framework, models, observations and statistics are linked in the optimality system, which can be considered as a "generalized" model containing all the available information. Error covariances are estimated both from the second-order analysis and from the Hessian of the cost function. Numerical experiments have been carried out on a nonlinear model [16]. We expect to extend the numerical experiments to a semi-operational model in cooperation with ECMWF.
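As a minimal illustration of the link between the Hessian of the cost function and error covariances, the sketch below uses a scalar linear-Gaussian toy problem, in which the analysis error variance equals the inverse of the Hessian. All names and numerical values (x_b, sigma_b, y_obs, sigma_o, h) are illustrative assumptions and are not taken from the experiments in [16]; for a nonlinear model the identity holds only approximately.

```python
# Scalar linear-Gaussian toy problem (all numbers are illustrative assumptions).
x_b, sigma_b = 1.0, 2.0      # background state and its error standard deviation
y_obs, sigma_o = 3.0, 1.0    # observation and its error standard deviation
h = 1.0                      # linear observation operator (assumed)


def cost(x):
    """Variational cost: background misfit plus observation misfit."""
    background_term = 0.5 * ((x - x_b) / sigma_b) ** 2
    observation_term = 0.5 * ((y_obs - h * x) / sigma_o) ** 2
    return background_term + observation_term


# Hessian of the cost; it is constant because the cost is quadratic.
hessian = 1.0 / sigma_b**2 + h**2 / sigma_o**2

# Minimizer of the cost (the analysis), from the zero-gradient condition.
x_a = (x_b / sigma_b**2 + h * y_obs / sigma_o**2) / hessian

# In the linear-Gaussian case, the analysis error variance is the inverse Hessian.
var_a = 1.0 / hessian

print("analysis:", x_a)
print("analysis error variance:", var_a)
```

The analysis error variance is smaller than both the background and the observation error variances, which is the expected effect of combining the two sources of information.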
Sensitivity analysis for West African monsoon
Participants : Anestis Antoniadis, Céline Helbert, Clémentine Prieur, Laurence Viry.
Geophysical context
The West African monsoon is the major atmospheric phenomenon which drives the rainfall regime in Western Africa. It is therefore the main phenomenon governing water resources over the African continent from the equatorial zone to the sub-Saharan one. It has a major impact on agricultural activities and thus on the population itself. The causes of the interannual spatiotemporal variability of monsoon rainfall have not yet been unequivocally determined. A considerable body of evidence identifies spatiotemporal changes in sea surface temperature (SST) within the Gulf of Guinea, and in Saharan and sub-Saharan albedo, as major factors explaining it.
The aim of this study is to simulate the rainfall with a regional atmospheric model (RAM) and to analyze its sensitivity to the variability of these input parameters. When precipitation from the RAM is compared to several precipitation data sets, we observe that the RAM simulates the West African monsoon reasonably well.
Statistical methodology
As mentioned in the previous paragraph, our main goal is to perform a sensitivity analysis for the West African monsoon. Each simulation of the regional atmospheric model (RAM) is time consuming, so we first have to build a simplified model. We deal here with spatiotemporal dynamics, for which we have to develop efficient functional statistical tools. Indeed, in our context, both inputs (albedo, SST) and outputs (precipitation) are considered as time- and space-indexed stochastic processes. A first step consists in proposing a functional model for both precipitation and sea surface temperature, based on a new filtering method. For each spatial grid point in the Gulf of Guinea and each year of observation, the sea surface temperature is measured during the active period on a temporal grid. A Karhunen-Loève decomposition is then performed at each location on the spatial grid [91]. The estimation of the time-dependent eigenvalues at different spatial locations generates large amounts of high-dimensional data. Clustering algorithms then become crucial in reducing the dimensionality of such data.
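The Karhunen-Loève step at a single grid point can be sketched as follows: yearly curves sampled on a common temporal grid are centered, their empirical covariance between time points is formed, and its leading eigenvector gives the first principal component. This is only a dependency-free sketch under stated assumptions: the synthetic curves, the function name leading_mode and the use of power iteration are illustrative choices, and the filtering method mentioned above is not reproduced.

```python
import math
import random


def leading_mode(curves, iterations=200):
    """Return (eigenvalue, eigenvector) of the empirical covariance of `curves`.

    `curves` is a list of equally sampled time series (one per year).
    Power iteration is used only to keep the sketch free of dependencies.
    """
    n, n_times = len(curves), len(curves[0])
    mean = [sum(c[t] for c in curves) / n for t in range(n_times)]
    centered = [[c[t] - mean[t] for t in range(n_times)] for c in curves]
    # Empirical covariance between time points.
    cov = [[sum(c[s] * c[t] for c in centered) / (n - 1) for t in range(n_times)]
           for s in range(n_times)]
    v = [1.0 / math.sqrt(n_times)] * n_times
    for _ in range(iterations):
        w = [sum(cov[s][t] * v[t] for t in range(n_times)) for s in range(n_times)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(v[s] * sum(cov[s][t] * v[t] for t in range(n_times))
                     for s in range(n_times))
    return eigenvalue, v


# Synthetic "SST" curves (hypothetical data): one dominant temporal mode plus noise.
random.seed(0)
n_times, n_years = 20, 30
phi = [math.sin(math.pi * t / (n_times - 1)) for t in range(n_times)]
phi_norm = math.sqrt(sum(p * p for p in phi))
phi = [p / phi_norm for p in phi]
curves = []
for _ in range(n_years):
    score = random.gauss(0.0, 2.0)
    curves.append([26.0 + score * phi[t] + random.gauss(0.0, 0.05)
                   for t in range(n_times)])

eigenvalue, mode = leading_mode(curves)
# Absolute cosine between the estimated mode and the mode used to build the data.
alignment = abs(sum(m * p for m, p in zip(mode, phi)))
print("leading eigenvalue:", eigenvalue)
print("alignment with the generating mode:", alignment)
```

Repeating this computation at every grid point yields one leading mode per location, which is the kind of high-dimensional output that the clustering step is then meant to reduce.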
Thanks to the functional clustering performed on the first principal component at each point, we have defined specific subregions in the Gulf of Guinea. In each subregion, we then choose a reference point for which we keep a prescribed number of principal components, which define the basis functions. The sea surface temperature at any point in this subregion is modeled by its projection onto this truncated basis. The spatial dependence is described by the coefficients of the projection. The same approach is used for precipitation. Hence, for both precipitation and sea surface temperature, we obtain a decomposition in which the basis functions depend on time and the coefficients are spatially indexed and time independent. The most straightforward way to model the dependence of precipitation on sea surface temperature is then a multivariate response linear regression model, with the spatially indexed coefficients of the output (precipitation) as responses and the spatially indexed coefficients of the input (SST) as predictors. A naive approach consists in regressing each response onto the predictors separately; however, it is unlikely to produce satisfactory results, as such methods often lead to high variability and overfitting. Indeed, the dimensions of both predictors and responses are large compared to the sample size.
We apply a novel method recently developed in [83] for integrated genomic studies, which takes both aspects into account. The method uses an ${\ell}_{1}$-norm penalty to control the overall sparsity of the coefficient matrix of the multivariate linear regression model. In addition, it imposes a group sparse penalty. This penalty puts a constraint on the ${\ell}_{2}$ norm of the regression coefficients for each predictor, which controls the total number of predictors entering the model and consequently facilitates the detection of important predictors. Since the dimensions of both predictors and responses are large compared to the sample size, it is reasonable to assume not only that a subset of predictors enters the model, but also that a predictor may affect only some of the responses. In this way we take into account the complex spatiotemporal dynamics. This work has been published in [1].
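For concreteness, a penalized criterion of the kind described here can be written as below. The notation is an assumption made for this illustration: $Y$ is the $n \times q$ matrix of responses (precipitation coefficients), $X$ the $n \times p$ matrix of predictors (SST coefficients), $B$ the $p \times q$ coefficient matrix with rows $B_{j\cdot}$, and $\lambda_1, \lambda_2 \ge 0$ are tuning parameters. The exact criterion of [83] may differ, for instance through weights on individual coefficients.

```latex
\[
\min_{B \in \mathbb{R}^{p \times q}} \;
  \frac{1}{2} \, \| Y - X B \|_F^2
  \; + \; \lambda_1 \sum_{j=1}^{p} \| B_{j\cdot} \|_1
  \; + \; \lambda_2 \sum_{j=1}^{p} \| B_{j\cdot} \|_2
\]
```

The ${\ell}_{1}$ term sets individual entries of $B$ to zero, so that a predictor may act on only some responses, while the ${\ell}_{2}$ term sets entire rows to zero, which removes a predictor from the model altogether.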
Distributed Interactive Engineering Toolbox
An important point in the study described above is that the numerical storage and processing of model inputs/outputs require considerable computing resources. These tasks were performed in a grid computing environment with a middleware (DIET) which handles the scheduling of a huge number of computation requests and the data management, and gives transparent access to a distributed and heterogeneous platform on the regional grid CIMENT (http://ciment.ujfgrenoble.fr/).
Thus, different DIET modules were improved through this application. Automatic support for data grid software (http://www.irods.org) through DIET and a new web interface designed for MAR were provided to physicists.
This work also involves partners from the INRIA project-team GRAAL for the computational approach, and from the Laboratory of Glaciology and Geophysical Environment (LGGE) for the use and interpretation of the regional atmospheric model (RAM).
Tracking for mesoscale convective systems
Participants : Anestis Antoniadis, Céline Helbert, Clémentine Prieur, Laurence Viry, Roukaya Keinj.
Scientific context
In this section, we are still concerned with the monsoon phenomenon in West Africa and, more generally, with the impact of climate change. In this study we focus on the analysis of rainfall system monitoring provided by satellite remote sensing. The available data are microwave and IR satellite data. Such data make it possible to characterize the behaviour of mesoscale convective systems. We wish to develop stochastic tracking models that allow rainfall scenarios to be simulated together with an assessment of their uncertainties.
Stochastic approach
The chosen approach for tracking these convective systems and estimating the rainfall intensities is a stochastic one. The stochastic modeling approach is promising, as it allows developing models for which confidence in the estimates and predictions can be evaluated. The stochastic model will be used for hydroclimatic applications in West Africa. The first part of the work will consist in implementing a model developed in [88] on a test set, in order to evaluate its performance, our ability to infer the parameters, and the meaning of these parameters. Once the model is well fitted on toy cases, the algorithm should be run on our data set and compared with previous results from [80] or [79]. The model developed in [88] is a continuous-time stochastic model for multiple target tracking which allows, in addition to birth and death, splitting and merging of the targets. The location of a target is assumed to behave like a Gaussian process when it is observable. Targets are allowed to go undetected. A Markov chain state model then decides when the births, deaths, splittings or mergings of targets occur. The tracking estimate maximizes the conditional density of the unknown variables given the data. The problem of quantifying the confidence in the estimate is also addressed. Roukaya Keinj started working on this topic in November 2011 in a two-year postdoctoral position.
Sensitivity analysis for forecasting ocean models
Participants : Eric Blayo, Maëlle Nodet, Clémentine Prieur, Gaëlle Chastaing, Alexandre Janon, Jean-Yves Tissot.
Scientific context
Forecasting ocean systems requires complex models, which sometimes need to be coupled, and which make use of data assimilation. The objective of this project is, for a given output of such a system, to identify the most influential parameters and to evaluate the effect of uncertainty in input parameters on the model output. Existing stochastic tools are not well suited to high-dimensional problems (in particular time-dependent problems), while deterministic tools are fully applicable but provide only limited information. The challenge is therefore to gather expertise on numerical approximation and control of partial differential equations on the one hand, and on stochastic methods for sensitivity analysis on the other hand, in order to design innovative stochastic solutions for the study of high-dimensional models and to propose new hybrid approaches combining stochastic and deterministic methods.
Estimating sensitivity indices
A first task is to develop tools for estimating sensitivity indices. Among various tools, particular attention was paid to FAST and its derivatives. In [89], the authors present a general way to correct a positive bias which occurs in all the estimators of the random balance design method (RBD) and of its hybrid version, RBD-FAST. Both techniques derive from the Fourier amplitude sensitivity test (FAST) and, as a consequence, face most of its inherent issues. Until now, one of these, the well-known problem of interferences, had always been ignored in RBD. After showing how interferences lead to a positive bias in the estimator of first-order sensitivity indices in RBD, the authors explain how to overcome this issue. They then extend the bias correction method to the estimation of sensitivity indices of any order in RBD-FAST. They also give an economical strategy to estimate all the first-order and second-order sensitivity indices using RBD-FAST.
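The positive bias of RBD estimators can be illustrated with a small sketch. Each input is explored along a randomly permuted grid, the outputs are reordered along each input in turn, and the variance carried by the first M harmonics is attributed to that input. The correction applied below assumes that the variance not explained by the input is spread evenly over all frequencies, which inflates the raw estimate by roughly lambda (1 - S) with lambda = 2M/N; this is a simplified reading of the idea and should be checked against [89] before any serious use. The function name rbd_first_order, the test model and all parameter values are illustrative assumptions.

```python
import cmath
import math
import random


def rbd_first_order(model, n_inputs, n_samples=20000, n_harmonics=6, seed=0):
    """Estimate first-order indices with a random balance design.

    Returns (raw, corrected) lists. Inputs are assumed independent and
    uniform on [0, 1]; `model` maps a list of inputs to a float.
    """
    rng = random.Random(seed)
    # Regular grid on (-pi, pi) shared by all inputs.
    grid = [-math.pi + 2.0 * math.pi * (j + 0.5) / n_samples
            for j in range(n_samples)]
    # One independent random permutation of the grid per input.
    perms = []
    for _ in range(n_inputs):
        perm = list(range(n_samples))
        rng.shuffle(perm)
        perms.append(perm)

    def to_unit(s):
        # Triangle-wave transform: uniform samples on [0, 1].
        return 0.5 + math.asin(math.sin(s)) / math.pi

    outputs = [model([to_unit(grid[perms[i][j]]) for i in range(n_inputs)])
               for j in range(n_samples)]
    mean = sum(outputs) / n_samples
    variance = sum((y - mean) ** 2 for y in outputs) / n_samples
    lam = 2.0 * n_harmonics / n_samples  # assumed bias factor, see lead-in
    raw, corrected = [], []
    for i in range(n_inputs):
        # Reorder outputs so that input i sweeps the grid in increasing order.
        ordered = [0.0] * n_samples
        for j in range(n_samples):
            ordered[perms[i][j]] = outputs[j]
        power = 0.0
        for k in range(1, n_harmonics + 1):
            coeff = sum(y * cmath.exp(-1j * k * s)
                        for y, s in zip(ordered, grid)) / n_samples
            power += 2.0 * abs(coeff) ** 2
        s_raw = power / variance
        raw.append(s_raw)
        corrected.append(s_raw - lam / (1.0 - lam) * (1.0 - s_raw))
    return raw, corrected


# Hypothetical test model with known indices 0.2 and 0.8 for uniform inputs.
raw, corrected = rbd_first_order(lambda x: x[0] + 2.0 * x[1], n_inputs=2)
print("raw estimates:      ", raw)
print("corrected estimates:", corrected)
```

With a large sample the correction is small; it matters most when the number of model runs is low relative to the number of harmonics retained, which is the usual situation for expensive simulators.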
Intrusive sensitivity analysis, reduced models
Another topic developed in the team for sensitivity analysis is model reduction. More precisely, the aim is to reduce the number of unknown variables (to be computed by the model) by using a well-chosen basis. Instead of discretizing the model over a huge grid (with millions of points), the state vector of the model is projected onto the subspace spanned by this basis (of far smaller dimension). The choice of the basis is of course crucial and determines the success or failure of the reduced model. Various model reduction methods offer various choices of basis functions. A well-known method is called "proper orthogonal decomposition" or "principal component analysis". More recent and sophisticated methods also exist and may be studied, depending on the needs raised by the theoretical study. Model reduction is a natural way to overcome the huge computational times caused by discretizations on fine grids. In [61], the authors present a reduced basis offline/online procedure for the viscous Burgers initial boundary value problem, enabling efficient approximate computation of the solutions of this equation for parametrized viscosity and initial and boundary value data. This procedure comes with a rigorous error bound that is fast to evaluate and certifies the approximation. The numerical experiments in the paper show significant computational savings, as well as the efficiency of the error bound. This preprint is under review.
When a metamodel is used (for example a reduced basis metamodel, but also kriging, regression, ...) for estimating sensitivity indices by Monte Carlo type estimation, a twofold error appears: a sampling error and a metamodel error. Deriving confidence intervals that take into account these two sources of uncertainty is of great interest. We obtained results particularly well suited to reduced basis metamodels [61]. Alexandre Janon received a best poster award on this topic [40].
Ongoing work also deals with asymptotic confidence intervals in the double limit where the sample size goes to infinity and the metamodel converges to the true model. Implementations have to be carried out on more general models such as shallow-water models.
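The two sources of error can be made concrete with a small Monte Carlo sketch. It uses a pick-freeze estimator of a first-order index, a standard Monte Carlo scheme chosen here for illustration and not necessarily the estimator studied by the team, and applies it to a toy model and to a deliberately imperfect surrogate. Both functions, true_model and metamodel, are hypothetical stand-ins; inputs are assumed independent and uniform on [0, 1].

```python
import random


def pick_freeze_first_order(model, n_inputs, index, n_samples=20000, seed=0):
    """Monte Carlo (pick-freeze) estimate of the first-order index of `index`.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    sum_y = sum_yy = sum_cross = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n_inputs)]
        # Resample every input except the one under study, which is "frozen".
        x_prime = [rng.random() for _ in range(n_inputs)]
        x_prime[index] = x[index]
        y, y_prime = model(x), model(x_prime)
        sum_y += 0.5 * (y + y_prime)
        sum_yy += 0.5 * (y * y + y_prime * y_prime)
        sum_cross += y * y_prime
    mean = sum_y / n_samples
    variance = sum_yy / n_samples - mean * mean
    return (sum_cross / n_samples - mean * mean) / variance


def true_model(x):
    """Hypothetical 'expensive' model; exact first-order index of x[0] is 0.2."""
    return x[0] + 2.0 * x[1]


def metamodel(x):
    """Hypothetical cheap surrogate carrying a small model error."""
    return x[0] + 1.9 * x[1]


s_true = pick_freeze_first_order(true_model, n_inputs=2, index=0)
s_meta = pick_freeze_first_order(metamodel, n_inputs=2, index=0)
print("index estimated on the true model:", s_true)
print("index estimated on the metamodel: ", s_meta)
```

The gap between s_true and the exact value 0.2 is a sampling error and shrinks as n_samples grows, whereas the gap between the exact indices of the two models (0.2 and 1/4.61) is a metamodel error that no amount of sampling removes. A confidence interval of the kind discussed above has to cover both.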
Sensitivity analysis with dependent inputs
An important challenge for stochastic sensitivity analysis is to develop methodologies which work for dependent inputs. For the moment, there are no conclusive results in that direction. Our aim is to define an analogue of the Hoeffding decomposition [75] in the case where input parameters are correlated. A PhD (Gaëlle Chastaing) started in October 2010 on this topic. We obtained first results, which should be submitted soon, deriving a general functional ANOVA for dependent inputs that allows defining new variance-based sensitivity indices for correlated inputs.
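As a reminder of what has to be generalized, the classical decomposition for a model output $Y = f(X_1, \dots, X_p)$ with independent inputs can be written as follows. The notation ($f_u$, $S_u$) is introduced for this illustration only, and the last identity is a generic variance expansion, not the indices defined by the team.

```latex
\[
f(X_1,\dots,X_p) = f_{\emptyset} + \sum_{i} f_i(X_i)
  + \sum_{i<j} f_{ij}(X_i, X_j) + \dots + f_{1 \dots p}(X_1,\dots,X_p)
\]
% Independent inputs: the terms are mutually orthogonal, hence
\[
\operatorname{Var}(Y) = \sum_{u \neq \emptyset} \operatorname{Var}\bigl(f_u(X_u)\bigr),
\qquad
S_u = \frac{\operatorname{Var}\bigl(f_u(X_u)\bigr)}{\operatorname{Var}(Y)}
\]
% Dependent inputs: orthogonality is lost and covariance terms appear
\[
\operatorname{Var}(Y) = \sum_{u \neq \emptyset}
  \Bigl[ \operatorname{Var}\bigl(f_u(X_u)\bigr)
  + \sum_{v \neq u,\, v \neq \emptyset}
    \operatorname{Cov}\bigl(f_u(X_u), f_v(X_v)\bigr) \Bigr]
\]
```

With independent inputs the indices $S_u$ sum to one and each has an unambiguous interpretation; the covariance terms in the last expression are what makes the attribution of variance to individual inputs delicate when inputs are correlated.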
Quantification of uncertainty with Multifidelity computer experiments
Participants : Federico Zertuche, Céline Helbert, Anestis Antoniadis.
Propagation of uncertainties through computer codes is a hard task when dealing with heavy industrial simulators. Confidence intervals announced on predictions are often huge because of the lack of data. The context of this study is the case of simulations for which multiple levels of analysis (fast and slow) are available. In most cases the fast (but less trustworthy) and the slow (but more accurate) response values can be obtained independently. Thus, we can learn more about the response by additionally evaluating the cheap function(s) at a large number of input points x. In most cases the relationship between cheap and expensive responses is modeled by an autoregressive Gaussian regression. This method is a natural extension of kriging, in the sense that to build the surrogate one performs a Gaussian regression for the cheap data and another for the difference vector defined by the autoregressive relationship. The prediction error depends on the prediction errors of the cheap and expensive surrogates. We observe that this modeling greatly improves on traditional kriging when the actual relationship between the cheap and expensive responses is approximately linear. On the other hand, this approach gives worse results when the relationship between cheap and expensive responses is far from linear. Therefore, the models must be improved to account for a more precise link between the two fidelity levels of the responses. Additional tasks concern the associated numerical designs (must the designs necessarily be nested?) and the allocation of resources between slow and fast runs. This work is the subject of the thesis of Federico Zertuche, which began in October 2011.
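The autoregressive relationship referred to above is commonly written in the following form, where $y_c$ and $y_e$ denote the cheap and expensive responses, $\rho$ is a scale factor and $\delta$ is a discrepancy term. The symbols and the independence assumption are those of the usual two-level formulation and are assumptions made for this illustration, not a statement of the exact model used in the thesis.

```latex
\[
y_e(x) = \rho \, y_c(x) + \delta(x),
\qquad
y_c \sim \mathcal{GP}(m_c, k_c),
\quad
\delta \sim \mathcal{GP}(m_\delta, k_\delta),
\quad
y_c \perp \delta
\]
```

One Gaussian regression is fitted to the cheap runs and another to the differences $y_e(x) - \rho \, y_c(x)$ at the points where both levels are available. When the true link between the two levels is far from linear, no single $\rho$ captures it, which is consistent with the degraded results reported in that case.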
Impact of the thermodynamics and chemical kinetics parameters at different scales for the models of CO2 storage in geological media
Participant : Céline Helbert.
In collaboration with Bernard Guy and Joharivola Raveloson (Ecole des Mines de Saint-Etienne), we study water-gas-rock interactions in the case of CO2 storage in a geological environment. The focus is on the scale of observation of geochemical phenomena while taking into account the heterogeneity of the reservoir. This heterogeneity, at small and large scales, helps to maintain a local variability of the chemical composition of the fluid and influences reaction rates at the pore scale as well as at the reservoir scale. We propose to evaluate the geostatistical characteristics of local variability through small-scale reactive transport simulations in which parameters (namely the equilibrium constants log K and the rate constant k) are perturbed to represent local processes. This contribution follows a previous study of the impact of reservoir uncertainties on CO2 storage [72].