## Section: New Results

### New methods for data assimilation

Since its beginning, the CLIME project has also focused on new techniques for data assimilation. Because air quality is prone to non-Gaussian statistics, the team first built expertise in rigorous non-Gaussian approaches, often based on information-theoretical tools (maximum entropy on the mean, relative entropy, second-order analysis, etc.). Expertise is now also being developed in multiscale data assimilation, that is, in the mathematical tools required to handle many space and time scales within data assimilation schemes. This effort took concrete form with the launch of the ANR project MSDAG (Multiscale Data Assimilation for Geophysics) in January 2009.

#### Towards optimal choices of control space representation for geophysical data assimilation

Participant: Marc Bocquet.

In geophysical data assimilation, observations shed light on a control parameter space through a model, a statistical prior, and an optimal combination of these sources of information. The control can be a set of discrete parameters or, more often in geophysics, part of the state vector, which is distributed in space and time. When the control space is continuous, it must be discretised for numerical modeling. This discretisation, referred to in this work as a representation of the distributed parameter space, is always fixed a priori. However, the representation of the control space should be considered a degree of freedom in its own right. The goal of this work is to show that this representation can be optimised so that data assimilation is performed in optimal conditions. The optimal representation is then selected from a large dictionary of adaptive grid representations involving several space and time scales.
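To make the notion of a "dictionary of adaptive grid representations" concrete, the sketch below enumerates every dyadic tiling of a small 1D fine grid and builds, for one tiling, an averaging operator `gamma` (fine to coarse) and a piecewise-constant prolongation `gamma_star` (coarse to fine). This is a minimal illustration under assumptions of our own: a single spatial dimension, dyadic refinement only, and the hypothetical helper names `dyadic_tilings` and `representation_operators`; it is not the construction used in the study, which also involves time scales.

```python
from itertools import product

import numpy as np


def dyadic_tilings(start, size):
    """Enumerate all partitions of fine cells [start, start + size) into dyadic blocks.

    Each tiling is a list of (first_cell, n_cells) pairs; size must be a power of two.
    """
    tilings = [[(start, size)]]  # keep the whole interval as a single coarse cell
    if size > 1:
        half = size // 2
        # Or refine: tile each half independently and concatenate the results.
        for left, right in product(dyadic_tilings(start, half),
                                   dyadic_tilings(start + half, half)):
            tilings.append(left + right)
    return tilings


def representation_operators(tiling, n_fine):
    """Return (gamma, gamma_star) for a tiling of n_fine cells.

    gamma (coarse x fine) averages the fine values over each coarse cell;
    gamma_star (fine x coarse) copies each coarse value back onto its fine cells.
    """
    gamma_star = np.zeros((n_fine, len(tiling)))
    for j, (first, n_cells) in enumerate(tiling):
        gamma_star[first:first + n_cells, j] = 1.0
    gamma = gamma_star.T / gamma_star.sum(axis=0)[:, None]
    return gamma, gamma_star


dictionary = dyadic_tilings(0, 8)
print(len(dictionary))  # 26 dyadic representations of an 8-cell grid

tiling = [(0, 4), (4, 2), (6, 1), (7, 1)]  # coarse on the left, fine on the right
gamma, gamma_star = representation_operators(tiling, 8)
x_fine = np.arange(8.0)
print(gamma @ x_fine)  # coarse control: cell averages of the fine field
```

The dictionary grows quickly with the number of scales (677 tilings for 16 cells), which is why choosing among its members calls for a numerical optimisation rather than an exhaustive search on realistic grids.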

First, the importance of the choice of representation has been studied by examining how a change of representation affects the posterior analysis of data assimilation and the associated reduction of uncertainty. The study stresses that in some circumstances (in particular, atmospheric chemistry) choosing a proper representation of the control space is essential to set up the statistical framework of data assimilation properly. A possible mathematical framework has been proposed for multiscale data assimilation. To keep the developments simple, a measure of the reduction of uncertainty is chosen as the optimality criterion. From this criterion, a cost function is built to select the optimal representation; its argument is the control space representation itself. A regularisation of this cost function, based on a statistical-mechanics analogy, guarantees the existence of a solution and makes it possible to optimise numerically over the representation of the control space. The formalism has then been successfully applied to the inverse modeling of an accidental release of an atmospheric contaminant at the European scale, using real data (see Figure 5).
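The following sketch scores each representation in a dictionary by a reduction-of-uncertainty criterion and then forms Gibbs weights at a temperature `temperature`, as a loose stand-in for the statistical-mechanics regularisation. Several choices here are assumptions, not the study's formulation: the criterion is the degrees of freedom for signal, Tr(K_ω H_ω), a standard measure of uncertainty reduction; the coarse prior is taken as B_ω = Γ B Γᵀ with H_ω = H Γ*; the dictionary is the hypothetical set of all contiguous 1D tilings with a fixed number of cells; and the prior, observation footprints and error levels are invented for illustration.

```python
from itertools import combinations

import numpy as np


def contiguous_tilings(n_fine, n_coarse):
    """Hypothetical dictionary: every split of n_fine cells into n_coarse contiguous blocks."""
    for cuts in combinations(range(1, n_fine), n_coarse - 1):
        edges = (0, *cuts, n_fine)
        yield [(a, b - a) for a, b in zip(edges[:-1], edges[1:])]


def dfs(tiling, B, H, R):
    """Degrees of freedom for signal, Tr(K_w H_w), of the analysis in representation `tiling`."""
    n_fine = B.shape[0]
    gamma_star = np.zeros((n_fine, len(tiling)))  # fine <- coarse (piecewise constant)
    for j, (first, n_cells) in enumerate(tiling):
        gamma_star[first:first + n_cells, j] = 1.0
    gamma = gamma_star.T / gamma_star.sum(axis=0)[:, None]  # coarse <- fine (cell averages)
    B_w = gamma @ B @ gamma.T  # assumed prior covariance of the coarse control
    H_w = H @ gamma_star       # observation operator acting on the coarse control
    HBH = H_w @ B_w @ H_w.T
    # Tr(B_w H_w^T (H_w B_w H_w^T + R)^{-1} H_w) = Tr((HBH + R)^{-1} HBH) by cyclicity.
    return np.trace(np.linalg.solve(HBH + R, HBH))


n_fine, n_coarse, temperature = 16, 4, 0.05  # illustrative values
idx = np.arange(n_fine)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)  # exponential prior correlations
sites = np.array([2.0, 3.0, 12.0])                      # hypothetical observation footprints
H = np.exp(-0.5 * (idx[None, :] - sites[:, None]) ** 2)
R = 0.1 * np.eye(len(sites))

dictionary = list(contiguous_tilings(n_fine, n_coarse))
scores = np.array([dfs(t, B, H, R) for t in dictionary])
# Gibbs weights over the dictionary: low temperature concentrates on the best tilings.
weights = np.exp((scores - scores.max()) / temperature)
weights /= weights.sum()
best = dictionary[int(np.argmax(scores))]
print(len(dictionary), best, round(float(scores.max()), 3))
```

Raising `temperature` spreads the weights over many representations and smooths the selection problem; lowering it recovers a hard choice of the single best-scoring tiling.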

This is a first contribution from CLIME to the ANR SYSCOMM MSDAG project.

#### Modeling non-Gaussianity of background and observational errors by the maximum entropy method

Participants: Carlos A. Pires [Instituto Dom Luiz, University of Lisbon, Portugal], Olivier Talagrand [Laboratoire de Météorologie Dynamique], Marc Bocquet.

The Best Linear Unbiased Estimator (BLUE) has been widely used in atmospheric and oceanic data assimilation. However, when the errors of the data (observations and background forecasts) have non-Gaussian probability density functions (pdfs), the BLUE differs from the absolute Minimum Variance Unbiased Estimator (MVUE), which minimises the a posteriori mean square error. The non-Gaussianity of errors can stem from the inherent statistical skewness and positivity of some physical observables (e.g., moisture, chemical species), or from the nonlinearity of the data assimilation models and observation operators acting on Gaussian errors. Non-Gaussianity of the assimilated data errors can be justified from a priori hypotheses or inferred from statistical diagnostics of innovations (observation minus background). Following this rationale, we compute measures of innovation non-Gaussianity, namely skewness and kurtosis, and relate them to: a) the non-Gaussianity of the individual errors themselves, b) the correlation between nonlinear functions of errors, and c) the heteroscedasticity of errors within diagnostic samples. These relationships impose bounds on the skewness and kurtosis of errors that depend critically on the error variances, so the error variances must be tuned to be consistent with the innovations.

We evaluate the sub-optimality of the BLUE compared with the MVUE, in terms of excess error variance, in the presence of non-Gaussian errors. The error pdfs are obtained by the maximum entropy method constrained by error moments up to fourth order, from which the Bayesian probability density function and the MVUE are computed. The impact is larger for skewed extreme innovations and grows on average with the skewness of data errors, especially if those skewnesses have the same sign.
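A scalar sketch of this comparison follows. It assumes independent background and observation errors e_b = x_b - x and e_o = y - x, an uninformative prior on x, and illustrative moments (zero mean, unit variance, skewness 0.6, kurtosis 3.0) chosen by us rather than taken from the study. Each error pdf is the maximum-entropy density p(x) ∝ exp(-Σ λ_k x^k), k = 1..4, on a truncated grid, found here by damped Newton iterations on the convex dual (one standard solver, not necessarily the one used in the study). The MVUE estimates e_b by its conditional mean given the innovation d = e_o - e_b; the linear estimator is the BLUE for zero-mean independent errors. The helper names `fit_maxent` and `mvue_vs_blue` are hypothetical.

```python
import numpy as np


def fit_maxent(moments, grid, tol=1e-9, max_iter=200):
    """Maximum-entropy density on `grid` with E[x^k] = moments[k-1], k = 1..4."""
    mu = np.asarray(moments, dtype=float)
    phi = np.vstack([grid**k for k in range(1, 5)])  # features x, x^2, x^3, x^4
    dx = grid[1] - grid[0]

    def dual(lam):
        # Convex dual log Z(lam) + lam . mu; its gradient vanishes when moments match.
        expo = -lam @ phi
        shift = expo.max()
        w = np.exp(expo - shift)
        z = w.sum() * dx
        p = w / z                                  # normalised density on the grid
        mean_phi = phi @ p * dx                    # E_p[phi]
        log_z = shift + np.log(z)
        grad = mu - mean_phi
        hess = (phi * p) @ phi.T * dx - np.outer(mean_phi, mean_phi)  # Cov_p[phi]
        return log_z + lam @ mu, grad, hess, log_z

    var = mu[1] - mu[0] ** 2
    lam = np.array([-mu[0] / var, 0.5 / var, 0.0, 0.0])  # Gaussian starting point
    for _ in range(max_iter):
        value, grad, hess, _ = dual(lam)
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve(hess, grad)
        t = 1.0  # backtracking (Armijo) line search on the convex dual
        while dual(lam - t * step)[0] > value - 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5
        lam = lam - t * step
    log_z = dual(lam)[3]
    lo, hi = grid[0], grid[-1]

    def pdf(x):
        x = np.asarray(x, dtype=float)
        xc = np.clip(x, lo, hi)  # avoid overflow outside the truncated support
        expo = -(lam[0] * xc + lam[1] * xc**2 + lam[2] * xc**3 + lam[3] * xc**4) - log_z
        return np.where((x >= lo) & (x <= hi), np.exp(expo), 0.0)

    return pdf


def mvue_vs_blue(pdf_b, pdf_o, bound, n=801):
    """Analysis error variances of the MVUE and of the BLUE in the scalar problem."""
    e_b = np.linspace(-bound, bound, n)
    d = np.linspace(-2 * bound, 2 * bound, 2 * n - 1)
    # Joint density p(e_b, d) = p_b(e_b) p_o(d + e_b), used as quadrature weights.
    w = pdf_b(e_b)[:, None] * pdf_o(d[None, :] + e_b[:, None])
    w /= w.sum()
    p_d = w.sum(axis=0)
    keep = p_d > 0
    cond_mean = np.zeros_like(d)
    cond_mean[keep] = (e_b @ w[:, keep]) / p_d[keep]
    # MVUE: x_hat = x_b - E[e_b | d], so its error is e_b - E[e_b | d].
    mvue_var = np.sum(w * (e_b[:, None] - cond_mean[None, :]) ** 2)
    # Best linear estimator of e_b from d (the BLUE gain for zero-mean independent errors).
    mean_b, mean_d = e_b @ w.sum(axis=1), d @ p_d
    cov_bd = np.sum(w * (e_b[:, None] - mean_b) * (d[None, :] - mean_d))
    var_d = np.sum(p_d * (d - mean_d) ** 2)
    lin = mean_b + cov_bd / var_d * (d - mean_d)
    blue_var = np.sum(w * (e_b[:, None] - lin[None, :]) ** 2)
    return mvue_var, blue_var


grid = np.linspace(-6.0, 6.0, 2001)
moments = [0.0, 1.0, 0.6, 3.0]  # illustrative E[x], E[x^2], E[x^3], E[x^4]
pdf_b = fit_maxent(moments, grid)
pdf_o = fit_maxent(moments, grid)  # same-sign skewness for both errors
mvue_var, blue_var = mvue_vs_blue(pdf_b, pdf_o, bound=6.0)
print(f"MVUE variance {mvue_var:.4f}, BLUE variance {blue_var:.4f}")
```

With Gaussian moments (skewness 0, kurtosis 3) the two variances coincide up to quadrature error; departures from Gaussianity can only lower the MVUE variance below the BLUE one, since the conditional mean is the best of all estimators of e_b given d.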
The method has been applied to quality-accepted ECMWF innovations of brightness temperatures for a set of High-resolution Infrared Radiation Sounder (HIRS) channels. In this context, the MVUE led, in some extreme cases, to a potential reduction of 20–60% in error variance compared with the BLUE.