Section: New Results
Non-parametric learning of the optimal importance distribution
Participant: François Le Gland.
This is a collaboration with Nadia Oudjane, from EDF R&D Clamart.
Evaluating an integral or a mathematical expectation of a nonnegative
function can always be seen as computing the normalization constant in
a Boltzmann–Gibbs probability distribution. When the probability distribution
and the nonnegative function do not agree, i.e. have significant contributions
in different parts of the integration space, then the variance of the
corresponding Monte Carlo estimator can be very large, and one should use
another importance distribution, ideally the optimal (zero variance)
importance distribution $\mu^*$, which unfortunately cannot be used since
it depends on the desired (but unknown) integral.
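In symbols (the notation $\pi$, $f$, $I$ and $\mu^*$ below is ours, introduced only to fix ideas), the problem is to evaluate
\[
I = \int f(x)\,\pi(dx) ,
\]
which is precisely the normalization constant of the Boltzmann–Gibbs probability distribution
\[
\mu^*(dx) = \frac{f(x)\,\pi(dx)}{I} \;:
\]
sampling from $\mu^*$ would give a zero variance estimator of $I$, but $\mu^*$ cannot be simulated directly, precisely because it involves the unknown constant $I$.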
Alternatively, sequential methods have been
designed (under different names, such as annealed sampling, progressive
correction, multilevel splitting, etc., depending on the context) which
not only provide an expression for the desired integral as the product of
intermediate
normalization constants, but ultimately provide as well an $N$-sample
approximately distributed according to the optimal importance distribution.
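Purely for illustration, a minimal tempering-type sketch of such a sequential method could look as follows in Python (the geometric schedule $f(x)^\beta$, the function names and the omission of intermediate MCMC move steps are simplifying assumptions of ours, not the scheme actually studied):

import numpy as np

def sequential_sampler(log_f, sample_pi, n_particles=1000, n_levels=10, rng=None):
    # Tempering sketch: intermediate targets proportional to f(x)**beta * pi(dx),
    # with beta increasing from 0 to 1.  Returns an estimate of the integral of f
    # under pi, obtained as a product of intermediate normalization constants,
    # together with a particle set approximately distributed according to the
    # optimal importance distribution.
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(0.0, 1.0, n_levels + 1)
    x = sample_pi(n_particles, rng)                    # initial N-sample from pi
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        log_w = (b_next - b_prev) * log_f(x)           # incremental weights
        log_z += np.logaddexp.reduce(log_w) - np.log(n_particles)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x[idx]                                     # multinomial resampling
        # a full implementation would also move the particles with an MCMC kernel
        # leaving the current intermediate distribution invariant
    return np.exp(log_z), x

Without the move step flagged in the last comment the particle set degenerates quickly, so the sketch only conveys the overall structure of these methods.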
From the weighted empirical probability distribution associated with this
sample, a regularized probability distribution $\mu^N$ can be obtained,
using a kernel method or a simple histogram, and can be used as an almost
optimal importance distribution to estimate the original integral
with an $M$-sample distributed according to $\mu^N$.
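A minimal sketch of this regularization and re-use step, assuming a Gaussian kernel through scipy.stats.gaussian_kde and vectorized density functions (all names below are ours), might read:

import numpy as np
from scipy.stats import gaussian_kde

def estimate_integral(f, pi_density, particles, weights, n_samples, rng=None):
    # particles: array of shape (d, N) carrying the weighted N-sample;
    # the kernel estimate below plays the role of the regularized distribution,
    # used here as an almost optimal importance distribution.
    rng = np.random.default_rng() if rng is None else rng
    mu_n = gaussian_kde(particles, weights=weights)    # regularized distribution
    y = mu_n.resample(n_samples, seed=rng)             # M-sample from it
    w = f(y) * pi_density(y) / mu_n.pdf(y)             # importance weights
    return w.mean()                                    # estimate of the integral

A simple histogram could be substituted for the kernel estimate with the same overall structure.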
The variance of the resulting estimator depends on the product of the
inverse sample size $1/M$ and the $\chi^2$-distance between the almost
optimal importance distribution $\mu^N$ and the optimal (zero variance)
importance distribution $\mu^*$.
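With the notation introduced above (again ours), this corresponds, up to the factor $I^2$, to the classical identity
\[
\operatorname{var}\Bigl( \frac{1}{M} \sum_{i=1}^{M} f(Y_i)\,\frac{d\pi}{d\mu^N}(Y_i) \Bigr)
= \frac{I^2}{M}\,\chi^2(\mu^* \mid \mu^N) ,
\qquad
\chi^2(\mu \mid \nu) = \int \Bigl( \frac{d\mu}{d\nu} \Bigr)^2 d\nu - 1 ,
\]
where $Y_1,\dots,Y_M$ are independent with common distribution $\mu^N$.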
Our contribution has been to provide an estimate of this $\chi^2$-distance,
under mild assumptions. The impact of dimension on density
estimation is a limiting factor here, but the variance reduction is very
significant in moderate dimensions.