Section: New Results
Non-parametric learning of the optimal importance distribution
Participant: François Le Gland.
This is a collaboration with Nadia Oudjane, from EDF R&D Clamart.
Evaluating an integral, or the mathematical expectation of a nonnegative function, can always be seen as computing the normalization constant of a Boltzmann–Gibbs probability distribution. When the probability distribution and the nonnegative function do not agree, i.e. have significant contributions in different parts of the integration space, the variance of the crude Monte Carlo estimator can be very large, and one should use another importance distribution, ideally the optimal (zero variance) importance distribution μ*, which unfortunately cannot be used since it depends on the desired (but unknown) integral. Alternatively, sequential methods have been designed (under different names, such as annealed sampling, progressive correction, multilevel splitting, etc., depending on the context) which not only express the desired integral as a product of intermediate normalization constants, but ultimately provide an N-sample approximately distributed according to the optimal importance distribution μ*. From the weighted empirical probability distribution associated with this sample, a regularized probability distribution μ_N can be obtained, using a kernel method or a simple histogram, and can be used as an almost optimal importance distribution to estimate the original integral with an M-sample distributed according to μ_N. The variance of the resulting estimator is proportional to the product of the inverse sample size 1/M and the χ²-distance between the almost optimal importance distribution μ_N and the optimal (zero variance) importance distribution μ*.
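The two-stage scheme described above can be sketched numerically. The sketch below is illustrative only: the test problem (a Gaussian reference distribution p and a nonnegative function phi with little overlap), the broad proposal, and the plain weighted-resampling step standing in for the sequential (annealed/splitting) construction of the N-sample are all assumptions for the example, not the method of the work itself. The kernel regularization uses a standard Gaussian kernel density estimate for μ_N.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)

# Hypothetical test problem: I = E_p[phi(X)] with p = N(0, 1) and
# phi(x) = exp(-(x - 4)^2 / 2).  Exact value: exp(-4) / sqrt(2).
phi = lambda x: np.exp(-0.5 * (x - 4.0) ** 2)
p = norm(0.0, 1.0)
exact = np.exp(-4.0) / np.sqrt(2.0)

# Stage 1: obtain an N-sample approximately distributed according to the
# optimal (zero variance) importance distribution mu* proportional to phi * p.
# For illustration, weighted resampling from a broad proposal stands in
# for the sequential scheme of the text.
N = 2000
proposal = norm(2.0, 2.0)                      # broad enough to cover mu*
y = proposal.rvs(size=N, random_state=rng)
w = phi(y) * p.pdf(y) / proposal.pdf(y)        # unnormalized weights
w /= w.sum()
sample = rng.choice(y, size=N, p=w)            # weighted resampling

# Regularize the weighted empirical distribution with a kernel method,
# giving the almost optimal importance distribution mu_N.
mu_N = gaussian_kde(sample)

# Stage 2: estimate the integral with an M-sample drawn from mu_N.
M = 5000
x = mu_N.resample(M, seed=rng)[0]
estimate = np.mean(phi(x) * p.pdf(x) / mu_N.pdf(x))

# Crude Monte Carlo with the same budget, for comparison.
naive = np.mean(phi(p.rvs(size=M, random_state=rng)))

print(exact, estimate, naive)
```

Because the importance weight phi(x) p(x) / μ_N(x) is nearly constant when μ_N is close to μ*, the two-stage estimate is far more stable than the crude Monte Carlo estimate, which wastes almost all of its samples where phi is negligible.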
Our contribution has been to provide an estimate of this χ²-distance, under mild assumptions. The impact of dimension on density estimation is a limiting factor here, but the variance reduction is very significant in moderate dimensions.
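For concreteness, the variance identity underlying the role of the χ²-distance can be written out; the notation (p for the reference distribution, φ for the nonnegative function, I for the desired integral, μ* for the optimal and μ_N for the regularized importance distribution) is assumed from context, not taken from the original text.

\[
\widehat{I}_M \;=\; \frac{1}{M}\sum_{j=1}^{M} \frac{\phi(X_j)\, p(X_j)}{\mu_N(X_j)},
\qquad X_j \sim \mu_N \ \text{i.i.d.},
\]
is an unbiased estimator of \(I = \int \phi(x)\, p(x)\, dx\), and since \(\mu^*(x) = \phi(x)\, p(x) / I\),
\[
\operatorname{var}\bigl(\widehat{I}_M\bigr)
\;=\; \frac{I^2}{M}\,\chi^2\bigl(\mu^*, \mu_N\bigr),
\qquad
\chi^2\bigl(\mu^*, \mu_N\bigr) \;=\; \int \frac{\bigl(\mu^*(x)\bigr)^2}{\mu_N(x)}\, dx \;-\; 1,
\]
which vanishes when \(\mu_N = \mu^*\), recovering the zero-variance property of the optimal importance distribution.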