Section: New Results
Keywords: Evolution Strategies, Estimation of Distribution, Convergence of evolutionary algorithms, Asymptotic convergence rate, Self-adaptivity.
Fundamentals of Evolutionary Computation
Abstract: Evolutionary Computation (EC) is a unifying framework for population-based optimization algorithms. It relies on a crude imitation of the Darwinian evolution paradigm: adapted species emerge thanks to natural selection combined with blind variations. Historical approaches differ by the search space they work on: genetic algorithms work on bit-strings, evolution strategies on real-valued parameters, and genetic programming on structured programs.
EC is now widely acknowledged as a powerful optimization framework dedicated to ill-posed optimization problems. The main reason for its efficiency comes from the possibility for EC to incorporate background knowledge about the application domain into the representation and the variation operators.
Convergence results for the (1, λ)-Self-Adaptive Evolution Strategies
Evolution Strategies (ES) are the evolutionary algorithms recommended by the state of the art for practical parametric optimisation. Since their invention in the mid-sixties, theoretical studies have mainly concentrated on establishing local properties of these algorithms on the well-known sphere function (f(x) = ||x||^2). A recent paper  investigates the global convergence of the (1, λ)-SA-ES on this function and proves sufficient conditions ensuring the linear convergence (or divergence) of the algorithm. The proofs call upon the theory of Markov chains on a continuous state space and make use of the so-called drift conditions to establish practical properties of the Markov chains investigated; they are the continuation of Anne Auger's PhD thesis, defended in 2004.
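As an illustration, a minimal (1, λ)-SA-ES of the kind analysed above can be sketched in a few lines of Python; the function names, parameter values, and the learning rate τ = 1/√n are common textbook choices, not those of the paper:

```python
import math
import random

def sphere(x):
    """Sphere function f(x) = sum(x_i^2), minimized at the origin."""
    return sum(xi * xi for xi in x)

def one_comma_lambda_sa_es(f, x0, sigma0=1.0, lam=10, generations=200, seed=3):
    """Minimal (1, lambda)-self-adaptive ES: a single parent produces lam
    offspring; each offspring first mutates its step-size log-normally, then
    mutates the search point using that step-size. Comma selection: the best
    offspring replaces the parent unconditionally."""
    rng = random.Random(seed)
    n = len(x0)
    tau = 1.0 / math.sqrt(n)  # common learning-rate choice for self-adaptation
    x, sigma = list(x0), sigma0
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            s = sigma * math.exp(tau * rng.gauss(0.0, 1.0))  # self-adapted step-size
            y = [xi + s * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(y), y, s))
        _, x, sigma = min(offspring, key=lambda t: t[0])  # comma selection
    return x, sigma

best, final_sigma = one_comma_lambda_sa_es(sphere, [5.0] * 5)
```

On the sphere function this sketch exhibits the linear convergence behaviour discussed above: the step-size shrinks roughly in proportion to the distance to the optimum.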
Further theoretical work has been carried out, in continuing collaboration with Anne Auger, now at ETHZ:
A 3/2-order convergence proof for a Newton-based surrogate model, to the best of our knowledge the best convergence rate published so far for derivative-free methods . This result is based on an ad hoc choice of the mutation step-size and a derandomized sampling; it can indeed be seen as a result about the choice of the step-size in finite-difference Newton methods. This work therefore lies between evolution strategies and classical optimization.
A linear convergence proof for a derandomized evolution strategy, under very mild hypotheses . This result uses an ad hoc sampling to obtain a linear convergence rate in a very difficult framework where objective functions need not be continuous, even in the neighborhood of the optima.
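The connection with finite-difference Newton methods mentioned above can be illustrated by a one-dimensional sketch, where the sampling step h plays the role of the mutation step-size; this is a generic illustration, not the derandomized algorithm of the paper:

```python
def fd_newton(f, x, h=1e-4, steps=20):
    """One-dimensional Newton iteration with finite-difference derivatives.

    The first and second derivatives of f are estimated from three function
    evaluations only, so the method is derivative-free; the quality of the
    estimates, hence of the convergence, depends on the sampling step h."""
    for _ in range(steps):
        fp = (f(x + h) - f(x - h)) / (2 * h)            # central first derivative
        fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2  # second derivative
        if fpp == 0:
            break  # flat curvature estimate: stop rather than divide by zero
        x -= fp / fpp  # Newton step
    return x

# On a quadratic the finite-difference estimates are essentially exact,
# so the minimizer t = 3 is found almost immediately.
x_star = fd_newton(lambda t: (t - 3.0) ** 2 + 1.0, x=0.0)
```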
Methods from Statistical Learning Theory have been used in the framework of Genetic Programming (GP) for function identification, leading to original results in the theoretical study of GP. This work has been published in , and at the CAP'05 conference, where it was suggested for publication in the journal RIA. The study (i) provides proofs of bloat in various standard frameworks for genetic programming and (ii) provides proofs of no-bloat in other frameworks. It has been recommended for publication in the GPEM journal, and a joint work with C. Gagné (post-doctoral fellow in the TAO team) has been carried out, including numerical experiments showing that the approach can be used in practice.
Another joint work where GP is used for Machine Learning, in the continuation of C. Gagné's PhD thesis, carried out with M. Schoenauer and in collaboration with M. Tomassini (U. Lausanne) and M. Parizeaux (U. Laval, Québec), studies the influence on bloat of several heuristics for choosing the best GP classifier after a multi-objective GP run .
The population-based dynamics of EAs makes it possible to tackle multi-objective optimization, i.e. to sample the Pareto front, the set of best compromises between contradictory objectives (K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley, 2000).
This part of TAO's activities builds on Olga Rudenko's PhD, defended in 2004, where, apart from applications in the automobile area, she designed a sound stopping criterion (the only previously available criterion was ...a given number of generations) and proposed a new crossover operator based on the dominance property.
Those works have been completed by a theoretical study , including the work of Y. Bonnemay, during his internship with Saint-Gobain. This theoretical study shows, in the case of a simplified multi-objective evolutionary algorithm, that:
the sample complexity for a given precision is roughly linear in the number of objective functions;
a good stopping/precision criterion is the ratio of the non-dominated points discovered so far to the total number of visited points.
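The dominance-based criterion above can be sketched with a toy example; the function names and the sample points below are ours, purely for illustration:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Points of the list that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def progress_ratio(visited):
    """Illustrative stopping indicator: the fraction of all visited points
    that are non-dominated. When this ratio stabilizes, further search is
    discovering few new compromises."""
    return len(nondominated(visited)) / len(visited)

# Six visited bi-objective points; four of them form the current front.
visited = [(1, 5), (2, 4), (3, 3), (4, 4), (5, 1), (6, 6)]
ratio = progress_ratio(visited)  # 4 of the 6 points are non-dominated
```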
Also note that C. Gagné's post-doctoral work on GP for Machine Learning is concerned with evolutionary multi-objective optimization as well.
Surrogate models in Evolution Strategies
The work on surrogate models, started in the TAO team with K. Abboud's Ph.D. thesis, defended in 2004, has been continued by Y. Bonnemay (intern jointly with Saint-Gobain and TAO) and O. Teytaud. This work concerns the theoretical properties of surrogate models, together with an empirical study on (i) standard difficult test functions and (ii) industrial problems.
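The general surrogate idea (spend the expensive true fitness only on candidates ranked as promising by a cheap model fitted on past evaluations) can be sketched as follows; the nearest-neighbour model and all names are illustrative stand-ins, not the models actually studied in the team:

```python
def sphere2(x):
    """Toy stand-in for an expensive fitness (here just the 2-D sphere)."""
    return sum(xi * xi for xi in x)

def surrogate_preselect(f, archive, candidates, k=3, budget=1):
    """Rank candidates with a k-nearest-neighbour surrogate fitted on the
    archive of already evaluated (point, fitness) pairs, then spend the
    true-evaluation budget only on the most promising ones."""
    def surrogate(x):
        # predicted fitness = mean fitness of the k nearest evaluated points
        near = sorted(archive,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
        return sum(fx for _, fx in near) / len(near)
    ranked = sorted(candidates, key=surrogate)
    evaluated = [(x, f(x)) for x in ranked[:budget]]  # true evaluations only here
    archive.extend(evaluated)  # the archive grows as the search proceeds
    return evaluated

# Four points evaluated so far; the surrogate picks the more promising of
# two new candidates and only that one gets a true evaluation.
archive = [((float(i), float(i)), sphere2((float(i), float(i)))) for i in range(4)]
evaluated = surrogate_preselect(sphere2, archive, [(2.5, 2.5), (0.1, 0.1)])
```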
The approach by virtual examples, proposed to solve the inverse problem of chromatography (section 6.4.3), also makes use of surrogate models and builds on K. Abboud's PhD.
Estimation of Distribution Algorithms
Estimation of Distribution Algorithms (EDAs) proceed by alternately sampling and updating a distribution on the search space. The sampled individuals are evaluated, i.e. their fitness is computed, and the distribution is updated, biased toward the best individuals in the current sample. Extensions of this framework to continuous optimization, initiated by Ducoulombier & Sebag (1998) (Extending Population-Based Incremental Learning to Continuous Search Spaces, PPSN'98), showed failures in specific cases where the solutions lie on the edge of the search space. Regularization heuristics, which calibrate the eigenvalues of the distribution's covariance, were shown to successfully overcome such failures.
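A minimal continuous EDA with a diagonal Gaussian model may look as follows; the variance floor is a crude stand-in for the regularization heuristics mentioned above, and all names and parameter values are illustrative:

```python
import math
import random

def gaussian_eda(f, mean0, pop=40, elite=10, iters=150, var_floor=1e-2, seed=1):
    """Minimal continuous EDA with one independent Gaussian per coordinate.

    Each iteration samples `pop` points, keeps the `elite` best, and refits
    the mean and variance to them. The variance floor keeps every eigenvalue
    of the (diagonal) covariance bounded away from zero, so the distribution
    cannot collapse prematurely."""
    rng = random.Random(seed)
    dim = len(mean0)
    mean, var = list(mean0), [1.0] * dim
    for _ in range(iters):
        sample = [[rng.gauss(mean[i], math.sqrt(var[i])) for i in range(dim)]
                  for _ in range(pop)]
        sample.sort(key=f)           # evaluate and rank the sampled individuals
        best = sample[:elite]        # truncation selection
        for i in range(dim):
            m = sum(x[i] for x in best) / elite
            v = sum((x[i] - m) ** 2 for x in best) / elite
            mean[i], var[i] = m, max(v, var_floor)  # regularized update
    return mean

center = gaussian_eda(lambda x: sum(xi * xi for xi in x), [2.0, 2.0, 2.0])
```

Without the `max(v, var_floor)` clamp, the fitted variances shrink geometrically under truncation selection and the search can stall before reaching the optimum, which is one simple way to picture the failure mode that regularization addresses.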
On-going work (Ph.D. of Celso Ishida, co-advised with A. Pozo, Universidad Federale do Parana, Brazil) is concerned with using mixtures of distributions, borrowing from the MIXMOD EM-like approaches developed in the SELECT project at INRIA, to extend EDAs to multi-modal optimization.
In the same way as EDAs evolve a stochastic model, mainly applied to search in the space of continuous values, one can evolve a stochastic grammar model, with a grammar describing the building of structures, mainly applied to search in the space of programs. This is called EDP (Evolution of Distribution Programming). In previous works (A. Rattle and M. Sebag. Avoiding the Bloat with Stochastic Grammar-based Genetic Programming. In P. Collet et al., eds, Evolution Artificielle, pp 254–266, LNCS 2310, Springer Verlag, 2002.), we had already investigated an EDP model based on the coupling of context-free grammars with genetic programming (CFG-GP), with sound results. The Master's internship of Jean-Philippe Braud aimed at exploring this matter further, by refining the program distribution model and endowing it with the ability to represent probabilities for path-dependent derivations of the grammar model. This internship was supervised by Samuel Landau during his postdoc at TAO (report not yet available).