## Section: New Results

Keywords : Evolution Strategies, Estimation of Distribution, Convergence of evolutionary algorithms, Asymptotic convergence rate, Self-adaptivity.

### Fundamentals of Evolutionary Computation

Participants : Anne Auger, Nicolas Bredeche, Alexandre Devert, Sylvain Gelly, Mohamed Jebalia, Marc Schoenauer, Michèle Sebag, Olivier Teytaud.

**Abstract**:
Evolutionary Computation (EC) is a unifying framework for
population-based optimization algorithms. It relies on a crude
imitation of the Darwinian evolution paradigm: adapted species emerge
because of the interplay between natural selection and blind variations.
Designing an evolutionary algorithm starts with the choice of a
representation (i.e. the search space to explore) and of the
corresponding variation operators (crossover, mutation), followed by the crafting
of the fitness function and the tuning of the many hyper-parameters
(in particular, those governing how "natural" selection is
performed).
In that respect, historical approaches mainly differ by the search
space they work on: genetic
algorithms work on bit-strings, evolution strategies on real-valued
parameters, and genetic programming on structured programs – even
though some significant differences also exist in the way selection is
applied.

EC is now widely acknowledged as a powerful optimization framework
dedicated to ill-posed optimization problems. Its efficiency stems
mainly from the flexibility with which it can
incorporate background knowledge about the application domain into the
representation and the variation operators, as well as algorithmic
procedures from other areas into its own variation and selection
routines. A quick introduction to the field, in French, can be found in the
"evolutionary" chapter of the teaching material from Grégoire
Allaire's Ecole Polytechnique course on Structural Design, *Conception optimale des Structures*, recently published
[12]. EC for Numerical Engineering is also
presented and discussed in a chapter of *Modélisation Numérique: défis et perspectives, Hermès 2006* [10].

#### Convergence Analysis for Evolution Strategies

Almost all stochastic algorithms in Computer Science are implemented
using pseudo-random sequences to simulate random
distributions. However, quasi-random sequences sometimes offer better
properties in terms of uniformity criteria.
[22] addresses the case of
quasi-random mutations in Evolution Strategies, and shows that
all quantiles of standard estimates of the off-line result of the
algorithm (i.e. both lucky and un-lucky runs) are improved
by derandomization. Various quasi-random mutations are proposed, and
some of them can be easily applied to many variants of Evolution
Strategies (e.g. (1,λ)-ES, (μ/μ,λ)-ES,
(μ,λ)-ES with λ large enough)
with significant improvements in dimensionality 1-50. In particular, this generality
(*all* quantiles are improved) shows
that no robustness argument applies against derandomization.
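
To make the idea of quasi-random mutations concrete, here is a minimal Python sketch. It is not the construction studied in [22]: it assumes one simple way of derandomizing a (1,λ)-ES, namely drawing the mutation vectors from a Halton sequence that is randomly shifted at each generation and mapped to Gaussian values through the inverse normal CDF. The function names, the fixed step-size decay and the sphere test function are illustrative assumptions.

```python
import random
from statistics import NormalDist

PRIMES = (2, 3, 5, 7, 11, 13)  # one Halton base per coordinate (supports dim <= 6)

def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    value, scale = 0.0, 1.0
    while index > 0:
        scale /= base
        value += scale * (index % base)
        index //= base
    return value

def quasi_gaussian(index, shift):
    """Map the index-th Halton point, shifted modulo 1, to a Gaussian vector."""
    inv_cdf = NormalDist().inv_cdf
    vector = []
    for d, s in enumerate(shift):
        u = (van_der_corput(index, PRIMES[d]) + s) % 1.0
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf is undefined at 0 and 1
        vector.append(inv_cdf(u))
    return vector

def sphere(x):
    """Illustrative test function (assumption): squared Euclidean norm."""
    return sum(v * v for v in x)

def quasi_random_es(f, x0, sigma=0.5, lam=10, generations=50, seed=0):
    """(1,lambda)-ES whose mutations come from a shifted Halton sequence."""
    rng = random.Random(seed)
    parent, index = list(x0), 1
    for _ in range(generations):
        shift = [rng.random() for _ in x0]  # fresh random shift each generation
        offspring = []
        for _ in range(lam):
            z = quasi_gaussian(index, shift)
            index += 1
            offspring.append([p + sigma * zi for p, zi in zip(parent, z)])
        parent = min(offspring, key=f)  # comma selection: the parent is discarded
        sigma *= 0.9  # fixed decay, chosen only to keep the sketch short
    return parent

start = [1.0, 1.0, 1.0]
result = quasi_random_es(sphere, start)
print(sphere(start), sphere(result))
```

Replacing `quasi_gaussian` with calls to `rng.gauss(0.0, 1.0)` gives the usual pseudo-random variant, so the two can be compared over many seeds on the same test function.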

Another paper [39] generalizes previously known lower bounds to any comparison-based algorithm (including, e.g., direct search methods, and not only evolutionary methods). The theorem, which relies on entropy numbers of the domain, requires only very weak assumptions and matches existing upper bounds. The obtained lower bound holds for any

We also studied [23] the idea of rigorously defining frameworks in which an optimal algorithm exists. First, this paper shows the optimality of comparison-based algorithms in the robust framework (worst case over increasing transformations of the objective function). This result is the counterpart of that of the previous paper: comparison-based algorithms are slower, but optimal with respect to a robustness criterion. Still with the idea of bringing optimality into optimization, the paper shows that, given a prior distribution over fitness functions and a maximum number of objective-function evaluations known a priori, one can define the notion of "optimal optimization algorithms" and implement such algorithms. The "optimal algorithm" is computationally very expensive, but approximations are proposed that are relevant to the framework of EDAs (Estimation of Distribution Algorithms).
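
The robustness property behind this result can be illustrated with a short Python sketch. It is an illustration, not code from [23]: a comparison-based algorithm uses the objective function only to rank candidates, so composing the objective with an increasing transformation should leave its behaviour unchanged. The (1,λ)-ES below, its fixed step-size schedule and the square-root transformation are all assumptions made for the example.

```python
import math
import random

def sphere(x):
    """Illustrative objective (assumption): squared Euclidean norm."""
    return sum(v * v for v in x)

def transformed(x):
    """Increasing transformation (square root) applied to the sphere fitness."""
    return math.sqrt(sphere(x))

def comparison_based_es(f, x0, sigma=1.0, lam=8, generations=20, seed=1):
    """(1,lambda)-ES that uses f only through comparisons; returns all parents."""
    rng = random.Random(seed)
    parent = list(x0)
    trajectory = [tuple(parent)]
    for _ in range(generations):
        offspring = [[p + sigma * rng.gauss(0.0, 1.0) for p in parent]
                     for _ in range(lam)]
        parent = min(offspring, key=f)  # only the ranking of fitness values matters
        sigma *= 0.85  # step-size schedule that never looks at fitness values
        trajectory.append(tuple(parent))
    return trajectory

run_f = comparison_based_es(sphere, [2.0, -1.0])
run_g = comparison_based_es(transformed, [2.0, -1.0])
print(run_f == run_g)
```

With the same seed, both runs are expected to visit the same sequence of parents. An algorithm that used fitness values directly, for instance through finite-difference gradients, would not have this invariance.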

Similar ideas have been applied to study the complexity of approximating Pareto sets with a large number of conflicting objectives, in the context of Evolutionary Multi-Objective Optimization. Strong lower bounds have been derived, leading mainly to the conclusion that multi-objective problems with many conflicting objectives cannot be fully solved off-line [40]. This is in accordance with practice, as practitioners admit that they cannot deal with many conflicting objectives in a fully off-line manner, and it suggests moving to on-line interactive methods when the number of conflicting objectives is large. Note that the paper also provides theoretical foundations for methods based on the removal of moderately conflicting objectives.

#### Genetic Programming

Genetic Programming (GP) is a technique to evolve programs, represented as parse trees. GP can be used directly for supervised learning, in which case the output of the program (the tree) is the class to which a given example belongs. In this framework, in the continuation of C. Gagné's PhD thesis and in collaboration with M. Tomassini (U. Lausanne) and M. Parizeau (U. Laval, Québec), we studied how several heuristics for choosing the best GP classifier after a multi-objective GP run influence bloat (the uncontrolled growth of tree size) [19].

GP can also be used for Machine Learning in a more indirect way: the choice of the kernel is known to be a major issue when using a kernel-based learner. In [20], GP is used to build the optimal kernel for a given learning task. Furthermore, the complexity of the algorithm is reduced by co-evolving the subset of samples and the kernels, which avoids learning on all examples at once.

Another use of GP is proposed in Alexandre Devert's PhD work, namely an embryogenic approach to the Optimum Design problem. This work is described in more detail in section 6.3.3.

On the theoretical side, the work presented at CAp 2005, which uses hints from learning theory for symbolic regression, has been continued. It uses methods from statistical learning theory to prove that bloat cannot be avoided in various standard frameworks, and to propose new methods to fight bloat that are provably effective and validated by extensive experiments. These extended results have been presented at a Dagstuhl seminar [44] and published in a journal version (in French) [6], while an English version has been submitted to an international journal.

Related work [41] includes complexity and computability results showing that, in a minimax sense, simulation-based and selection-based methods such as genetic programming are not too far from optimal for mining spaces of Turing-computable functions.

#### Surrogate models in Evolution Strategies

The work on surrogate models, started in the Tao team by
K. Abboud in his Ph.D. thesis (defended in 2004), has been continued by
Y. Bonnemay (co-supervised by Saint-Gobain and Tao) and
O. Teytaud. This work concerns the theoretical properties of surrogate
models, together with an empirical study on (i) standard difficult test
functions and (ii) industrial problems. Tao is also leading the work package *General meta-models* within
the RNTL project OMD (*Optimization Multi-Disciplinaire*) (see
section 6.4.5).

We compared surrogate models and Estimation of Distribution Algorithms in [24] in the particular setting of very small numbers of iterates (expensive fitness functions); more information is provided in section 6.2.4. We also developed a mathematical analysis of very different ways of including learning in optimization; a draft of this work, based on Bellman's decomposition, is available at http://www.lri.fr/~teytaud/optim.pdf .

#### Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) proceed by alternately sampling and updating a distribution on the search space. The sampled individuals are evaluated, i.e. their fitness is computed, and the distribution is updated so as to be biased toward the best individuals in the current sample. The extension of this framework to continuous optimization was initiated by Ducoulombier & Sebag (1998) (Extending Population-Based Incremental Learning to Continuous Search Spaces, in Th. Bäck et al., Eds, PPSN'98, LNCS 1498, pp 418–427, Springer-Verlag, 1998).
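
The sample/evaluate/update loop described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the algorithm of the cited paper: the distribution is a product of independent Gaussians, and the update simply refits the mean and standard deviation to the best samples (truncation selection). The function `gaussian_eda`, its default parameters and the sphere objective are hypothetical names chosen for the example.

```python
import random
import statistics

def sphere(x):
    """Illustrative objective (assumption): squared Euclidean norm."""
    return sum(v * v for v in x)

def gaussian_eda(f, mean, std, pop_size=50, n_best=10, generations=30, seed=0):
    """Alternate sampling from independent Gaussians and refitting to the best samples."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(generations):
        # Sampling step: draw a population from the current distribution.
        population = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(pop_size)]
        # Evaluation and selection: keep the n_best individuals (minimization).
        best = sorted(population, key=f)[:n_best]
        # Update step: bias the distribution toward the selected individuals.
        for d in range(len(mean)):
            column = [individual[d] for individual in best]
            mean[d] = statistics.fmean(column)
            std[d] = max(statistics.pstdev(column), 1e-12)  # avoid a zero spread
    return mean, std

initial_mean = [3.0, 3.0, 3.0]
final_mean, final_std = gaussian_eda(sphere, initial_mean, [2.0, 2.0, 2.0])
print(sphere(initial_mean), sphere(final_mean))
```

The population size `pop_size` and the selection size `n_best` are the main parameters to tune. Refitting the spread to a small selected subset tends to shrink the distribution quickly, which is one reason larger populations can be preferable when robustness matters.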

Ongoing work (Ph.D. of Celso Ishida, co-advised with A. Pozo, Universidade Federal do Paraná, Brazil) is concerned with using mixtures of distributions, borrowing from the MIXMOD EM-like approaches developed in the SELECT project at INRIA, to extend EDAs to multi-modal optimization.

On the theoretical side, we studied a particular class of EDAs [24] in the currently very active setting of expensive objective functions (which is part of the OMD RNTL project presented in section 6.4.5, as well as of the RedOpt working group, http://norma.mas.ecp.fr/wikimas/RedOpt ). This paper provides conservative upper bounds that give hints about how to parametrize EDAs, in particular as a function of the available resources (number of function evaluations). Although the analysis is based on very conservative tools from VC theory, the resulting algorithm is efficient in very frugal settings where the number of function evaluations is very moderate. The theoretical analysis emphasizes the importance of the population size, which should be chosen much larger when the number of iterates is larger, and should also be much larger than is usually done when robustness is a main goal.