## Scientific Foundations

**Abstract**:

One of the goals of Machine Learning and Data Mining is to extract optimal hypotheses from (massive amounts of) data. What "optimal" means varies with the problem: the goal might be to induce useful knowledge, allowing new cases to be classified with high confidence (predictive data mining), or to summarize the data as a set of understandable statements (descriptive data mining).

On the other hand, Evolutionary Computation and stochastic optimization are well suited to ill-posed optimization problems, such as those arising in machine learning, data mining, identification, optimal policies, and inverse problems. However, optimization algorithms must adapt themselves to the search landscape; in other words, they need learning capabilities.

#### Machine learning, Data Mining, Inductive Logic Programming

Learning is concerned with i) choosing the form of the knowledge to be extracted (rules, Horn clauses, distributions, patterns, equations, ...), referred to as the hypothesis space or language; and ii) exploring this (huge) search space to find the best hypotheses in it.

Learning is thus an optimization problem; however, the "real" optimization criterion is unknown. Learning is like a game with incomplete information: i) in the statistical learning case, the player (the learning algorithm) only sees some of the cards in the game (the available examples in the data set); ii) in the data mining case, the player (the algorithm) does not know the preferences of the expert whom it tries to please.
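
The incomplete-information point can be made concrete with a toy sketch (our own illustration, not the team's code; all names are ours): the learner minimises training error, the only "cards" it sees, as a proxy for the unknown true error of the underlying concept.

```python
import random

def sample(n, rng):
    """True concept: label 1 iff x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = (x > 0.5) != (rng.random() < 0.1)  # noisy label
        data.append((x, int(y)))
    return data

def error(t, data):
    """Fraction of examples misclassified by the threshold rule x > t."""
    return sum(int(x > t) != y for x, y in data) / len(data)

def best_threshold(train):
    """Pick the threshold minimising the *training* error (the proxy)."""
    return min((x for x, _ in train), key=lambda t: error(t, train))

rng = random.Random(0)
train = sample(20, rng)
t_best = best_threshold(train)
print("training error:", error(t_best, train))
# The "real" criterion can only be estimated on fresh data:
print("estimated true error:", error(t_best, sample(5000, rng)))
```

The training error is only a proxy: the true error (here at least the 10% noise floor) stays hidden from the player.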

New learning criteria (and the corresponding algorithms) are investigated, concerned on the one hand with the Area Under the ROC Curve (Receiver Operating Characteristic), particularly for medical applications, and on the other hand with stable spatio-temporal data, with applications in the Neurosciences.
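
The AUC criterion can be stated very compactly: it is the probability that a randomly drawn positive example is ranked above a randomly drawn negative one (the Wilcoxon-Mann-Whitney statistic). A minimal illustrative sketch (not the team's code; the function names are ours):

```python
def auc(scores, labels):
    """Area Under the ROC Curve, computed as the fraction of
    (positive, negative) pairs ranked correctly by the scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count for half
    return wins / (len(pos) * len(neg))

# A ranker that fully separates the classes reaches the maximum:
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0
```

Unlike the error rate, this criterion depends only on the ranking of the examples, which is what makes it attractive when class distributions or misclassification costs are skewed, as in medical applications.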

#### Evolutionary Computation, Stochastic Optimisation

Given the lack of a universal optimisation algorithm, the power of an optimisation algorithm is measured by its ability to acquire and exploit problem-specific information. The use of such a priori knowledge has long been heuristic, leading for example to the development of operators specific to pattern optimisation, constrained identification, etc. One of our objectives is to design operators able to adapt themselves by automatically exploiting regularities in the search space. Another is to investigate how domain knowledge can be introduced at all levels of an evolutionary algorithm, starting with the representation itself and the corresponding variation operators.
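
As a toy illustration of an operator adapting itself to the search landscape (a sketch under simplifying assumptions, not one of the operators developed in the team), a (1+1) evolution strategy can tune its mutation step size online with Rechenberg's classical 1/5th success rule:

```python
import random

def sphere(x):
    """Simple test landscape: minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def one_plus_one_es(dim=5, iters=2000, seed=0):
    """(1+1)-ES with the 1/5th success rule for step-size adaptation."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = sphere(x)
    sigma = 1.0  # mutation step size, adapted online
    for _ in range(iters):
        child = [xi + rng.gauss(0, sigma) for xi in x]
        fc = sphere(child)
        if fc <= fx:                 # success: accept, enlarge the step
            x, fx = child, fc
            sigma *= 1.5
        else:                        # failure: shrink the step
            sigma *= 1.5 ** (-0.25)  # balances at ~1/5 success rate
    return fx

print(one_plus_one_es())  # converges close to the optimum 0.0
```

The step size is not fixed in advance: the algorithm learns it from the observed success rate, a simple instance of the learning capability argued for above.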

#### Robotics

Autonomous robotics is a fascinating challenge for optimisation and machine learning. The TAO approach is mainly inspired by evolutionary robotics and cognitive science. It is based on defining a control problem (optimisation of the behavioral traits leading to the desired behavior), on grounding the controller in the real world (through a module which predicts the effects of its actions, by simplifying sensory-motor cognition, etc.), and on the intensive use of the information acquired by the robot (log mining).

A new framework arrived in the team with the OpenDP project, which combines methods from operations research (discretization of the Hamilton-Jacobi-Bellman equation) with learning of the Bellman function. Robotics is not the usual application of such methods; thanks to modern learning methods, working in high-dimensional spaces becomes possible, enabling the application of stochastic dynamic programming to robotics.
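
The Bellman machinery can be illustrated on a deliberately tiny example (exact value iteration on a five-state chain; all names below are ours). In the OpenDP setting, the tabular value function would be replaced by a learned approximation to cope with high-dimensional state spaces.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-8):
    """Exact value iteration on a chain: states 0..n-1, actions move
    left/right, and landing on the last state yields reward 1."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            moves = [max(s - 1, 0), min(s + 1, n_states - 1)]
            # Bellman backup: best immediate reward + discounted value
            best = max((1.0 if s2 == n_states - 1 else 0.0) + gamma * V[s2]
                       for s2 in moves)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
print([round(v, 3) for v in V])  # → [7.29, 8.1, 9.0, 10.0, 10.0]
```

The value grows geometrically toward the rewarding state; the controller then only needs to act greedily with respect to this Bellman function.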

#### Crosswise Axis: Phase Transition

Coming from work done on constraint satisfaction, the phase transition concept is appropriate for describing the average-case performance of algorithms faced with NP-complete problems. This concept allows us to study how machine learning and relational data mining scale up, and to delineate the space of difficult problems for functional algorithms in this domain. The study of this perspective for propositional learning is in progress. The point of such an empirical, statistical study of algorithms taken as black boxes is first to pinpoint their weaknesses, then to understand these weaknesses, and finally, hopefully, to fix them.
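
The phase-transition picture can be reproduced empirically on random 3-SAT, the textbook case: the fraction of satisfiable formulas drops sharply as the clause-to-variable ratio crosses the critical region (around 4.26), and the hardest instances concentrate near that threshold. A small brute-force sketch (our own toy code, not the team's experiments):

```python
import random
from itertools import product

def random_3sat(n_vars, n_clauses, rng):
    """A random 3-CNF formula: clauses of 3 distinct variables, random signs."""
    return [[(v, rng.choice([True, False]))
             for v in rng.sample(range(n_vars), 3)]
            for _ in range(n_clauses)]

def satisfiable(n_vars, clauses):
    """Brute-force SAT check (fine for the tiny sizes used here)."""
    return any(
        all(any(assign[v] == sign for v, sign in clause) for clause in clauses)
        for assign in product([True, False], repeat=n_vars))

def sat_fraction(ratio, n_vars=10, trials=20, seed=0):
    """Fraction of satisfiable random formulas at a clause/variable ratio."""
    rng = random.Random(seed)
    n_clauses = int(ratio * n_vars)
    return sum(satisfiable(n_vars, random_3sat(n_vars, n_clauses, rng))
               for _ in range(trials)) / trials

for r in (1.0, 4.0, 8.0):
    print(r, sat_fraction(r))  # satisfiability collapses as r grows
```

The same methodology, treating the learner as a black box and charting where its performance collapses on ensembles of random problems, underlies the study of phase transitions in relational and propositional learning.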