Team TAO


Section: New Results

Optimal Decision Making

Participants : Olivier Teytaud, Philippe Rolet, Michèle Sebag, Romaric Gaudel, Cyril Furtlehner, Jean-Baptiste Hoock, Fabien Teytaud, Arpad Rimmel, Julien Perez.

[21] : paper in Computational Intelligence Magazine on the various tools for introducing learning into Monte-Carlo Tree Search; the paper also recalls that humans remain by far stronger than computers, in particular on some well-known families of situations.

[20] : survey, with a few new results, on computer Go, in IEEE Transactions on Computational Intelligence and Artificial Intelligence in Games.

[90] : complexity bounds on parallel (batch) active learning. Whereas traditional active learning analysis focuses on one example generated per iteration, we here consider $\lambda$ examples generated simultaneously.
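The batch setting can be illustrated on a toy problem. The sketch below is not the algorithm analysed in [90]; the 1-D threshold learner and all names are illustrative. It queries $\lambda$ labels simultaneously at each iteration, so the uncertainty interval shrinks by a factor $\lambda+1$ per iteration, instead of 2 for sequential bisection:

```python
def parallel_active_threshold(oracle, lam, iters):
    """Batch active learning of a 1-D threshold function on [0, 1].

    Each iteration asks for `lam` labels simultaneously, evenly spaced
    inside the current uncertainty interval (a toy stand-in for the
    parallel setting: the batch could be labeled by lam workers at once).
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        # lam simultaneous queries inside the uncertainty interval
        xs = [lo + (hi - lo) * (i + 1) / (lam + 1) for i in range(lam)]
        labels = [oracle(x) for x in xs]
        for x, y in zip(xs, labels):   # shrink to the flip sub-interval
            if y == 0:
                lo = x                 # largest point labeled 0
            else:
                hi = min(hi, x)        # smallest point labeled 1
    return (lo + hi) / 2.0             # interval width: (lam+1)^-iters

# hypothetical target threshold, unknown to the learner
theta = 0.371
est = parallel_active_threshold(lambda x: 1 if x >= theta else 0,
                                lam=4, iters=6)
```

With lam=4 queries per iteration and 6 iterations (30 labels in total), the uncertainty interval has width 5^-6, versus 2^-30 for 30 sequential bisection queries: the sketch makes visible the speed-up per *iteration* and the cost per *label* that the complexity bounds quantify.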

[16] : fundamental work on the convergence rates of evolutionary algorithms. Very general results, extending many previously published ones. This work is based on branching-factor techniques.

[77] : application of bandits to genetic programming. The resulting algorithm is provably consistent, within a given (user-defined) confidence level, from a regression-testing point of view.

[86] : including learning in the Monte-Carlo part of Monte-Carlo Tree Search. Whereas many works focus on learning the patterns for biasing the tree search, we learn the so-called “playout” part.

[88] : theoretical analysis of noisy optimization. Mainly theoretical, but the bandit part can be used in real algorithms: we propose essentially (i) a derandomization of the mutations (necessary for avoiding some bad cases), (ii) a selection step in which not all individuals are ranked, and (iii) the use of a Bernstein race for ranking individuals. The results are mathematically proved.
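A Bernstein race of the flavour used in (iii) can be sketched as follows; the empirical Bernstein bound and all parameter choices below are illustrative, not those of [88]:

```python
import math
import random

def bernstein_bound(values, delta, value_range=1.0):
    """Empirical-Bernstein confidence interval for the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    eps = (math.sqrt(2 * var * math.log(3 / delta) / n)
           + 3 * value_range * math.log(3 / delta) / n)
    return mean - eps, mean + eps

def bernstein_race(noisy_fitness, n_candidates, delta=0.05, budget=20000):
    """Race to find the candidate with the best mean (noisy) fitness.

    All surviving candidates are re-sampled in rounds; a candidate is
    dropped as soon as its upper confidence bound falls below another
    candidate's lower bound, so weak individuals need few samples.
    """
    samples = {i: [] for i in range(n_candidates)}
    alive, spent = set(samples), 0
    while len(alive) > 1 and spent < budget:
        for i in alive:                       # one more sample each
            samples[i].append(noisy_fitness(i))
            spent += 1
        bounds = {i: bernstein_bound(samples[i], delta / n_candidates)
                  for i in alive}
        best_lower = max(lo for lo, _ in bounds.values())
        alive = {i for i in alive if bounds[i][1] >= best_lower}
    return max(alive, key=lambda i: sum(samples[i]) / len(samples[i]))
```

The race spends its sampling budget adaptively: clearly dominated candidates are eliminated early, which is exactly what makes ranking affordable when each fitness evaluation is noisy.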

[61] : collaboration with Univ. Lille and the Inria Sequel team on noisy optimization; extends the previous work and discusses the surprising fact that the best convergence rates are reached by algorithms which sample far from the optimum.

[47] : simple and effective application of bandit algorithms with simple regret for tuning Monte-Carlo Tree Search methods. In spite of its simplicity, the approach, based on Bernstein races, is efficient and improves a highly optimized algorithm.
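The simple-regret setting can be illustrated minimally: since only the final recommendation matters (not the reward accumulated while exploring), uniform exploration followed by recommending the empirical best arm is a sound baseline. The sketch below is a deliberately simplified illustration, not the Bernstein-race tuner of [47]:

```python
import random

def tune_parameter(evaluate, candidates, budget):
    """Pure-exploration bandit for parameter tuning.

    Under *simple* regret we only pay for the quality of the final
    recommendation, so the exploration phase may lose games freely:
    round-robin allocation, then recommend the empirical best arm.
    """
    pulls = {c: [] for c in candidates}
    for t in range(budget):
        c = candidates[t % len(candidates)]   # uniform exploration
        pulls[c].append(evaluate(c))          # one noisy win/loss sample
    return max(candidates,
               key=lambda c: sum(pulls[c]) / len(pulls[c]))

# hypothetical use: each candidate value plays the role of the (unknown)
# win probability of an assumed stochastic evaluation (1 = win, 0 = loss)
random.seed(1)
best = tune_parameter(lambda c: 1 if random.random() < c else 0,
                      [0.1, 0.3, 0.5], budget=3000)
```

This contrasts with cumulative-regret algorithms such as UCB, which would shy away from bad arms during tuning even though losses during tuning cost nothing here.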

[69] : simple and effective application of matrix games for building stochastic opening books. The technique combines several deterministic opening books into one stochastic opening book with optimal (worst case) performance against these deterministic opening books.
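A minimal way to compute such an optimal stochastic combination is to solve the corresponding zero-sum matrix game, e.g. by fictitious play. The sketch below is illustrative and not the exact method of [69]; `payoff[i][j]` stands for an assumed win rate of deterministic book i against opponent strategy j:

```python
def fictitious_play(payoff, iters=20000):
    """Approximate the maximin mixed strategy of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff (here: win rate of opening
    book i against opponent strategy j). Returns the row player's
    empirical mixture, which converges to an optimal worst-case
    randomisation over the deterministic books.
    """
    m, n = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # each player best-responds to the opponent's empirical mixture
        br_row = max(range(m), key=lambda i: sum(
            payoff[i][j] * col_counts[j] for j in range(n)))
        br_col = min(range(n), key=lambda j: sum(
            payoff[i][j] * row_counts[i] for i in range(m)))
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts]
```

On a matching-pennies-like payoff matrix such as `[[1, 0], [0, 1]]`, the returned mixture approaches (0.5, 0.5): no deterministic book is safe on its own, but the mixture achieves the game value against any opponent choice.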

[87] : use of Rapid-Action Value Estimates in the Monte-Carlo part of Monte-Carlo Tree Search methods. Whereas Rapid-Action Value Estimates are widely known as a revolution for the tree part, we here apply them to the playout part.
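The RAVE idea can be summarized by its value estimate: blend the (scarce) Monte-Carlo statistics of a move with its (plentiful) all-moves-as-first statistics. The sketch below uses the schedule $\beta=\sqrt{k/(3n+k)}$, one common choice in the literature, with an arbitrary equivalence parameter k; both are assumptions, not necessarily the choices of [87]:

```python
import math

def rave_value(q_mc, n_mc, q_amaf, k=1000):
    """RAVE blend of a move's Monte-Carlo and AMAF value estimates.

    q_mc   : mean reward over the n_mc simulations that played the move
             at this node (exact but scarce statistics).
    q_amaf : all-moves-as-first mean, pooled over simulations where the
             move was played anywhere later (biased but plentiful).
    k      : equivalence parameter of the schedule beta = sqrt(k/(3n+k));
             1000 is an arbitrary illustrative choice.
    """
    beta = math.sqrt(k / (3 * n_mc + k))
    return (1 - beta) * q_mc + beta * q_amaf
```

Early on (n_mc = 0, beta = 1) the AMAF statistics dominate; as visits accumulate, beta decays and the estimate converges to the plain Monte-Carlo mean.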

[98] : use of decisive and anti-decisive moves in Monte-Carlo Tree Search. We prove upper bounds on the computational cost of this modification and obtain good results on the game of Havannah, a classical challenge in computer games.
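The playout modification can be sketched generically; the helper predicates below are hypothetical placeholders (a real implementation, as in [98], checks them with game-specific code, and their cost is what the complexity bounds control):

```python
import random

def playout_move(legal_moves, wins_now, opponent_threat):
    """One move of a Monte-Carlo playout with decisive/anti-decisive rules.

    wins_now(m)        -> True if playing m wins the game immediately
                          (a decisive move).
    opponent_threat(m) -> True if the opponent could win by playing m
                          (so playing m ourselves is anti-decisive).
    Falls back to the uniform random move of a plain playout.
    """
    decisive = [m for m in legal_moves if wins_now(m)]
    if decisive:
        return decisive[0]             # win on the spot
    threats = [m for m in legal_moves if opponent_threat(m)]
    if threats:
        return random.choice(threats)  # block an immediate loss
    return random.choice(legal_moves)  # default uniform playout
```

The point of the modification is that playouts stop throwing away won positions (or walking into immediate losses), which sharpens the reward signal propagated back up the tree.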

[46] : analysis of the “plateau” in the scalability of Monte-Carlo Tree Search methods, including examples of simple situations that the human brain handles easily but that computers do not solve. Includes a comparison of the various parallelizations of Monte-Carlo Tree Search.

[70] : feature selection is cast as a Reinforcement Learning problem, and UCT is modified to tackle the unknown horizon and the huge branching factor.

