Team tao


Section: New Results

Keywords : Sequential data, stochastic dynamic optimization, active learning, Monte-Carlo Tree Search, Upper Confidence Tree.

Optimal Decision Making

Participants : Olivier Teytaud, Philippe Rolet, Lou Fedon, Romaric Gaudel, Cyril Furtlehner, Jean-Baptiste Hoock, Fabien Teytaud, Arpad Rimmel, Julien Prez.

This special interest group is devoted to all aspects of artificial intelligence related to sequential data; in particular, sequential decision making under uncertainty and sequential learning. Several highly visible successes in computer Go have led to both technical publications and popularization (section 6.5.1). Other applications far from Go have been realized and should be published soon (section 6.5.2).

Sequential decision under uncertainty applied to Computer-Go

The game of Go is an Asian game more than 2000 years old, and is very important in China, Korea, Taiwan, and Japan. It is of particular interest because it is much more difficult for computers than chess (in which humans can no longer win against computers unless the machine plays with a handicap). As a consequence, the efficiency of new algorithms for this game is of great interest [42], [43], [41]. We successfully parallelized Monte-Carlo Tree Search [18], with both message-passing parallelization and shared-memory parallelization. The essential idea of this parallelization is to share the upper part of the tree, with messages carrying compact statistics, instead of sending positions and results over the network; the result is a very good speed-up in 19x19 Go without shared memory. The parallelization is less efficient in 9x9 Go but nonetheless provides a significant improvement. These results apply well beyond the game of Go, and were developed in collaboration with Thomas Herault (PARAL team, LRI) and Sara/Univ. Maastricht (Netherlands) [38].
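The idea of sharing only the upper part of the tree through compact statistics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Stats`, `upper_tree_message`, and `merge_messages` are hypothetical, the tree is represented as a dictionary keyed by move sequences, and statistics are simply summed across workers. The protocol actually used in [18] may differ (for example, it may average or weight the statistics).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    """Per-node statistics: number of simulations and accumulated wins."""
    visits: int = 0
    wins: float = 0.0


def upper_tree_message(tree, max_depth):
    """Build a compact message from one worker's local tree.

    `tree` maps a move sequence (a tuple of moves from the root) to Stats.
    Only nodes at depth <= max_depth are included, so the message carries
    counts for the upper part of the tree, not positions or game records.
    """
    return {
        path: (stats.visits, stats.wins)
        for path, stats in tree.items()
        if len(path) <= max_depth
    }


def merge_messages(messages):
    """Combine the messages of all workers by summing their statistics.

    Summation is an illustrative choice, not necessarily the rule of [18].
    """
    total = defaultdict(Stats)
    for message in messages:
        for path, (visits, wins) in message.items():
            total[path].visits += visits
            total[path].wins += wins
    return dict(total)


# Two hypothetical workers with small local trees.
worker_a = {
    (): Stats(10, 6.0),
    ("D4",): Stats(4, 3.0),
    ("D4", "Q16"): Stats(2, 1.0),  # too deep to be shared when max_depth=1
}
worker_b = {
    (): Stats(8, 3.0),
    ("D4",): Stats(5, 2.0),
}

shared = merge_messages(
    [upper_tree_message(worker_a, 1), upper_tree_message(worker_b, 1)]
)
```

In this sketch the size of a message depends only on the number of shared nodes, not on the number of simulations run since the last exchange, which is the property the paragraph above attributes to compacting statistics.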

The parallelization has had a strong impact in 19x19 Go, with the first ever win against a professional player (8th Dan Pro) with 9 handicap stones. A grid-based construction of an opening book has been launched, with an original strategy based on Monte-Carlo Tree Search algorithms. A survey paper was published [17]. In [32], several forms of active learning are combined, including

The first two rules are essential for the first visit to a node; the third one is essential asymptotically in the number of trials; the fourth one is essential as a transition between the offline part and the online part. This multi-level learning is the first implementation in Go to combine all these levels.

Figure 3. Left: The Huygens computer. Middle: games in Taiwan. Right: Interactive table.

Communications of the ACM, vol. 51, no. 10 (10/08), page 13, reported the first ever win against a professional player in 19x19 Go, as did several newspapers. MoGo was also interfaced with an interactive table (joint work with Microsoft [37]). The video presentation of the table can be watched at .

Other applications of UCT and bandits

Joint work by P. Rolet, M. Sebag, and O. Teytaud has been devoted to the design of an optimal algorithm for active learning. Based on a prior (similar to Bayesian priors), this algorithm asymptotically (in the computational power) achieves optimal complexity with respect to the number of requests to the oracle that labels examples. An implementation has been realized, showing the practicality of the approach. Joint work with A. Auger applied UCT algorithms to the feasibility of optimal optimization [2].

Interestingly, both the application to active learning and the application to optimal non-linear optimization combine:

