Section: New Results
Optimal Decision Making
This special interest group is devoted to all aspects of artificial intelligence related to sequential data, in particular sequential decision making under uncertainty and sequential learning. Several highly visible successes in computer Go have led to both technical publications and popularization (section 6.5.1). Other applications, far from Go, have also been realized and should be published soon (section 6.5.2).
Sequential decision under uncertainty applied to Computer-Go
The game of Go is an Asian game, more than 2000 years old, which remains very important in China, Korea, Taiwan, and Japan. It is particularly interesting because it is much more difficult for computers than chess (in which humans can no longer win against computers unless the machine is given a handicap). As a consequence, the efficiency of new algorithms for this game is of high interest  ,  ,  . We successfully parallelized Monte-Carlo Tree Search  , with both message-passing and shared-memory parallelization. The essential idea of this parallelization is to share the upper part of the tree, exchanging messages that compact statistics instead of sending positions and results over the network; the result is a very good speed-up in 19x19 Go without shared memory; the parallelization is less efficient in 9x9 Go but nonetheless provides a significant improvement. These results are applicable far beyond the game of Go, and were developed in collaboration with Thomas Herault (PARAL team, LRI) and Sara/Univ. Maastricht (Netherlands)  .
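The statistics-compacting idea can be sketched as follows. This is an illustrative reconstruction, not the team's actual code: each worker runs its own Monte-Carlo Tree Search and periodically exchanges compact (visits, wins) counters for the nodes of the shared upper tree, instead of sending full positions and results; the node identifiers and function names are assumptions.

```python
# Hypothetical sketch of the message-passing parallelization of MCTS:
# workers exchange per-node (visits, wins) counters for the shared
# upper tree, which are then merged locally.
from collections import defaultdict

def merge_statistics(local_stats, remote_stats):
    """Merge per-node (visits, wins) counters received from another worker.

    Keys are node identifiers (here, illustrative move-sequence strings);
    values are (visits, wins) tuples. Counters simply add up, so each
    worker's tree reflects the simulations of all workers.
    """
    merged = defaultdict(lambda: (0, 0))
    for stats in (local_stats, remote_stats):
        for node, (visits, wins) in stats.items():
            v, w = merged[node]
            merged[node] = (v + visits, w + wins)
    return dict(merged)

# Example: two workers explored overlapping parts of the upper tree.
worker_a = {"root": (100, 55), "root/D4": (40, 22)}
worker_b = {"root": (80, 41), "root/Q16": (30, 17)}
combined = merge_statistics(worker_a, worker_b)
# "root" accumulates both workers' counts: (180, 96).
```

Since only a few integers per shared node travel over the network, the bandwidth cost stays small even on a large cluster, which is what makes the speed-up possible without shared memory.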
This work had a strong impact in 19x19 Go, with the first ever win against a professional player (8th Dan Pro) with 9 handicap stones. A grid-based construction of an opening book has been launched, with an original strategy based on Monte-Carlo Tree Search algorithms. A survey paper was published  . In  , several forms of active learning are combined, including:
expert rules, inherited from centuries of human play;
offline learning, based on high-level games from the last century;
online learning, i.e. classical online values as in UCT, but without the exploration term;
transient learning, i.e. an online-learnt value function (including extrapolation).
The first two elements are essential for the first visits to a node; the third is essential asymptotically in the number of trials; the fourth is essential as a transition between the offline part and the online part. The resulting source code is the first in Go to combine all these levels of learning.
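One common way to realize such a transition between offline priors and online statistics is a visit-weighted mixture, where the prior terms dominate at low visit counts and the online Monte-Carlo mean dominates asymptotically. The sketch below is a generic illustration under that assumption, not MoGo's actual formula; all weights and names are hypothetical.

```python
def node_value(prior_value, prior_weight,
               transient_value, transient_weight,
               online_wins, online_visits):
    """Blend an offline prior, a transient (extrapolated) value and the
    online Monte-Carlo mean for one node.

    The prior and transient terms act like `prior_weight` and
    `transient_weight` virtual simulations, so they steer the very first
    visits; as `online_visits` grows, the value converges to the online
    mean online_wins / online_visits.
    """
    total = prior_weight + transient_weight + online_visits
    return (prior_weight * prior_value
            + transient_weight * transient_value
            + online_wins) / total

# Before any simulation, the value is a weighted average of the priors:
v0 = node_value(0.6, 10, 0.5, 5, 0, 0)        # (6 + 2.5) / 15
# After many simulations, the online mean dominates:
v1 = node_value(0.6, 10, 0.5, 5, 550, 1000)   # close to 0.55
```

The design choice here is that the priors never have to be explicitly switched off: their fixed weights are simply outgrown by the real visit count.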
Communications of the ACM, vol. 51, no. 10 (October 2008), p. 13, reported the first ever win against a professional player in 19x19 Go, as did several newspapers. MoGo was also interfaced with an interactive table (joint work with Microsoft  ). A video presentation of the table can be watched at http://www.youtube.com/watch?v=OQvVk1RLziY&feature=PlayList&p=90C0DB7A3DB9B52C&index=21 .
Other applications of UCT and bandits
A joint work by P. Rolet, M. Sebag and O. Teytaud has been devoted to the design of an optimal algorithm for active learning. Based on a prior (similar to Bayesian priors), this algorithm has, asymptotically in the computational power, optimal complexity with respect to the number of requests to the oracle labelling the examples. An implementation has been realized, showing the practicability of the approach. A joint work with A. Auger applied UCT algorithms to the feasibility of optimal optimization  .
Interestingly, both the application to active learning and the application to optimal non-linear optimization combine:
"Billiard" algorithms, used in both cases for sampling from conditional distributions;
Partially Observable Markov Decision Processes (POMDPs): the essential idea is to rewrite non-linear optimization or active learning as a POMDP;
UCT, which is highly competitive in this one-player framework.
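At the heart of UCT is the UCB1 bandit rule: at each node, descend into the child maximizing the empirical mean plus an exploration bonus. A minimal sketch of that selection step (the exploration constant and data layout are illustrative assumptions):

```python
import math

def ucb1_select(children, exploration=1.4):
    """Select the child maximizing the UCB1 score
        mean_reward + exploration * sqrt(ln(parent_visits) / child_visits).

    `children` maps an action to a (visits, total_reward) pair.
    Unvisited children are returned first, so every arm is tried once
    before the mean/bonus trade-off kicks in.
    """
    parent_visits = sum(v for v, _ in children.values())
    best_action, best_score = None, float("-inf")
    for action, (visits, total_reward) in children.items():
        if visits == 0:
            return action  # force at least one visit per child
        score = (total_reward / visits
                 + exploration * math.sqrt(math.log(parent_visits) / visits))
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# An unvisited child is always explored first:
ucb1_select({"a": (10, 7), "b": (0, 0)})      # -> "b"
# With equal visit counts, the higher empirical mean wins:
ucb1_select({"a": (100, 90), "b": (100, 10)}) # -> "a"
```

In the one-player settings above (active learning, non-linear optimization rewritten as POMDPs), the same rule applies unchanged: the "reward" is simply the quality of the simulated outcome.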