Participant: Olivier Teytaud [correspondent].
Abstract: OpenDP is an open-source toolbox for stochastic dynamic programming (Sylvain Gelly and Olivier Teytaud. OpenDP: a free reinforcement learning toolbox for discrete time control problems. In NIPS Workshop on Machine Learning Open Source Software, 2006), combining time decomposition (as in standard dynamic programming), learning, and derivative-free optimization. Its modular design was intended to ease the integration of existing source codes: OpenBeagle (with the help of Christian Gagné), EO (with the help of Damien Tessier), CoinDFO, Opt++, and many others for optimization; the Torch library, the Weka library, and some others for learning. It also includes various derandomized algorithms (for robust optimization and sampling), as well as time-PCA and robotic mapping. OpenDP has been evaluated on a large set of benchmark problems (available in the environment), allowing for an extensive comparison of function-value approximators and of derivative-free optimization algorithms under small numbers of iterations.
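The combination described above can be sketched as backward induction in which, at each stage, the value function is replaced by a learned approximator fitted on sampled states. The sketch below is illustrative only: the problem, the nearest-neighbour "learner", and all names are assumptions for exposition and do not reflect OpenDP's actual API (which plugs in external learners such as Torch or Weka, and derivative-free optimizers for the action search).

```python
# Minimal sketch of regression-based stochastic dynamic programming,
# in the spirit of OpenDP (problem and names are illustrative, not OpenDP's API).
# Problem: steer a 1-D state toward 0 over a finite horizon under additive noise.

HORIZON = 5
ACTIONS = [-1.0, 0.0, 1.0]
NOISES = [-0.5, 0.0, 0.5]  # discretized noise support, uniform weights

def transition(x, a, w):
    return x + a + w

def cost(x, a):
    return x * x + 0.1 * a * a

def fit(samples):
    """Crude value-function approximator: piecewise-constant via the
    nearest sampled state (a stand-in for an SVM or neural learner)."""
    def v(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return v

def solve():
    """Backward induction: at each stage, learn V_t from Bellman backups
    computed on a grid of sampled states."""
    v_next = lambda x: 0.0  # terminal value
    for t in reversed(range(HORIZON)):
        samples = []
        for x in [i * 0.5 for i in range(-8, 9)]:  # sampled states
            # Bellman backup: expectation over discretized noise,
            # minimized over actions (exhaustively here; OpenDP would
            # call a derivative-free optimizer instead).
            q = min(
                cost(x, a)
                + sum(v_next(transition(x, a, w)) for w in NOISES) / len(NOISES)
                for a in ACTIONS
            )
            samples.append((x, q))
        v_next = fit(samples)
    return v_next

v0 = solve()
# States closer to the target should have a lower estimated cost-to-go.
print(v0(0.0) <= v0(3.0))
```

The exhaustive `min` over `ACTIONS` is where a derivative-free optimizer would sit for continuous action spaces, and `fit` is where the comparison of function-value approximators mentioned in the abstract takes place.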
The merit of the OpenDP platform is twofold. On the one hand, the use of the above well-known algorithms is new in the DP framework. On the other hand, the literature did not previously provide, nor allow, a principled and systematic comparison of algorithms on a comprehensive benchmark suite. Our thorough experiments inspired further theoretical work on learning criteria in dynamic environments, motivated by the shortcomings of cross-validation in this framework (e.g., the σ² parameter of the Gaussian kernel in SVMs, when chosen by cross-validation, is usually too small in the DP context).