Abstract: OpenDP is an open-source code base for stochastic dynamic programming (Sylvain Gelly and Olivier Teytaud. OpenDP: a free reinforcement learning toolbox for discrete time control problems. In NIPS Workshop on Machine Learning Open Source Software, 2006), based on (i) time decomposition as in standard dynamic programming, (ii) learning, and (iii) derivative-free optimization. Its modular design is meant to make it easy to integrate existing source code: OpenBeagle (with the help of Christian Gagné), EO (with the help of Damien Tessier), CoinDFO, Opt++, and many others for optimization; the Torch library, the Weka library, and some others for learning. It also includes various derandomized algorithms (for robust optimization and sampling); other algorithms (e.g. time-pca and robotic-mapping) are under development. OpenDP has been tested on a large set of benchmark problems (included in the environment), allowing an extensive comparison of function-value approximators and of derivative-free optimization algorithms run with a very small number of iterates.
The merit of the OpenDP platform is twofold. On the one hand, while many of the above algorithms are well known, their use in a dynamic programming framework is new. On the other hand, such a systematic comparison of these algorithms on general benchmarks did not exist in the stochastic dynamic programming literature, where many papers consider only one learning method, not necessarily under the same conditions as other published results. These extensive experiments inspired theoretical work, still in progress, on criteria for learning in dynamic environments, noting that cross-validation is neither satisfactory (for example, the σ² parameter of a Gaussian SVM chosen by cross-validation is usually too small in the context of dynamic programming) nor fast enough in that framework.
See the main page at http://opendp.sourceforge.net.
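
As an illustration only, the three ingredients named in the abstract (time decomposition, learning, derivative-free optimization) can be combined as in the minimal Python sketch below. This is not OpenDP code and does not use its API. The toy stock-management problem, all constants, and every function name are assumptions made for the example; a k-nearest-neighbour regressor stands in for the learners, and a simple shrinking-step random search stands in for the derivative-free optimizers.

```python
import random

HORIZON = 3       # number of decision steps (toy value, an assumption)
N_STATES = 25     # states sampled on a regular grid at each time step
N_DEMANDS = 30    # Monte Carlo samples of the random demand
N_ITERATES = 20   # small budget for the derivative-free optimizer


def clip(v, lo=0.0, hi=1.0):
    return max(lo, min(hi, v))


def step(x, a, d):
    """Hypothetical stock problem: buy a, then serve demand d. Returns (cost, next state)."""
    cost = 0.5 * a + 2.0 * max(0.0, d - x - a)  # purchase cost + shortage penalty
    return cost, clip(x + a - d)


def expected_cost(x, a, value, demands):
    """Monte Carlo estimate of the stage cost plus the learned cost-to-go."""
    total = 0.0
    for d in demands:
        cost, nxt = step(x, a, d)
        total += cost + value(nxt)
    return total / len(demands)


def knn_fit(xs, ys, k=3):
    """Learning step: k-nearest-neighbour regression on (state, value) pairs."""
    data = list(zip(xs, ys))

    def predict(x):
        near = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(v for _, v in near) / len(near)

    return predict


def random_search(f, rng, iters=N_ITERATES):
    """Derivative-free step: random search on [0, 1] with a shrinking step size."""
    best_a, sigma = 0.5, 0.5
    best_f = f(best_a)
    for _ in range(iters):
        a = clip(best_a + rng.gauss(0.0, sigma))
        fa = f(a)
        if fa < best_f:
            best_a, best_f = a, fa
        else:
            sigma *= 0.8  # shrink the step after a failed trial
    return best_a, best_f


def solve(seed=0):
    """Backward induction; returns one value function per time step 0..HORIZON."""
    rng = random.Random(seed)
    demands = [rng.random() for _ in range(N_DEMANDS)]  # reused at every evaluation
    xs = [i / (N_STATES - 1) for i in range(N_STATES)]
    value_fns = [None] * (HORIZON + 1)
    value_fns[HORIZON] = lambda x: 0.0  # no cost after the last step
    for t in reversed(range(HORIZON)):
        nxt_value = value_fns[t + 1]
        # Time decomposition: optimize each sampled state against the next value function.
        ys = [random_search(lambda a: expected_cost(x, a, nxt_value, demands), rng)[1]
              for x in xs]
        value_fns[t] = knn_fit(xs, ys)
    return value_fns
```

With these assumptions, `solve()[0](0.2)` returns the approximate expected cost-to-go from stock level 0.2 at time 0. Replacing `knn_fit` or `random_search` with another learner or optimizer leaves the backward loop unchanged, which is the kind of modularity the abstract describes.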