Publications of the year

Doctoral Dissertations and Habilitation Theses

A. Khaleghi.
Sur quelques problèmes non-supervisés impliquant des séries temporelles hautement dépendantes, Institut national de recherche en informatique et en automatique (Inria), November 2013.

Articles in International Peer-Reviewed Journals

M. G. Azar, R. Munos, H. Kappen.
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, in: Machine Learning, 2013, vol. 91, no 3, pp. 325-349.
O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, January 2013, vol. 10, no 1. [ DOI : 10.1088/1741-2560/10/1/016012 ]
M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.
Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, February 2013, vol. 46, pp. 47-55. [ DOI : 10.1016/j.jbi.2012.08.004 ]
D. Ryabko, J. Mary.
A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.
B. Ryabko, D. Ryabko.
A confidence-set approach to signal denoising, in: Statistical Methodology, 2013, vol. 15, pp. 115–120.

International Conferences with Proceedings

B. Avila Pires, M. Ghavamzadeh, C. Szepesvari.
Cost-sensitive Multiclass Classification Risk Bounds, in: International Conference on Machine Learning, Atlanta, United States, 2013.
A. Carpentier, R. Munos.
Toward optimal stratification for stratified Monte-Carlo integration, in: International Conference on Machine Learning, United States, 2013.
P. Chainais, C. Richard.
Learning a common dictionary over a sensor network, in: CAMSAP 2013, Saint-Martin, France, December 2013, pp. 1-4.
R. Fonteneau, L. Busoniu, R. Munos.
Optimistic planning for belief-augmented Markov decision processes, in: IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, April 2013.
V. Gabillon, M. Ghavamzadeh, B. Scherrer.
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, in: Neural Information Processing Systems (NIPS) 2013, South Lake Tahoe, United States, 2013.
M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Regret Bounds for Reinforcement Learning with Policy Advice, in: ECML/PKDD - European conference on machine learning and principles and practice of knowledge discovery in databases - 2013, Prague, Czech Republic, September 2013.
M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Sequential Transfer in Multi-armed Bandit with Finite Set of Models, in: NIPS - Advances in Neural Information Processing Systems 25 - 2013, Lake Tahoe, United States, December 2013.
H. Kadri, M. Ghavamzadeh, P. Preux.
A Generalized Kernel Approach to Structured Output Learning, in: International Conference on Machine Learning (ICML), Atlanta, United States, 2013.
G. Kedenburg, R. Fonteneau, R. Munos.
Aggregating optimistic planning trees for solving Markov decision processes, in: Advances in Neural Information Processing Systems, United States, 2013, pp. 2382-2390.
A. Khaleghi, D. Ryabko.
Nonparametric multiple change point estimation in highly dependent time series, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 382-396.
N. Korda, E. Kaufmann, R. Munos.
Thompson sampling for one-dimensional exponential family bandits, in: Advances in Neural Information Processing Systems, United States, 2013.
B. Kveton, M. Valko.
Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, January 2013.
O.-A. Maillard, P. Nguyen, R. Ortner, D. Ryabko.
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, in: ICML - 30th International Conference on Machine Learning, Atlanta, United States, 2013, vol. 28(1), pp. 543-551.
P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.
Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.
D. Ryabko.
Time-series information and learning, in: ISIT - International Symposium on Information Theory, Istanbul, Turkey, 2013, pp. 1392-1395.
D. Ryabko.
Unsupervised model-free representation learning, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 354-366.
B. Szorenyi, R. Busa-Fekete, I. Hegedüs, R. Ormandi, M. Jelasity, B. Kégl.
Gossip-based distributed stochastic bandit algorithms, in: 30th International Conference on Machine Learning (ICML 2013), Atlanta, United States, S. Dasgupta, D. McAllester (editors), 2013, vol. 28, pp. 19-27.
E. M. Thomas, M. Clerc, A. Carpentier, E. Daucé, D. Devlaminck, R. Munos.
Optimizing P300-speller sequences by RIP-ping groups apart, in: IEEE/EMBS 6th international conference on neural engineering (2013), San Diego, United States, IEEE/EMBS, November 2013.
M. Valko, A. Carpentier, R. Munos.
Stochastic Simultaneous Optimistic Optimization, in: 30th International Conference on Machine Learning, Atlanta, United States, February 2013.
M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini.
Finite-Time Analysis of Kernelised Contextual Bandits, in: The 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, United States, 2013.

National Conferences with Proceedings

P. Bas, P. Chainais, E. Zidel-Cauffet.
Quantification adaptative pour la stéganalyse d'images texturées, in: GRETSI 2013, Brest, France, September 2013.
P. Chainais, C. Richard.
Distributed dictionary learning over a sensor network, in: CaP 2013, Villeneuve d'Ascq, France, July 2013, pp. 1-4.

Scientific Books (or Scientific Book chapters)

L. Busoniu, R. Munos, R. Babuska.
A review of optimistic planning in Markov decision processes, in: Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, F. Lewis, D. Liu (editors), IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, January 2013, chap. 22, pp. 494-516.

Internal Reports

M. Ghavamzadeh, Y. Engel.
Bayesian Policy Gradient and Actor-Critic Algorithms, January 2013.
P. L.A., M. Ghavamzadeh.
Actor-Critic Algorithms for Risk-Sensitive MDPs, February 2013.
R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2013.

References in notes

P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
T. Ferguson.
A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209–230.
T. Hastie, R. Tibshirani, J. Friedman.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
W. Powell.
Approximate Dynamic Programming, Wiley, 2007.
M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
J. Rust.
How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781–831.
J. Rust.
On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195–208.
R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
G. Tesauro.
Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.