Bibliography

Publications of the year

Doctoral Dissertations and Habilitation Theses

[1]
A. Khaleghi.
Sur quelques problèmes non-supervisés impliquant des séries temporelles hautement dépendantes, Institut national de recherche en informatique et en automatique (Inria), November 2013.
http://hal.inria.fr/tel-00920184

Articles in International Peer-Reviewed Journals

[2]
M. G. Azar, R. Munos, H. Kappen.
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model, in: Machine Learning, 2013, vol. 91, no 3, pp. 325-349.
http://hal.inria.fr/hal-00831875
[3]
O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
http://hal.inria.fr/hal-00738209
[4]
J. Fruitet, A. Carpentier, R. Munos, M. Clerc.
Automatic motor task selection via a bandit algorithm for a brain-controlled button, in: Journal of Neural Engineering, January 2013, vol. 10, no 1. [ DOI : 10.1088/1741-2560/10/1/016012 ]
http://hal.inria.fr/hal-00798561
[5]
M. Hauskrecht, I. Batal, M. Valko, S. Visweswaran, G. F. Cooper, G. Clermont.
Outlier detection for patient monitoring and alerting, in: Journal of Biomedical Informatics, February 2013, vol. 46, pp. 47-55. [ DOI : 10.1016/j.jbi.2012.08.004 ]
http://hal.inria.fr/hal-00742097
[6]
D. Ryabko, J. Mary.
A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems, in: Journal of Machine Learning Research, 2013, vol. 14, pp. 2837-2856.
http://hal.inria.fr/hal-00913240
[7]
B. Ryabko, D. Ryabko.
A confidence-set approach to signal denoising, in: Statistical Methodology, 2013, vol. 15, pp. 115–120.
http://hal.inria.fr/hal-00913253

International Conferences with Proceedings

[8]
B. Avila Pires, M. Ghavamzadeh, C. Szepesvari.
Cost-sensitive Multiclass Classification Risk Bounds, in: International Conference on Machine Learning, Atlanta, United States, 2013.
http://hal.inria.fr/hal-00840485
[9]
A. Carpentier, R. Munos.
Toward optimal stratification for stratified Monte-Carlo integration, in: International Conference on Machine Learning, United States, 2013.
http://hal.inria.fr/hal-00923685
[10]
P. Chainais, C. Richard.
Learning a common dictionary over a sensor network, in: CAMSAP 2013, Saint-Martin, France, December 2013, pp. 1-4.
http://hal.inria.fr/hal-00923742
[11]
R. Fonteneau, L. Busoniu, R. Munos.
Optimistic planning for belief-augmented Markov decision processes, in: IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, April 2013.
http://hal.inria.fr/hal-00840202
[12]
V. Gabillon, M. Ghavamzadeh, B. Scherrer.
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, in: Neural Information Processing Systems (NIPS) 2013, South Lake Tahoe, United States, 2013.
http://hal.inria.fr/hal-00921250
[13]
M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Regret Bounds for Reinforcement Learning with Policy Advice, in: ECML/PKDD - European conference on machine learning and principles and practice of knowledge discovery in databases - 2013, Prague, Czech Republic, September 2013.
http://hal.inria.fr/hal-00924021
[14]
M. Gheshlaghi Azar, A. Lazaric, E. Brunskill.
Sequential Transfer in Multi-armed Bandit with Finite Set of Models, in: NIPS - Advances in Neural Information Processing Systems 25 - 2013, Lake Tahoe, United States, December 2013.
http://hal.inria.fr/hal-00924025
[15]
H. Kadri, M. Ghavamzadeh, P. Preux.
A Generalized Kernel Approach to Structured Output Learning, in: International Conference on Machine Learning (ICML), Atlanta, United States, 2013.
http://hal.inria.fr/hal-00695631
[16]
G. Kedenburg, R. Fonteneau, R. Munos.
Aggregating optimistic planning trees for solving Markov decision processes, in: Advances in Neural Information Processing Systems, United States, 2013, pp. 2382-2390.
http://hal.inria.fr/hal-00923681
[17]
A. Khaleghi, D. Ryabko.
Nonparametric multiple change point estimation in highly dependent time series, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 382-396.
http://hal.inria.fr/hal-00913250
[18]
N. Korda, E. Kaufmann, R. Munos.
Thompson sampling for one-dimensional exponential family bandits, in: Advances in Neural Information Processing Systems, United States, 2013.
http://hal.inria.fr/hal-00923683
[19]
B. Kveton, M. Valko.
Learning from a Single Labeled Face and a Stream of Unlabeled Data, in: 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, January 2013.
http://hal.inria.fr/hal-00749197
[20]
O.-A. Maillard, P. Nguyen, R. Ortner, D. Ryabko.
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, in: ICML - 30th International Conference on Machine Learning, Atlanta, United States, 2013, vol. 28(1), pp. 543-551.
http://hal.inria.fr/hal-00778586
[21]
P. Nguyen, O.-A. Maillard, D. Ryabko, R. Ortner.
Competing with an Infinite Set of Models in Reinforcement Learning, in: AISTATS, Arizona, United States, JMLR W&CP, 2013, vol. 31, pp. 463-471.
http://hal.inria.fr/hal-00823230
[22]
D. Ryabko.
Time-series information and learning, in: ISIT - International Symposium on Information Theory, Istanbul, Turkey, 2013, pp. 1392-1395.
http://hal.inria.fr/hal-00823233
[23]
D. Ryabko.
Unsupervised model-free representation learning, in: Proc. 24th International Conf. on Algorithmic Learning Theory (ALT'13), Singapore, Springer, 2013, pp. 354-366.
http://hal.inria.fr/hal-00913244
[24]
B. Szorenyi, R. Busa-Fekete, I. Hegedüs, R. Ormandi, M. Jelasity, B. Kégl.
Gossip-based distributed stochastic bandit algorithms, in: 30th International Conference on Machine Learning (ICML 2013), Atlanta, United States, S. Dasgupta, D. McAllester (editors), 2013, vol. 28, pp. 19-27.
http://hal.inria.fr/in2p3-00907406
[25]
E. M. Thomas, M. Clerc, A. Carpentier, E. Daucé, D. Devlaminck, R. Munos.
Optimizing P300-speller sequences by RIP-ping groups apart, in: IEEE/EMBS 6th international conference on neural engineering (2013), San Diego, United States, IEEE/EMBS, November 2013.
http://hal.inria.fr/hal-00907781
[26]
M. Valko, A. Carpentier, R. Munos.
Stochastic Simultaneous Optimistic Optimization, in: 30th International Conference on Machine Learning, Atlanta, United States, February 2013.
http://hal.inria.fr/hal-00789606
[27]
M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini.
Finite-Time Analysis of Kernelised Contextual Bandits, in: The 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, United States, 2013.
http://hal.inria.fr/hal-00826946

National Conferences with Proceedings

[28]
P. Bas, P. Chainais, E. Zidel-Cauffet.
Quantification adaptative pour la stéganalyse d'images texturées, in: GRETSI 2013, Brest, France, September 2013.
http://hal.inria.fr/hal-00868550
[29]
P. Chainais, C. Richard.
Distributed dictionary learning over a sensor network, in: CaP 2013, Villeneuve d'Ascq, France, July 2013, pp. 1-4.
http://hal.inria.fr/hal-00923741

Scientific Books (or Scientific Book chapters)

[30]
L. Busoniu, R. Munos, R. Babuska.
A review of optimistic planning in Markov decision processes, in: Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, F. Lewis, D. Liu (editors), IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, January 2013, chap. 22, pp. 494-516.
http://hal.inria.fr/hal-00756742

Internal Reports

[31]
M. Ghavamzadeh, Y. Engel.
Bayesian Policy Gradient and Actor-Critic Algorithms, January 2013.
http://hal.inria.fr/hal-00776608
[32]
P. L.A., M. Ghavamzadeh.
Actor-Critic Algorithms for Risk-Sensitive MDPs, February 2013.
http://hal.inria.fr/hal-00794721
[33]
R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2013.
http://hal.inria.fr/hal-00747575

References in notes

[34]
P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
[35]
R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[36]
D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[37]
D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[38]
T. Ferguson.
A Bayesian Analysis of Some Nonparametric Problems, in: The Annals of Statistics, 1973, vol. 1, no 2, pp. 209–230.
[39]
T. Hastie, R. Tibshirani, J. Friedman.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
[40]
W. Powell.
Approximate Dynamic Programming, Wiley, 2007.
[41]
M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[42]
H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 58, pp. 527–535.
[43]
J. Rust, C. Phelan.
How Social Security and Medicare Affect Retirement Behavior in a World of Incomplete Markets, in: Econometrica, July 1997, vol. 65, no 4, pp. 781–831.
http://gemini.econ.umd.edu/jrust/research/rustphelan.pdf
[44]
J. Rust.
On the Optimal Lifetime of Nuclear Power Plants, in: Journal of Business & Economic Statistics, 1997, vol. 15, no 2, pp. 195–208.
[45]
R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[46]
G. Tesauro.
Temporal Difference Learning and TD-Gammon, in: Communications of the ACM, March 1995, vol. 38, no 3.
[47]
P. Werbos.
ADP: Goals, Opportunities and Principles, in: Handbook of Learning and Approximate Dynamic Programming, IEEE Press, 2004, pp. 3–44.