

Bibliography

Major publications by the team in recent years
[1]
O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, G. Stoltz.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, in: Annals of Statistics, 2013, vol. 41, no 3, pp. 1516-1541.
https://hal.archives-ouvertes.fr/hal-00738209
[2]
A. Carpentier, M. Valko.
Revealing Graph Bandits for Maximizing Local Influence, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, A. Gretton, C. C. Robert (editors), Proceedings of Machine Learning Research, PMLR, May 2016, vol. 51, pp. 10-18.
http://proceedings.mlr.press/v51/carpentier16a.html
[3]
H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. Courville.
Modulating early visual processing by language, in: Conference on Neural Information Processing Systems, Long Beach, United States, December 2017, pp. 6594-6604.
https://hal.inria.fr/hal-01648683
[4]
N. Gatti, A. Lazaric, M. Rocco, F. Trovò.
Truthful Learning Mechanisms for Multi-Slot Sponsored Search Auctions with Externalities, in: Artificial Intelligence, October 2015, vol. 227, pp. 93-139.
https://hal.inria.fr/hal-01237670
[5]
M. Ghavamzadeh, Y. Engel, M. Valko.
Bayesian Policy Gradient and Actor-Critic Algorithms, in: Journal of Machine Learning Research, January 2016, vol. 17, no 66, pp. 1-53.
https://hal.inria.fr/hal-00776608
[6]
H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy, J. Audiffren.
Operator-valued Kernels for Learning from Functional Response Data, in: Journal of Machine Learning Research (JMLR), April 2016, vol. 17, no 20, pp. 1-54.
https://hal.archives-ouvertes.fr/hal-01221329
[7]
E. Kaufmann, O. Cappé, A. Garivier.
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, in: Journal of Machine Learning Research, January 2016, vol. 17, pp. 1-42.
https://hal.archives-ouvertes.fr/hal-01024894
[8]
A. Lazaric, M. Ghavamzadeh, R. Munos.
Analysis of Classification-based Policy Iteration Algorithms, in: Journal of Machine Learning Research, 2016, vol. 17, pp. 1-30.
https://hal.inria.fr/hal-01401513
[9]
R. Munos.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, in: Foundations and Trends in Machine Learning, 2014, vol. 7, no 1, pp. 1-129.
http://dx.doi.org/10.1561/2200000038
[10]
R. Ortner, D. Ryabko, P. Auer, R. Munos.
Regret bounds for restless Markov bandits, in: Theoretical Computer Science (TCS), 2014, vol. 558, pp. 62-76. [ DOI : 10.1016/j.tcs.2014.09.026 ]
https://hal.inria.fr/hal-01074077
Publications of the year

Doctoral Dissertations and Habilitation Theses

[11]
N. Carrara.
Reinforcement learning for Dialogue Systems optimization with user adaptation, École Doctorale Sciences pour l'Ingénieur, Université Lille Nord-de-France, December 2019.
https://tel.archives-ouvertes.fr/tel-02422691
[12]
R. Fruit.
Exploration-exploitation dilemma in Reinforcement Learning under various forms of prior knowledge, Université de Lille 1, Sciences et Technologies; CRIStAL UMR 9189, November 2019.
https://tel.archives-ouvertes.fr/tel-02388395
[13]
O.-A. Maillard.
Mathematics of Statistical Sequential Decision Making, Université de Lille Nord de France, February 2019, Habilitation à diriger des recherches.
https://hal.archives-ouvertes.fr/tel-02077035

Articles in International Peer-Reviewed Journals

[14]
M.-A. Charpagne, F. Strub, T. M. Pollock.
Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm, in: Materials Characterization, April 2019, vol. 150, pp. 184-198, https://arxiv.org/abs/1903.02988 - a short version of this paper, aimed at the Machine Learning community, is available as arXiv:1903.02982. [ DOI : 10.1016/j.matchar.2019.01.033 ]
https://hal.archives-ouvertes.fr/hal-02062098
[15]
A. R. Luedtke, E. Kaufmann, A. Chambaz.
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits, in: Machine Learning, September 2019, vol. 108, no 11, pp. 1919-1949, https://arxiv.org/abs/1606.09388.
https://hal.archives-ouvertes.fr/hal-01338733

International Conferences with Proceedings

[16]
Best Paper
M. Asadi, M. S. Talebi, H. Bourel, O.-A. Maillard.
Model-Based Reinforcement Learning Exploiting State-Action Equivalence, in: ACML 2019, Proceedings of Machine Learning Research, Nagoya, Japan, 2019, vol. 101, pp. 204-219.
https://hal.archives-ouvertes.fr/hal-02378887
[17]
P. Bartlett, V. Gabillon, J. Healey, M. Valko.
Scale-free adaptive planning for deterministic dynamics & discounted rewards, in: International Conference on Machine Learning, Long Beach, United States, 2019.
https://hal.inria.fr/hal-02387484
[18]
P. Bartlett, V. Gabillon, M. Valko.
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, in: Algorithmic Learning Theory, Chicago, United States, 2019.
https://hal.inria.fr/hal-01885368
[19]
D. Calandriello, L. Carratino, A. Lazaric, M. Valko, L. Rosasco.
Gaussian process optimization with adaptive sketching: Scalable and no regret, in: Conference on Learning Theory, Phoenix, United States, 2019.
https://hal.inria.fr/hal-02144311
[20]
N. Carrara, E. Leurent, R. Laroche, T. Urvoy, O.-A. Maillard, O. Pietquin.
Budgeted Reinforcement Learning in Continuous State Space, in: Conference on Neural Information Processing Systems, Vancouver, Canada, Advances in Neural Information Processing Systems, December 2019, vol. 32, https://arxiv.org/abs/1903.01004.
https://hal.archives-ouvertes.fr/hal-02375727
[21]
M. Dereziński, D. Calandriello, M. Valko.
Exact sampling of determinantal point processes with sublinear time preprocessing, in: Neural Information Processing Systems, Vancouver, Canada, 2019.
https://hal.inria.fr/hal-02387524
[22]
C. Dimitrakakis, Y. Liu, D. Parkes, G. Radanovic.
Bayesian Fairness, in: AAAI 2019 - Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, United States, January 2019.
https://hal.inria.fr/hal-01953311
[23]
G. Gautier, R. Bardenet, M. Valko.
On two ways to use determinantal point processes for Monte Carlo integration – Long version, in: NeurIPS 2019 - Thirty-third Conference on Neural Information Processing Systems, Vancouver, Canada, Advances in Neural Information Processing Systems, 2019.
https://hal.archives-ouvertes.fr/hal-02277739
[24]
J.-B. Grill, O. D. Domingues, P. Ménard, R. Munos, M. Valko.
Planning in entropy-regularized Markov decision processes and games, in: Neural Information Processing Systems, Vancouver, Canada, 2019.
https://hal.inria.fr/hal-02387515
[25]
E. Leurent, O.-A. Maillard.
Practical Open-Loop Optimistic Planning, in: European Conference on Machine Learning, Würzburg, Germany, September 2019, https://arxiv.org/abs/1904.04700.
https://hal.archives-ouvertes.fr/hal-02375697
[26]
A. Locatelli, A. Carpentier, M. Valko.
Active multiple matrix completion with adaptive confidence sets, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
https://hal.inria.fr/hal-02387468
[27]
O.-A. Maillard.
Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds, in: Algorithmic Learning Theory, Chicago, United States, 2019, vol. 98, pp. 1-23.
https://hal.archives-ouvertes.fr/hal-02351665
[28]
C. Moy, L. Besson.
Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation, in: ISIoT 2019 - 1st International Workshop on Intelligent Systems for the Internet of Things, Santorini, Greece, May 2019, https://arxiv.org/abs/1906.00614.
https://hal.inria.fr/hal-02144465
[29]
R. Ortner, M. Pirotta, R. Fruit, A. Lazaric, O.-A. Maillard.
Regret Bounds for Learning State Representations in Reinforcement Learning, in: Conference on Neural Information Processing Systems, Vancouver, Canada, December 2019.
https://hal.archives-ouvertes.fr/hal-02375715
[30]
P. Perrault, V. Perchet, M. Valko.
Exploiting structure of uncertainty for efficient matroid semi-bandits, in: International Conference on Machine Learning, Long Beach, United States, 2019.
https://hal.inria.fr/hal-02387478
[31]
P. Perrault, V. Perchet, M. Valko.
Finding the bandit in a graph: Sequential search-and-stop, in: International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019.
https://hal.inria.fr/hal-02387465
[32]
J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, M. Valko.
Rotting bandits are not harder than stochastic ones, in: International Conference on Artificial Intelligence and Statistics, Naha, Japan, 2019.
https://hal.inria.fr/hal-01936894
[33]
X. Shang, E. Kaufmann, M. Valko.
A simple dynamic bandit algorithm for hyper-parameter tuning, in: AutoML@ICML 2019 - 6th ICML Workshop on Automated Machine Learning, Long Beach, United States, June 2019.
https://hal.inria.fr/hal-02145200
[34]
X. Shang, E. Kaufmann, M. Valko.
General parallel optimization without a metric, in: Algorithmic Learning Theory, Chicago, United States, 2019, vol. 98.
https://hal.inria.fr/hal-02047225
[35]
M. S. Talebi, O.-A. Maillard.
Learning Multiple Markov Chains via Adaptive Allocation, in: Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, Canada, December 2019.
https://hal.archives-ouvertes.fr/hal-02387345

National Conferences with Proceedings

[36]
L. Besson, E. Kaufmann.
Non-asymptotic analysis of a sequential change-point detection test and its application to non-stationary bandits, in: GRETSI 2019 - XXVIIème Colloque francophone de traitement du signal et des images, Lille, France, August 2019.
https://hal.inria.fr/hal-02152243

Conferences without Proceedings

[37]
L. Besson, R. Bonnefoi, C. Moy.
GNU Radio Implementation of MALIN: "Multi-Armed bandits Learning for Internet-of-things Networks", in: IEEE WCNC 2019 - IEEE Wireless Communications and Networking Conference, Marrakech, Morocco, April 2019, https://arxiv.org/abs/1902.01734.
https://hal.inria.fr/hal-02006825
[38]
R. Bonnefoi, L. Besson, J. Manco-Vasquez, C. Moy.
Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions, in: The 1st International Workshop on Mathematical Tools and technologies for IoT and mMTC Networks Modeling, Marrakech, Morocco, P. Mary, S. Perlaza, P. Popovski (editors), April 2019, https://arxiv.org/abs/1902.10615 - The source code (MATLAB or Octave) used for the simulations and the figures is open-sourced under the MIT License, at Bitbucket.org/scee_ietr/ucb_smart_retrans.
https://hal.inria.fr/hal-02049824
[39]
Y. Flet-Berliac, P. Preux.
MERL: Multi-Head Reinforcement Learning, in: NeurIPS 2019 Deep Reinforcement Learning Workshop, Vancouver, Canada, December 2019, https://arxiv.org/abs/1909.11939.
https://hal.inria.fr/hal-02305105
[40]
G. Gautier, R. Bardenet, M. Valko.
On two ways to use determinantal point processes for Monte Carlo integration, in: NEGDEPML 2019 - ICML Workshop on Negative Dependence in ML, Long Beach, CA, United States, June 2019.
https://hal.archives-ouvertes.fr/hal-02160382
[41]
T. Levent, P. Preux, E. Le Pennec, J. Badosa, G. Henri, Y. Bonnassieux.
Energy Management for Microgrids: a Reinforcement Learning Approach, in: ISGT-Europe 2019 - IEEE PES Innovative Smart Grid Technologies Europe, Bucharest, Romania, IEEE, September 2019, pp. 1-5. [ DOI : 10.1109/ISGTEurope.2019.8905538 ]
https://hal.archives-ouvertes.fr/hal-02382232
[42]
M. Seurin, P. Preux, O. Pietquin.
"I'm sorry Dave, I'm afraid I can't do that" Deep Q-Learning From Forbidden Actions, in: Workshop on Safety and Robustness in Decision Making (NeurIPS 2019), Vancouver, Canada, December 2019.
https://hal.inria.fr/hal-02387419

Other Publications

[43]
L. Besson, E. Kaufmann.
The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits, February 2019, https://arxiv.org/abs/1902.01575 - working paper or preprint.
https://hal.inria.fr/hal-02006471
[44]
E. Boursier, E. Kaufmann, A. Mehrabian, V. Perchet.
A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players, May 2019, https://arxiv.org/abs/1902.01239 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02006069
[45]
G. Cideron, M. Seurin, F. Strub, O. Pietquin.
Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following, November 2019, https://arxiv.org/abs/1910.09451 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02386585
[46]
R. Degenne, W. M. Koolen, P. Ménard.
Non-Asymptotic Pure Exploration by Solving Games, December 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02402665
[47]
Y. Flet-Berliac, P. Preux.
High-Dimensional Control Using Generalized Auxiliary Tasks, November 2019, working paper or preprint.
https://hal.inria.fr/hal-02295705
[48]
Y. Flet-Berliac, P. Preux.
Samples Are Useful? Not Always: denoising policy gradient updates using variance explained, September 2019, https://arxiv.org/abs/1904.04025 - working paper or preprint.
https://hal.inria.fr/hal-02091547
[49]
A. Garivier, H. Hadiji, P. Ménard, G. Stoltz.
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints, November 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01785705
[50]
A. Garivier, E. Kaufmann.
Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models, May 2019, https://arxiv.org/abs/1905.03495 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02123833
[51]
E. Leurent, Y. Blanco, D. Efimov, O.-A. Maillard.
Approximate Robust Control of Uncertain Dynamical Systems, February 2019, https://arxiv.org/abs/1903.00220 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-01931744
[52]
E. Leurent, J. Mercat.
Social Attention for Autonomous Decision-Making in Dense Traffic, November 2019, https://arxiv.org/abs/1911.12250 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02383940
[53]
O.-A. Maillard, T. A. Mann, R. Ortner, S. Mannor.
Active Roll-outs in MDP with Irreversible Dynamics, July 2019, working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02177808
[54]
X. Shang, R. De Heide, E. Kaufmann, P. Ménard, M. Valko.
Fixed-confidence guarantees for Bayesian best-arm identification, October 2019, https://arxiv.org/abs/1910.10945 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02330187
[55]
F. Strub, M.-A. Charpagne, T. M. Pollock.
Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm, March 2019, https://arxiv.org/abs/1903.02988 - a short version of this paper, aimed at the Machine Learning community, is available as arXiv:1903.02982. [ DOI : 10.1016/j.matchar.2019.01.033 ]
https://hal.archives-ouvertes.fr/hal-02062104
[56]
C. Trinh, E. Kaufmann, C. Vernade, R. Combes.
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling, December 2019, https://arxiv.org/abs/1912.03074 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02396943
References in notes
[57]
P. Auer, N. Cesa-Bianchi, P. Fischer.
Finite-time analysis of the multi-armed bandit problem, in: Machine Learning, 2002, vol. 47, no 2/3, pp. 235–256.
[58]
R. Bellman.
Dynamic Programming, Princeton University Press, 1957.
[59]
D. Bertsekas, S. Shreve.
Stochastic Optimal Control (The Discrete Time Case), Academic Press, New York, 1978.
[60]
D. Bertsekas, J. Tsitsiklis.
Neuro-Dynamic Programming, Athena Scientific, 1996.
[61]
M. Puterman.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, 1994.
[62]
H. Robbins.
Some aspects of the sequential design of experiments, in: Bull. Amer. Math. Soc., 1952, vol. 55, pp. 527–535.
[63]
R. Sutton, A. Barto.
Reinforcement learning: an introduction, MIT Press, 1998.
[64]
P. Werbos.
ADP: Goals, Opportunities and Principles, IEEE Press, 2004, pp. 3–44, Handbook of learning and approximate dynamic programming.