Keywords
 A3.4. Machine learning and statistics
 A5.4. Computer vision
 A6.2. Scientific computing, Numerical Analysis & Optimization
 A7.1. Algorithms
 A8.2. Optimization
 A9.2. Machine learning
 B9.5.6. Data science
1 Team members, visitors, external collaborators
Research Scientists
 Francis Bach [Team leader, Inria, Senior Researcher, HDR]
 Pierre Gaillard [Inria, Researcher, until Aug 2020]
 Alessandro Rudi [Inria, Researcher]
 Umut Simsekli [Inria, Researcher, from Nov 2020]
 Adrien Taylor [Inria, Starting Research Position]
 Alexandre d'Aspremont [CNRS, Senior Researcher]
PostDoctoral Fellows
 Martin Arjovsky [Inria]
 Alberto Bietti [Inria, until Aug 2020]
 Seyed Daneshmand [Inria, from Aug 2020]
 Remy Degenne [Inria, until Sep 2020]
 Ziad Kobeissi [Institut Louis Bachelier, from Oct 2020]
 Pierre-Yves Masse [Université technique de Prague, Tchéquie, until Mar 2020]
 Boris Muzellec [Inria, from Nov 2020]
 Yifan Sun [École Normale Supérieure de Paris, until Aug 2020]
PhD Students
 Mathieu Barre [École Normale Supérieure de Paris]
 Eloise Berthier [DGA]
 Raphael Berthier [Inria]
 Margaux Bregere [EDF, until Oct 2020]
 Vivien Cabannes [Inria]
 Alexandre Defossez [Facebook, until Jun 2020]
 Radu Alexandru Dragomir [École polytechnique, co-directed with Jérôme Bolte]
 Gautier Izacard [CNRS, from Feb 2020]
 Remi Jezequel [École Normale Supérieure de Paris]
 Thomas Kerdreux [École polytechnique, PhD completed in Sept. 2020]
 Hans Kersting [Inria, from Oct 2020]
 Marc Lambert [DGA, from Sep 2020]
 Ulysse Marteau-Ferey [Inria]
 Gregoire Mialon [Inria, co-directed with Julien Mairal]
 Alex Nowak Vila [École Normale Supérieure de Paris]
 Loucas Pillaud Vivien [Ministère de l'Ecologie, de l'Energie, du Développement durable et de la Mer, until Aug 2020]
 Manon Romain [CNRS, from Sep 2020]
Technical Staff
 Loïc Estève [Inria, Engineer, until Feb 2020]
 Gautier Izacard [CNRS, Engineer, until Jan 2020]
Interns and Apprentices
 Stanislas Bénéteau [École normale supérieure Paris-Saclay, from Apr 2020 until Aug 2020]
 Celine Moucer [École polytechnique, from Apr 2020 until Aug 2020]
 Quentin Rebjock [Inria, until Mar 2020]
Administrative Assistants
 Helene Bessin Rousseau [Inria, until Jun 2020]
 Helene Milome [Inria]
 Scheherazade Rouag [Inria, from Nov 2020]
Visiting Scientists
 Anant Raj [Institut Max-Planck, until Mar 2020]
 Manon Romain [CNRS, from Jun 2020 until Aug 2020]
 Aadirupa Saha [Institut Indien des Sciences, until Jan 2020]
2 Overall objectives
2.1 Statement
Machine learning is a recent scientific domain, positioned between applied mathematics, statistics and computer science. Its goals are the optimization, control, and modeling of complex systems from examples. It applies to data from numerous engineering and scientific fields (e.g., vision, bioinformatics, neuroscience, audio processing, text processing, economics, finance, etc.), the ultimate goal being to derive general theories and algorithms allowing advances in each of these domains. Machine learning is characterized by the high quality and quantity of the exchanges between theory, algorithms and applications: interesting theoretical problems almost always emerge from applications, while theoretical analysis allows the understanding of why and when popular or successful algorithms do or do not work, and leads to proposing significant improvements.
Our academic positioning is exactly at the intersection between these three aspects (algorithms, theory and applications), and our main research goal is to make the link between theory and algorithms, and between algorithms and high-impact applications in various engineering and scientific fields, in particular computer vision, bioinformatics, audio processing, text processing and neuroimaging.
Machine learning is now a vast field of research and the team focuses on the following aspects: supervised learning (kernel methods, calibration), unsupervised learning (matrix factorization, statistical tests), parsimony (structured sparsity, theory and algorithms), and optimization (convex optimization, bandit learning). These four research axes are strongly interdependent, and the interplay between them is key to successful practical applications.
3 Research program
3.1 Supervised Learning
This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, and multitask learning.
3.2 Unsupervised Learning
We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.
3.3 Parsimony
The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions.
3.4 Optimization
Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization, with a particular focus on large-scale optimization.
4 Application domains
4.1 Applications for Machine Learning
Machine learning research can be conducted from two main perspectives: the first one, which has been dominant in the last 30 years, is to design learning algorithms and theories which are as generic as possible, the goal being to make as few assumptions as possible regarding the problems to be solved and to let data speak for themselves. This has led to many interesting methodological developments and successful applications. However, we believe that this strategy has reached its limit for many application domains, such as computer vision, bioinformatics, neuroimaging, text and audio processing, which leads to the second perspective our team is built on: Research in machine learning theory and algorithms should be driven by interdisciplinary collaborations, so that specific prior knowledge may be properly introduced into the learning process, in particular with the following fields:
 Computer vision: object recognition, object detection, image segmentation, image/video processing, computational photography. In collaboration with the Willow project-team.
 Bioinformatics: cancer diagnosis, protein function prediction, virtual screening. In collaboration with Institut Curie.
 Text processing: document collection modeling, language models.
 Audio processing: source separation, speech/music processing.
 Neuroimaging: brain-computer interface (fMRI, EEG, MEG).
5 Highlights of the year
 A. Rudi: Recipient of an ERC starting grant
 F. Bach: Election to the French Academy of Sciences
 F.-P. Paty, A. d'Aspremont, M. Cuturi: AISTATS 2020 notable paper award
6 New results
6.1 Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis describes well the practical behavior of two-layer neural networks with ReLU activations and confirm the statistical benefits of this implicit bias.
6.2 Learning with Differentiable Perturbed Optimizers
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to transform optimizers into operations that are differentiable and never locally constant. Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers. Their derivatives can be evaluated efficiently, and smoothness tuned via the chosen noise amplitude. We also show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks. We demonstrate experimentally the performance of our approach on various tasks.
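As a rough illustration of the idea of a stochastically perturbed optimizer, the sketch below smooths a one-hot argmax by averaging it over Gaussian perturbations of its input. It is a minimal Monte Carlo sketch and not the implementation used in the work: the function names, the Gaussian noise, the sample count and the default amplitude `epsilon` are all assumptions chosen for illustration.

```python
import random


def one_hot_argmax(theta):
    """Discrete optimizer: one-hot vector marking the largest coordinate."""
    best = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta))]


def perturbed_argmax(theta, epsilon=0.5, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[argmax(theta + epsilon * Z)], Z ~ N(0, I).

    Hypothetical helper: the noise law and sample count are illustrative.
    """
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        noisy = [t + epsilon * rng.gauss(0.0, 1.0) for t in theta]
        for i, v in enumerate(one_hot_argmax(noisy)):
            total[i] += v
    return [s / num_samples for s in total]


theta = [1.0, 1.2, 0.3]
# Larger epsilon gives a smoother output that spreads mass over coordinates.
smoothed = perturbed_argmax(theta)
# A tiny epsilon essentially recovers the hard, piecewise-constant argmax.
almost_hard = perturbed_argmax(theta, epsilon=1e-6, num_samples=50)
print(smoothed, almost_hard)
```

The noise amplitude plays the role of the smoothness parameter mentioned above: the averaged output varies with `theta` instead of being locally constant, while the underlying solver (`one_hot_argmax` here) is only called as a black box.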
6.3 Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
6.4 Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks
Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.
6.5 Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(U)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
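The setting can be made concrete with a small finite-dimensional sketch of single-pass, fixed step-size stochastic gradient descent on noiseless linear observations. Everything specific here is an assumption for illustration: the features are drawn uniformly in a cube, the dimension is 5, and the step size is chosen so that each update is non-expansive. In such a well-conditioned finite-dimensional toy problem the error decays geometrically, so the sketch does not reproduce the polynomial rates of the nonparametric analysis.

```python
import random


def single_pass_sgd(theta_star, num_steps=2000, step_size=0.2, seed=0):
    """One pass of constant step-size SGD on noiseless least squares.

    Each sample (x, y) with y = <theta_star, x> is used exactly once.
    Features are uniform in [-1, 1]^d (an illustrative assumption), so
    step_size * ||x||^2 <= 0.2 * d, which is below 2 for d = 5.
    """
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    errors = []
    for _ in range(num_steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))  # noiseless output
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Stochastic gradient step on the squared loss 0.5 * residual^2.
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
        errors.append(sum((t - s) ** 2 for t, s in zip(theta, theta_star)))
    return theta, errors


theta_star = [1.0, -2.0, 0.5, 0.0, 3.0]
theta_hat, errors = single_pass_sgd(theta_star)
print(errors[0], errors[-1])
```

Because there is no noise, the iterates converge to the optimum with a constant step size and without averaging, which is the regime the analysis above is concerned with.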
6.6 Consistent Structured Prediction with Max-Min Margin Markov Networks
Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks (M3N), or more generally structural SVMs. Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by defining the learning problem in terms of a “max-min” margin formulation, naming the resulting method max-min margin Markov networks (M4N). We prove consistency and finite sample generalization bounds for M4N and provide an explicit algorithm to compute the estimator. The algorithm achieves a generalization error of $O(1/\sqrt{n})$ for a total cost of $O(n)$ projection-oracle calls (which have at most the same cost as the max-oracle from M3N). Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
6.7 Fast and Robust Stability Region Estimation for Nonlinear Dynamical Systems
A linear quadratic regulator can stabilize a nonlinear dynamical system with a local feedback controller around a linearization point, while minimizing a given performance criterion. An important practical problem is to estimate the region of attraction of such a controller, that is, the region around this point where the controller is certified to be valid. This is especially important in the context of highly nonlinear dynamical systems. In this paper, we propose two stability certificates that are fast to compute and robust when the first or second derivatives of the system dynamics are bounded. Associated with an efficient oracle to compute these bounds, this yields a stability region estimation algorithm that is simpler than classical state-of-the-art approaches. We experimentally validate that it can be applied to both polynomial and non-polynomial systems of various dimensions, including standard robotic systems, for estimating regions of attraction around equilibrium points, as well as for trajectory tracking.
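To make the notion of a region of attraction concrete, here is a toy scalar sketch. It does not implement the certificates proposed in the paper: the system $x_{t+1} = a x_t + b u_t + c x_t^3$, its coefficients, and the helper names are all hypothetical. The gain is obtained from the scalar discrete-time Riccati equation of the linearized system, and the region is read off from the elementary condition that $|x|$ decreases at every step.

```python
import math


def scalar_lqr_gain(a, b, q, r, iterations=200):
    """Fixed-point iteration on the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)  # feedback law u = -k * x
    return p, k


def converges(x0, a, b, c, k, max_steps=500):
    """Simulate x <- a*x + b*u + c*x^3 with u = -k*x; True if x reaches 0."""
    x = x0
    for _ in range(max_steps):
        x = (a - b * k) * x + c * x * x * x
        if abs(x) < 1e-9:
            return True
        if abs(x) > 1e6:
            return False
    return False


# Illustrative coefficients (assumptions, not taken from the paper).
a, b, c, q, r = 1.2, 1.0, 0.5, 1.0, 1.0
p, k = scalar_lqr_gain(a, b, q, r)
closed_loop = a - b * k
# With 0 < closed_loop < 1 and c > 0, |x| shrinks at each step exactly when
# closed_loop + c * x^2 < 1, i.e. when |x| < radius.
radius = math.sqrt((1.0 - closed_loop) / c)
print(k, closed_loop, radius)
print(converges(0.9 * radius, a, b, c, k), converges(1.1 * radius, a, b, c, k))
```

In this one-dimensional example the interval can be computed in closed form; in higher dimensions no such formula is available, which is where certificates based on bounds on the derivatives of the dynamics become useful.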
6.8 Breaking the curse of dimensionality of Global Optimization of Nonconvex functions
We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this project, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of square smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum.
Given $n$ samples, the computational cost is $O(n^{3.5})$ in time, $O(n^{2})$ in space, and we achieve a convergence rate to the global optimum that is $O(n^{-m/d+1/2+3/d})$, where $m$ is the degree of differentiability of the function and $d$ the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and more generally makes the proposed method particularly suitable for functions that have a large number of derivatives. Indeed, when $m$ is in the order of $d$, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (that we track explicitly through the paper).
6.9 Efficient improper learning for online logistic regression
We considered the setting of online logistic regression with the objective of minimizing the regret with respect to the $\ell_2$-ball of radius $B$. It was known (see [Hazan et al., 2014]) that any proper algorithm which had logarithmic regret in the number of samples (denoted $n$) necessarily suffered an exponential multiplicative constant in $B$. In this work, we designed an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. Indeed, [Foster et al., 2018] showed that the lower bound does not apply to improper algorithms and proposed a strategy based on exponential weights with prohibitive computational complexity. Our new algorithm based on regularized empirical risk minimization with surrogate losses satisfies a regret scaling as $O(B\log(Bn))$ with a per-round time complexity of order $O(d^{2})$.
6.10 Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards
We considered the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work in bandits, the actions may not be available at all times. For instance, some products might be out of stock in item recommendation. The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret. Yet, inefficient algorithms based on EXP4 can achieve $O(\sqrt{T})$. In this work, we provided a new computationally efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$ when the availabilities of each action $i\in \mathcal{A}$ are independent. We then studied the most general version of the problem where at each round available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption) and proposed an efficient algorithm with $O(2^{K}\sqrt{T})$ regret guarantee. Our theoretical results were corroborated with experimental evaluations.
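The interaction protocol can be sketched with a simplified exponential-weights player that only ever samples among the currently available actions and uses importance-weighted loss estimates. This is a generic EXP3-style simplification written for illustration, not the algorithm analyzed in this work (in particular, it does not estimate availability probabilities); the function names, the fixed per-action losses, the learning rate `eta` and the availability probabilities are assumptions.

```python
import math
import random


def play_distribution(cum_loss, available, eta):
    """Exponential weights renormalized over the available actions only."""
    low = min(cum_loss[i] for i in available)  # shift for numerical stability
    w = {i: math.exp(-eta * (cum_loss[i] - low)) for i in available}
    z = sum(w.values())
    return {i: w[i] / z for i in available}


def run_sleeping_exp3(mean_loss, avail_prob, horizon=2000, eta=0.05, seed=0):
    """Simulate independent availabilities and an EXP3-style player."""
    rng = random.Random(seed)
    k = len(mean_loss)
    cum_loss = [0.0] * k  # importance-weighted cumulative loss estimates
    history = []
    for _ in range(horizon):
        # Each action is available independently with its own probability.
        available = [i for i in range(k) if rng.random() < avail_prob[i]]
        if not available:
            continue  # nothing can be played this round
        probs = play_distribution(cum_loss, available, eta)
        action = rng.choices(available, weights=[probs[i] for i in available])[0]
        loss = mean_loss[action]  # fixed loss per action, for simplicity
        # Only the played action is observed; reweight by its probability.
        cum_loss[action] += loss / probs[action]
        history.append((tuple(available), action, loss))
    return cum_loss, history


cum_loss, history = run_sleeping_exp3([0.2, 0.5, 0.8], [0.7, 0.7, 0.7])
print(len(history), sum(loss for _, _, loss in history) / len(history))
```

The point of the sketch is the constraint that distinguishes sleeping bandits from the standard setting: the sampling distribution is supported on the available set, which changes from round to round.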
7 Bilateral contracts and grants with industry
7.1 Bilateral contracts with industry
Microsoft Research: “Structured Large-Scale Machine Learning”. Machine learning is now ubiquitous in industry, science, engineering, and personal life. While early successes were obtained by applying off-the-shelf techniques, there are two main challenges faced by machine learning in the “big data” era: structure and scale. The project proposes to explore three axes, from theoretical, algorithmic and practical perspectives: (1) large-scale convex optimization, (2) large-scale combinatorial optimization and (3) sequential decision making for structured data. The project involves two Inria sites (Paris and Grenoble) and four MSR sites (Cambridge, New England, Redmond, New York). Project website: http://
7.2 Bilateral grants with industry
 Alexandre d’Aspremont, Francis Bach, Martin Jaggi (EPFL): Google Focused award.
 Francis Bach: Gift from Facebook AI Research.
 Alexandre d’Aspremont: fondation AXA, "Mécénat scientifique", optimisation & machine learning.
8 Partnerships and cooperations
8.1 International initiatives
8.1.1 Inria International Labs
4TUNE
 Title: Adaptive, Efficient, Provable and Flexible Tuning for Machine Learning
 Duration: 2020-2022
 Coordinator: Francis Bach
 Partners: Machine Learning group, CWI (Netherlands)
 Inria contact: Francis Bach
 Website: http://pierre.gaillard.me/4tune/
 Summary: The long-term goal of 4TUNE is to push adaptive machine learning to the next level. We aim to develop refined methods, going beyond traditional worst-case analysis, for exploiting structure in the learning problem at hand. We will develop new theory and design sophisticated algorithms for the core tasks of statistical learning and individual sequence prediction. We are especially interested in understanding the connections between these tasks and developing unified methods for both. We will also investigate adaptivity to non-standard patterns encountered in embedded learning tasks, in particular in iterative equilibrium computations.
FOAM
 Title: First-Order Accelerated Methods for machine learning
 Duration: 2020-2022
 Coordinator: Alexandre d'Aspremont
 Partners: Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile (Chile)
 Inria contact: Alexandre d'Aspremont
 Website: https://sites.google.com/view/cguzman/talksandevents/foamassociateteam
 Summary: Our main interest is to investigate novel and improved convergence results for first-order iterative methods for saddle-points, variational inequalities and fixed points, through the lens of performance estimation problems (PEP). Our interest in improving first-order methods is also deeply related to applications in machine learning. Particularly in sparsity-oriented inverse problems, optimization methods are the workhorse for state-of-the-art results. On some of these problems, a set of new hypotheses and theoretical results shows improved complexity bounds for problems with good recovery guarantees, and we plan to extend these new performance bounds to the variational framework.
8.2 European initiatives
8.2.1 FP7 & H2020 Projects
 European Research Council: SEQUOIA project (grant number 724063), 2017-2022 (F. Bach), “Robust algorithms for learning from modern data”.
8.3 National initiatives
 Alexandre d'Aspremont: IRIS, PSL “Science des données, données de la science”.
9 Dissemination
9.1 Promoting scientific activities
9.1.1 Scientific events: selection
Member of the conference program committees
 Pierre Gaillard, member of the program committee for the Conference on Learning Theory (COLT), 2020
Reviewer
 Adrien Taylor, reviewer for International Conference on Machine Learning (ICML), 2020 (top reviewer award).
 Adrien Taylor, reviewer for International Conference on Neural Information Processing Systems (NeurIPS), 2020 (top reviewer award).
 Adrien Taylor, reviewer for Conference on Decision and Control (CDC), 2020.
 Pierre Gaillard, reviewer for the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
9.1.2 Journal
Member of the editorial boards
 Francis Bach, co-editor-in-chief, Journal of Machine Learning Research
 Francis Bach, associate editor, Mathematical Programming
 Francis Bach, associate editor, Foundations of Computational Mathematics (FoCM)
Reviewer - reviewing activities
 Adrien Taylor, reviewer for Automatica.
 Adrien Taylor, reviewer for Journal of Machine Learning Research (JMLR).
 Adrien Taylor, reviewer for Mathematical Programming (MAPR).
 Adrien Taylor, reviewer for SIAM Journal on Optimization (SIOPT).
 Adrien Taylor, reviewer for Computational Optimization and Applications (COAP).
 Adrien Taylor, reviewer for Journal of Optimization Theory and Applications (JOTA).
 Pierre Gaillard, reviewer for Mathematics of Operations Research (MOR).
9.1.3 Invited talks
 Adrien Taylor, invited talk at the University of Cambridge (CCIMI seminars), February 2020, United Kingdom.
 Adrien Taylor, invited talk at Université catholique de Louvain (Mathematical engineering seminars), February 2020, Belgium.
 Adrien Taylor, invited talk at Pontificia Universidad Católica de Chile, April 2020, Online.
 Adrien Taylor, invited talk at One World Optimization seminars, June 2020, Online.
 Adrien Taylor, invited talk at CWI-Inria workshop, September 2020, Online.
 Pierre Gaillard, invited talk at the Valpred workshop, March 2020
 Pierre Gaillard, invited talk at the Potsdamer research seminar, June 2020, online.
 Pierre Gaillard, invited talk at the seminar of the Statify research team, Inria Grenoble, September 2020
 Alessandro Rudi, invited talk at University College London, Gatsby Unit, London, October 2020.
 Francis Bach, invited virtual talk at Optimization for machine learning, CIRM, Luminy, March 2020.
 Francis Bach, invited talk at MIT, September 2020
 Francis Bach, invited virtual talk at the University of Texas, Austin, October 2020
 Francis Bach, invited virtual talk at the Symposium on the Mathematical Foundations of Data Science, Johns Hopkins University, October 2020
 Francis Bach, invited virtual talk at Harvard University, November 2020
 Francis Bach, invited virtual talk at CIMAT, Centro de Investigación en Matemáticas, Mexico, November 2020
9.2 Teaching - Supervision - Juries
9.2.1 Teaching
 Master: Alexandre d'Aspremont, Optimisation Combinatoire et Convexe, with Zhentao Li, lectures, 30h (2015-Present), Master M1, ENS Paris.
 Master: Alexandre d'Aspremont, Optimisation convexe: modélisation, algorithmes et applications, lectures, 21h (2011-Present), Master M2 MVA, ENS PS.
 Master: Francis Bach, Optimisation et apprentissage statistique, 20h, Master M2 (Mathématiques de l'aléatoire), Université Paris-Sud, France.
 Master: Francis Bach, Machine Learning, 20h, Master ICFP (Physique), Université PSL.
 Master: Pierre Gaillard, Alessandro Rudi, Introduction to Machine Learning, 52h, L3, ENS, Paris.
 Master: Pierre Gaillard, Sequential learning, 20h, Master M2 MVA, ENS PS.
 Hausdorff school on MCMC: Francis Bach, 6 hours.
9.2.2 Supervision
 PhD in progress: Raphaël Berthier, started September 2017, supervised by Francis Bach and Pierre Gaillard.
 PhD in progress: Radu Alexandru Dragomir, Bregman Gradient Methods, 2018, Alexandre d'Aspremont (joint with Jérôme Bolte)
 PhD in progress: Mathieu Barré, Accelerated Polyak Methods, 2018, Alexandre d'Aspremont
 PhD in progress: Grégoire Mialon, Sample Selection Methods, 2018, Alexandre d'Aspremont (joint with Julien Mairal)
 PhD in progress: Manon Romain, Causal Inference Algorithms, 2020, Alexandre d'Aspremont
 PhD in progress: Alex Nowak-Vila, supervised by Francis Bach and Alessandro Rudi.
 PhD in progress: Ulysse Marteau-Ferey, supervised by Francis Bach and Alessandro Rudi.
 PhD in progress: Vivien Cabannes, supervised by Francis Bach and Alessandro Rudi.
 PhD in progress: Eloise Berthier, supervised by Francis Bach.
 PhD in progress: Theo Ryffel, supervised by Francis Bach and David Pointcheval.
 PhD in progress: Rémi Jezequel, supervised by Pierre Gaillard and Alessandro Rudi.
 PhD in progress: Antoine Bambade, supervised by Jean Ponce (Willow), Justin Carpentier (Willow), and Adrien Taylor.
 PhD in progress: Marc Lambert, supervised by Francis Bach and Silvère Bonnabel.
 PhD in progress: Ivan Lerner, co-advised with Anita Burgun and Antoine Neuraz.
 PhD defended: Alexandre Défossez, supervised by Francis Bach and Léon Bottou (Facebook AI Research), defended in July 2020
 PhD defended: Loucas Pillaud-Vivien, supervised by Francis Bach and Alessandro Rudi, defended October 30 2020
 PhD defended: Margaux Brégère, supervised by Pierre Gaillard and Gilles Stoltz (Université Paris-Sud), defended in December 2020
 PhD defended: Thomas Kerdreux, New Complexity Bounds for Frank-Wolfe, 2017, Alexandre d'Aspremont
9.2.3 Juries
 HDR Pierre Weiss, IMT Toulouse, September 2019 (Alexandre d'Aspremont).
 HDR Rémi Flamary, Université de Nice, November 2019 (Francis Bach).
10 Scientific production
10.1 Major publications
 1 unpublished 'Nonparametric Models for Nonnegative Functions'. July 2020, working paper or preprint
 2 article 'Sharpness, Restart and Acceleration'. SIAM Journal on Optimization 30(1), October 2020, 262-289
10.2 Publications of the year
International journals
 3 article 'Max-Plus Linear Approximations for Deterministic Continuous-State Markov Decision Processes'. IEEE Control Systems Letters 4(3), July 2020, 767-772
 4 article 'Ranking and synchronization from pairwise measurements via SVD'. Journal of Machine Learning Research 22(19), February 2021, 1-63
 5 article 'Worst-Case Convergence Analysis of Inexact Gradient and Newton Methods Through Semidefinite Programming Performance Estimation'. SIAM Journal on Optimization 30(3), January 2020, 2053-2082
 6 article 'Efficient First-order Methods for Convex Minimization: a Constructive Approach'. Mathematical Programming, Series A 184, 2020, 183-220
 7 article 'Sharpness, Restart and Acceleration'. SIAM Journal on Optimization 30(1), October 2020, 262-289
 8 article 'Operator Splitting Performance Estimation: Tight Contraction Factors and Optimal Parameter Selection'. SIAM Journal on Optimization 30(3), January 2020, 2251-2271
International peer-reviewed conferences
 9 inproceedings 'Who started this rumor? Quantifying the natural differential privacy guarantees of gossip protocols'. DISC 2020 - 34th International Symposium on Distributed Computing Freiburg / Virtual, Germany Inria 2020
 10 inproceedings 'Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss'. COLT 2020 - 33rd Annual Conference on Learning Theory, PMLR, Proceedings of Thirty Third Conference on Learning Theory 125, Graz / Virtual, Austria, July 2020, 1305-1338
 11 inproceedings 'Gamification of pure exploration for linear bandits'. ICML 2020 - 37th International Conference on Machine Learning Vienna / Virtual, Austria July 2020
 12 inproceedings 'Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index'. JDS 2020 - 52nd Statistics Days of the French Statistical Society (SFdS) Nice, France May 2020
 13 inproceedings 'Dual-Free Stochastic Decentralized Optimization with Variance Reduction'. NeurIPS 2020 - 34th Conference on Neural Information Processing Systems Advances in Neural Information Processing Systems Proceedings Vancouver / Virtual, Canada 2020
 14 inproceedings 'Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization'. ICML 2020 - Thirty-seventh International Conference on Machine Learning Proceedings of Machine Learning Research Vienna / Virtual, Austria 2020
 15 inproceedings 'Convergence and Stability of Graph Convolutional Networks on Large Random Graphs'. NeurIPS 2020 - 34th Conference on Neural Information Processing Systems Vancouver (virtual), Canada https://nips.cc/ December 2020
 16 inproceedings 'A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention'. ICLR 2021 - The Ninth International Conference on Learning Representations Virtual, France May 2021
 17 inproceedings 'Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions'. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics Palermo / Virtual, Italy June 2020
 18 inproceedings 'Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions'. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics AISTATS 2020 : 23rd International Conference on Artificial Intelligence and Statistics Palermo / Virtual, Italy August 2020
 19 inproceedings 'Improved sleeping bandits with stochastic action sets and adversarial rewards'. ICML 2020 - 37th International Conference on Machine Learning Vienna / Virtual, Austria July 2020
Conferences without proceedings
 20 inproceedings 'Naive Feature Selection: Sparsity in Naive Bayes'. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics Palermo / Virtual, Italy June 2020
 21 inproceedings 'Complexity Guarantees for Polyak Steps with Momentum'. COLT 2020 - 33rd Annual Conference on Learning Theory Graz / Virtual, Austria July 2020
 22 inproceedings 'Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model'. NeurIPS '20 - 34th International Conference on Neural Information Processing Systems Vancouver, Canada December 2020
 23 inproceedings 'Structured Prediction with Partial Labelling through the Infimum Loss'. ICML 2020 - 37th International Conference on Machine Learning, Proceedings of Machine Learning Research 119, Online, United States, July 2020, 1230-1239
 24 inproceedings 'Self-Supervised VQ-VAE for One-Shot Music Style Transfer'. ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing Toronto / Virtual, Canada June 2021
 25 inproceedings 'Efficient improper learning for online logistic regression'. COLT 2020 - 33rd Annual Conference on Learning Theory Graz / Virtual, Austria July 2020
 26 inproceedings 'Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport'. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics Palermo / Virtual, Italy June 2020
Doctoral dissertations and habilitation theses
 27 thesis 'Stochastic bandit algorithms for demand side management'. Université Paris-Saclay December 2020
 28 thesis 'Accelerating conditional gradient methods'. Université Paris sciences et lettres June 2020
Reports & preprints
 29 misc 'FANOK: Knockoffs in Linear Time'. October 2020
 30 misc 'On the Effectiveness of Richardson Extrapolation in Machine Learning'. July 2020
 31 misc 'Principled Analyses and Design of First-Order Methods with Inexact Proximal Operators'. September 2020
 32 misc 'Convergence of Constrained Anderson Acceleration'. December 2020
 33 misc 'A Continuized View on Nesterov Acceleration'. February 2021
 34 misc 'Fast and Robust Stability Region Estimation for Nonlinear Dynamical Systems'. October 2020
 35 misc 'Deep Equals Shallow for ReLU Networks in Kernel Regimes'. October 2020
 36 misc 'Global Convergence of Frank-Wolfe on One Hidden Layer Networks'. October 2020
 37 misc 'Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index'. March 2020
 38 misc 'An Approximate Shapley-Folkman Theorem'. October 2020
 39 misc 'The recursive variational Gaussian approximation (RVGA)'. December 2020
 40 misc 'Nonparametric Models for Nonnegative Functions'. July 2020
 41 misc 'Finite-sample analysis of M-estimators using self-concordance'. November 2020
 42 misc 'Non-stationary Online Regression'. November 2020
 43 misc 'Finding Global Minima via Kernel Approximations'. December 2020
 44 misc 'ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing'. July 2020
 45 misc 'Counterfactual Learning of Continuous Stochastic Policies'. June 2020