Section: New Results
Large-scale statistical learning
Dictionary Learning for Massive Matrix Factorization
Participants: Julien Mairal, Arthur Mensch [Parietal], Gael Varoquaux [Parietal], Bertrand Thirion [Parietal].
Sparse matrix factorization is a popular tool for obtaining interpretable data decompositions, which are also effective for data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both. In [25], [17], we tackle matrices that are very large in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets, which could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods. The main principle of the method is illustrated in Figure 13.
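To give a flavor of the approach, the sketch below runs a toy online matrix factorization in which each iteration touches a single random column and a random subset of rows, so the per-step cost is reduced in both matrix dimensions. It is a simplified illustration under our own assumptions (ridge-regularized codes instead of sparse codes, a small dense NumPy matrix instead of terabyte-scale data, hypothetical function and parameter names), not the actual implementation from [25], [17]:

```python
import numpy as np

def subsampled_factorization(X, n_components, n_iter=500,
                             row_ratio=0.5, ridge=0.1, seed=0):
    """Toy online matrix factorization that subsamples BOTH
    dimensions: one random column (sample) and a random subset of
    rows (features) per iteration.  Ridge codes stand in for the
    sparse codes of the actual method."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    r = n_components
    D = rng.standard_normal((n_rows, r))          # dictionary
    A = np.zeros((r, r))                          # surrogate statistics
    B = np.zeros((n_rows, r))
    k = max(1, int(row_ratio * n_rows))
    for t in range(n_iter):
        j = rng.integers(n_cols)                  # one random column
        rows = rng.choice(n_rows, size=k, replace=False)  # random rows
        x, Dr = X[rows, j], D[rows]
        # code computed from the subsampled rows only
        alpha = np.linalg.solve(Dr.T @ Dr + ridge * np.eye(r), Dr.T @ x)
        w = 1.0 / (t + 1)                         # averaging weight
        A = (1 - w) * A + w * np.outer(alpha, alpha)
        B[rows] = (1 - w) * B[rows] + w * np.outer(x, alpha)
        # dictionary update restricted to the sampled rows
        D[rows] = np.linalg.solve(A + 1e-3 * np.eye(r), B[rows].T).T
    return D

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 8)) @ rng.standard_normal((8, 200))  # rank-8 data
D = subsampled_factorization(X, n_components=8)
codes = np.linalg.lstsq(D, X, rcond=None)[0]
err = np.linalg.norm(X - D @ codes) / np.linalg.norm(X)
```

Because only the sampled rows of the dictionary are refreshed at each step, the cost per iteration no longer depends on the full number of rows, which is the key to scaling in both dimensions.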

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Participants : Alberto Bietti, Julien Mairal.
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of the input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper [26], we introduce a variance reduction approach for this setting when the objective is strongly convex. After an initial linearly convergent phase, the algorithm achieves an $O(1/t)$ convergence rate in expectation like SGD, but with a constant factor that is typically much smaller: it depends on the variance of the gradient estimates induced by the perturbations on a single example.
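The sketch below illustrates the general idea on ridge regression with random input perturbations: stochastic gradients are computed on perturbed examples, while an SVRG-style control variate is built from the unperturbed examples at a snapshot point, so the residual variance comes only from the perturbations. This is a simplified illustration with our own choices (function names, step sizes, the SVRG-flavored estimator), not the exact algorithm of [26]:

```python
import numpy as np

def vr_sgd_perturbed(X, y, lam=0.1, noise=0.05, lr=0.05, epochs=20, seed=0):
    """SVRG-flavored sketch for an objective E_i E_eps f_i(w, eps):
    stochastic gradients use randomly perturbed inputs, while the
    control variate uses the unperturbed examples at a snapshot."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    def grad_i(w, xi, yi):
        # gradient of 0.5 * (xi @ w - yi)^2 + 0.5 * lam * ||w||^2
        return xi * (xi @ w - yi) + lam * w

    for _ in range(epochs):
        w_ref = w.copy()
        # full gradient of the UNPERTURBED finite sum at the snapshot
        mu = np.mean([grad_i(w_ref, X[i], y[i]) for i in range(n)], axis=0)
        for _ in range(n):
            i = rng.integers(n)
            x_pert = X[i] + noise * rng.standard_normal(d)  # perturbation
            # unbiased estimator: E[g] is the gradient of the
            # expected (perturbed) objective at w
            g = grad_i(w, x_pert, y[i]) - grad_i(w_ref, X[i], y[i]) + mu
            w -= lr * g
    return w

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.05 * (w @ w)

w_hat = vr_sgd_perturbed(X, y)
```

The control variate cancels the across-examples part of the gradient noise, leaving only the variance due to the perturbation of the single example drawn at each step.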
QuickeNing: A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization
Participants: Hongzhou Lin, Julien Mairal, Zaid Harchaoui [University of Washington].
In this paper [28], we propose an approach to accelerate gradient-based optimization algorithms by giving them the ability to exploit curvature information using quasi-Newton update rules. The proposed scheme, called QuickeNing, is generic and can be applied to a large class of first-order methods such as incremental and block-coordinate algorithms; it is also compatible with composite objectives, meaning that it can provide exactly sparse solutions when the objective involves a sparsity-inducing regularization. QuickeNing relies on limited-memory BFGS rules, making it appropriate for solving high-dimensional optimization problems; since it requires no line search, it is also simple to use and to implement. Moreover, it enjoys a worst-case linear convergence rate for strongly convex problems. We present experimental results (see Figure 14) where QuickeNing gives significant improvements over competing methods for solving large-scale, high-dimensional machine learning problems.
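As background, QuickeNing builds on limited-memory BFGS update rules. The sketch below shows the classic L-BFGS two-loop recursion that turns stored iterate and gradient differences into a search direction, applied to a small quadratic with Armijo backtracking. It is a self-contained textbook illustration with our own variable names, not the QuickeNing algorithm itself, which instead wraps an inner first-order solver around a smoothed objective and avoids line searches:

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: returns H @ g, where H approximates the
    inverse Hessian from pairs s_k = x_{k+1} - x_k and
    y_k = grad_{k+1} - grad_k (oldest first in the lists)."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    for s, y, rho in zip(reversed(s_hist), reversed(y_hist), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_hist:  # scale by gamma * I as the initial Hessian guess
        q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q += (a - beta) * s
    return q

# Minimize the strongly convex quadratic 0.5 x'Ax - b'x (memory = 5).
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + np.eye(6)
b = rng.standard_normal(6)

def obj(x):
    return 0.5 * x @ A @ x - b @ x

x = np.zeros(6)
g = A @ x - b
s_hist, y_hist, mem = [], [], 5
for _ in range(60):
    if np.linalg.norm(g) < 1e-10:
        break
    d = lbfgs_direction(g, s_hist, y_hist)
    t = 1.0
    while obj(x - t * d) > obj(x) - 1e-4 * t * (g @ d) and t > 1e-12:
        t *= 0.5                       # Armijo backtracking
    x_new = x - t * d
    g_new = A @ x_new - b
    s, yv = x_new - x, g_new - g
    if s @ yv > 1e-12:                 # keep curvature-positive pairs only
        s_hist.append(s)
        y_hist.append(yv)
        if len(s_hist) > mem:
            s_hist.pop(0)
            y_hist.pop(0)
    x, g = x_new, g_new
```

Storing only a few (s, y) pairs keeps the memory and per-iteration cost linear in the dimension, which is what makes such rules attractive for high-dimensional problems.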
Dictionary Learning from Phaseless Measurements
Participants: Julien Mairal, Yonina Eldar [Technion], Andreas Tillmann [TU Darmstadt].
In [22], [12], we propose a new algorithm to learn a dictionary for reconstructing and sparsely encoding signals from measurements without phase. Specifically, we consider the task of estimating a two-dimensional image from squared-magnitude measurements of a complex-valued linear transformation of the original image. Several recent phase retrieval algorithms exploit underlying sparsity of the unknown signal in order to improve recovery performance. In this work, we consider such a sparse signal prior in the context of phase retrieval, when the sparsifying dictionary is not known in advance. Our algorithm jointly reconstructs the unknown signal, possibly corrupted by noise, and learns a dictionary such that each patch of the estimated image can be sparsely represented. Numerical experiments demonstrate that our approach can obtain significantly better reconstructions for phase retrieval problems with noise than methods that cannot exploit such “hidden” sparsity. Moreover, on the theoretical side, we provide a convergence result for our method.
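A heavily simplified illustration of the interplay between phase retrieval and a sparsity prior is sketched below: it alternates between imposing the measured magnitudes, a least-squares signal update, and hard thresholding in a fixed canonical basis. The actual method jointly learns the dictionary on image patches and handles complex-valued, squared-magnitude measurements; all names and parameters here are our own:

```python
import numpy as np

def hard_threshold(v, k):
    """Keep only the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def alternating_phase_retrieval(A, b, k, n_iter=100, seed=0):
    """Toy alternating scheme for real measurements b = |A x| with a
    k-sparse prior in the canonical basis (a fixed 'dictionary'
    rather than a learned one)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    for _ in range(n_iter):
        z = b * np.sign(A @ x)                        # impose magnitudes
        x_ls = np.linalg.lstsq(A, z, rcond=None)[0]   # signal update
        x = hard_threshold(x_ls, k)                   # sparse-coding step
    return x

rng = np.random.default_rng(2)
m, n, k = 96, 16, 2
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 11]] = [1.5, -2.0]
b = np.abs(A @ x_true)                 # phaseless (sign-less) measurements
x_hat = alternating_phase_retrieval(A, b, k)
res = np.linalg.norm(np.abs(A @ x_hat) - b) / np.linalg.norm(b)
```

Note that x can at best be recovered up to a global sign, since the magnitude measurements are invariant under x → −x; replacing the fixed thresholding basis with a dictionary update on patches is what distinguishes the actual method.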