Section: New Results
Effective learning algorithms and architectures
Structured variable selection with sparsity-inducing norms (R. Jenatton, J-Y. Audibert and F. Bach)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms.
These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the group ℓ1-norm by allowing the subsets to overlap.
This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low and high-dimensional settings.
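The link between groups and patterns can be illustrated on a toy case. The sketch below is not the authors' implementation: the function names `structured_norm` and `allowed_nonzero_patterns` are hypothetical, and the enumeration assumes that the variables set to zero form a union of groups, so that allowed nonzero patterns are complements of such unions. It is a brute-force enumeration meant only for very small examples.

```python
from itertools import combinations

import numpy as np


def structured_norm(w, groups):
    """Sum of Euclidean norms of w restricted to each (possibly overlapping) group."""
    w = np.asarray(w, dtype=float)
    return sum(np.linalg.norm(w[list(g)]) for g in groups)


def allowed_nonzero_patterns(p, groups):
    """Enumerate complements of unions of groups (assumption: zero patterns
    are unions of groups). Exponential in the number of groups."""
    all_vars = frozenset(range(p))
    patterns = set()
    for r in range(len(groups) + 1):
        for subset in combinations(groups, r):
            zeros = frozenset(set().union(*subset))
            patterns.add(all_vars - zeros)
    return patterns


# Three variables on a line; groups are the intervals touching either end.
groups = [(0,), (0, 1), (2,), (1, 2)]
patterns = allowed_nonzero_patterns(3, groups)
```

With these illustrative groups, every allowed nonzero pattern is a contiguous interval: `{1}` is allowed, while `{0, 2}` is not, because its complement `{1}` is not a union of groups.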
Structured sparse principal component analysis (R. Jenatton, G. Obozinski and F. Bach)
We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced in [50]. While classical sparse priors only deal with cardinality, the regularization we use encodes higher-order information about the data. We propose an efficient and simple optimization procedure to solve this problem. Experiments with two practical tasks, face recognition and the study of the dynamics of a protein complex, demonstrate the benefits of the proposed structured approach over unstructured approaches.
Group Lasso with Overlap and Graph Lasso (G. Obozinski, joint work with L. Jacob and J.-P. Vert, Ecole des Mines de Paris)
Modeling techniques that yield sparse models usually do not take into account available structural information on the variables, such as the fact that predictive features are often organised in groups or on graphs. For example, in computer vision, features associated with pixels lie naturally on a grid and discontinuities in the image often lie on curves; in computational biology, predictive genes lie on regulation or interaction networks. In all these applications, relevant features are expected to be highly connected or concentrated in a few groups. We developed in [31] regularization schemes that encode this prior information in the feature selection process, yielding sparse models that respect the structure and are therefore more accurate and more interpretable. We are considering applying these models to contour extraction in images.
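As a rough illustration of how a graph over features might be turned into groups, the sketch below builds one group per edge of a pixel grid and checks whether a candidate support can be written as a union of such groups. Both choices are assumptions made for this example (edges as groups, supports as unions of groups), and the helper names `grid_edge_groups` and `is_union_of_groups` are hypothetical; the actual regularization schemes are those described in [31].

```python
def grid_edge_groups(height, width):
    """Return one group (a pair of pixel indices) per edge of a 4-connected grid.

    Pixels are indexed row by row: pixel (r, c) has index r * width + c.
    """
    groups = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:
                groups.append((i, i + 1))      # horizontal neighbour
            if r + 1 < height:
                groups.append((i, i + width))  # vertical neighbour
    return groups


def is_union_of_groups(support, groups):
    """True if the support is exactly covered by groups lying inside it."""
    support = set(support)
    covered = set()
    for g in groups:
        if set(g) <= support:
            covered.update(g)
    return covered == support


# 2 x 3 grid, pixels numbered 0 1 2 / 3 4 5.
groups = grid_edge_groups(2, 3)
connected = is_union_of_groups({0, 1, 4}, groups)  # pixels linked by edges
scattered = is_union_of_groups({0, 5}, groups)     # two isolated pixels
```

Under these assumptions, a support made of connected pixels passes the check, whereas a support made of isolated pixels does not, which matches the expectation that relevant features are highly connected.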
Global alignment of protein interaction networks by graph matching methods (F. Bach, in collaboration with M. Zaslavskiy and J.-P. Vert, Ecole des Mines de Paris)
Aligning protein-protein interaction (PPI) networks of different species has drawn considerable interest recently. This problem is important for investigating evolutionarily conserved pathways or protein complexes across species, and for helping to identify functional orthologs through the detection of conserved interactions. It is, however, a difficult combinatorial problem, for which only heuristic methods have been proposed so far. We reformulate PPI alignment as a graph matching problem, and investigate how state-of-the-art graph matching algorithms can be used for that purpose. We distinguish between two alignment problems, depending on whether strict constraints on protein matches are given, based on sequence similarity, or whether the goal is instead to find an optimal compromise between sequence similarity and interaction conservation in the alignment. We propose in [20] new methods for both cases, and assess their performance on the alignment of the yeast and fly PPI networks. The new methods consistently outperform state-of-the-art algorithms, retrieving in particular 78% more conserved interactions than IsoRank for a given level of sequence similarity. The source code for all conducted experiments is available at http://cbio.ensmp.fr/proj/graphm_ppi/.
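The compromise between sequence similarity and interaction conservation can be made concrete on a toy example. The sketch below is not one of the methods of [20]: it assumes two networks with the same number of proteins, a one-to-one alignment, and a hypothetical weight `lam` in [0, 1] that trades the number of conserved interactions against the summed sequence similarity. It searches all permutations exhaustively, which is only feasible for a handful of nodes and illustrates why the problem is combinatorially hard.

```python
from itertools import permutations

import numpy as np


def alignment_score(A1, A2, S, perm, lam):
    """Score of the alignment mapping node i of network 1 to node perm[i] of network 2."""
    perm = list(perm)
    # Entry (i, j) of the reordered A2 is the interaction between the matches of i and j.
    # Adjacency matrices are symmetric, so each conserved edge is counted twice.
    conserved = np.sum(A1 * A2[np.ix_(perm, perm)]) / 2
    similarity = sum(S[i, perm[i]] for i in range(len(perm)))
    return lam * conserved + (1 - lam) * similarity


def best_alignment(A1, A2, S, lam):
    """Exhaustive search over all one-to-one alignments (toy sizes only)."""
    n = A1.shape[0]
    return max(permutations(range(n)),
               key=lambda p: alignment_score(A1, A2, S, p, lam))


# Network 1 is the path 0-1-2; network 2 is the path 0-2-1.
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])
S = 10 * np.eye(3)  # illustrative similarity: protein i resembles protein i

topology_only = best_alignment(A1, A2, S, lam=1.0)
balanced = best_alignment(A1, A2, S, lam=0.5)
```

With `lam=1.0` the search ignores sequence similarity and maps the central node 1 to node 2, conserving both interactions. With `lam=0.5` and these illustrative similarity values, the identity alignment wins even though it conserves only one interaction.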