## Section: New Results

### Sparse decompositions: theory and algorithms

#### Learning of deformation-invariant atoms

Keywords : Redundant dictionary learning, atom, sparsity, shift invariance, Principal Component Analysis.

Participants : Sylvain Lesage, Boris Mailhé, Rémi Gribonval, Frédéric Bimbot.

Sparse approximation using redundant dictionaries is an efficient tool for many applications in the field of signal processing. Its performance largely depends on how well the dictionary is adapted to the signal to be decomposed. Since the statistical dependencies in natural high-dimensional data are most of the time not obvious, learning fundamental patterns from the data is an alternative to the analytical design of bases and has become an active field of research. Often, several different observed patterns can be viewed as different deformations of a single generating function. For example, the underlying patterns of a class of signals can occur at any time, so a dictionary designed for such signals should exhibit this shift invariance property. We developed a new algorithm for learning short generating functions, each of which generates a set of atoms corresponding to all its translations. The resulting dictionary is highly redundant and shift invariant.

This algorithm learns the set of generating functions iteratively from a set of training signals. Each iteration alternates between two steps: first, we compute a sparse decomposition of the training signals on the dictionary generated by the current generating functions, using Matching Pursuit, mostly because a fast implementation is available (see 5.3). Then, for each generating function, we collect one signal patch per occurrence of this function found by the decomposition, and we update the function to obtain a least-squares approximation of these patches. Depending on whether the decomposition coefficients are allowed to be updated during this step, the new function is given by the first principal component or by the centroid of the corresponding patches. The first method gives a better approximation of the patches, while the second one has a lower algorithmic complexity. The process is then iterated.
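The two update rules described above can be sketched as follows. This is a minimal numpy illustration of the update step only, not the actual implementation; the function name, the toy data, and the assumption that the patches for one generating function have already been extracted are ours.

```python
import numpy as np

def update_generating_function(patches, method="pca"):
    """Update one generating function from the patches where it occurred.

    patches : (n_occurrences, patch_length) array, one patch per
              occurrence found by the Matching Pursuit decomposition.
    method  : "pca"      -> first principal component (coefficients
                            re-estimated, better patch approximation),
              "centroid" -> mean of the patches (coefficients kept
                            fixed, lower complexity).
    """
    if method == "pca":
        # The first right singular vector gives the best rank-1
        # least-squares approximation of the set of patches.
        u, s, vt = np.linalg.svd(patches, full_matrices=False)
        atom = vt[0]
    else:
        # Centroid update: a simple average of the patches.
        atom = patches.mean(axis=0)
    return atom / np.linalg.norm(atom)  # keep atoms unit-norm

# Toy usage: noisy scaled copies of a Gaussian bump
rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 32)
bump = np.exp(-20 * t**2)
patches = np.array([bump * rng.uniform(0.5, 2.0)
                    + 0.05 * rng.standard_normal(32)
                    for _ in range(50)])
atom = update_generating_function(patches, "pca")
```

On this toy data both update rules recover (up to sign) the underlying bump; in the full algorithm the learnt function is then translated to all positions to regenerate the shift-invariant dictionary for the next iteration.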

On natural images, the learnt atoms are similar to those generally found in the literature. On other data, such as ECG or EEG signals, typical waveforms are retrieved. We also report the results of a test on audio data, where the approximation using the learnt atoms is sparser than with local cosines.

This work, which extends our previous work on the MOTIF algorithm [67] , was presented at a workshop. It was done in collaboration with the group of Pierre Vandergheynst (EPFL, Lausanne). We are currently working on other deformation classes, such as phase shifts for audio signals, and dilation and rotation for images.

#### Learning multimodal dictionaries: applications to audiovisual data

Keywords : Redundant dictionary learning, atom, sparsity, shift invariance, Principal Component Analysis, multimodal data, audiovisual data, speaker localization, speaker tracking, early fusion.

Participants : Sylvain Lesage, Boris Mailhé, Rémi Gribonval.

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans constantly integrate perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can indeed reveal information that remains hidden when the signals are considered independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. Typically, such recurrent patterns are shift invariant, so the learning should seek the best matching filters. We developed an algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal. The learning is formulated in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences, and we show that it is able to discover underlying structures in the data. In particular, it is possible to localize the mouth of a speaker based on the learnt multimodal dictionaries, even in adverse conditions where the audio is corrupted by noise and other speakers are visible (but not audible) uttering the same words as the target speaker. This work, done in collaboration with G. Monaci, P. Jost and P. Vandergheynst (EPFL), was published in [43] and has been submitted for possible journal publication.
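Since the learning reduces to a generalized eigenvector problem, the core numerical step can be sketched with toy matrices. In the numpy sketch below, A stands for an empirical correlation matrix accumulated from patches and B for a constraint matrix; both are stand-ins of our own, not the actual matrices of the algorithm.

```python
import numpy as np

# Toy stand-ins: A accumulates patch correlations, B encodes a
# (hypothetical) constraint on the atom being updated.
rng = np.random.default_rng(1)
X = rng.standard_normal((16, 200))                 # toy patches, one per column
A = X @ X.T / X.shape[1]                           # empirical correlation matrix
B = np.eye(16) + 0.1 * np.diag(np.arange(16.0))    # stand-in constraint matrix

# Reduce the generalized problem A v = lambda B v to a standard
# symmetric one: with B = L L^T, solve (L^-1 A L^-T) u = lambda u,
# then recover v = L^-T u.
L = np.linalg.cholesky(B)
M = np.linalg.solve(L, np.linalg.solve(L, A).T).T  # L^-1 A L^-T (symmetric)
lam, U = np.linalg.eigh(M)                         # eigenvalues in ascending order
v = np.linalg.solve(L.T, U[:, -1])                 # top generalized eigenvector
v /= np.linalg.norm(v)                             # normalized atom candidate
```

The point of the reduction is that only symmetric eigendecompositions are needed, which keeps each update deterministic and free of tuning parameters, consistent with the "fast, flexible and parameter-free" claim above.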

#### Average case analysis of multichannel thresholding

Keywords : sparse decomposition, multichannel signal analysis, sensor networks, matching pursuit, thresholding, recovery analysis, average case, worst case.

Participants : Rémi Gribonval, Boris Mailhé.

Recent developments in sparse signal models mainly focus on analyzing sufficient conditions which guarantee that various algorithms (matching pursuits, basis pursuit, ...) can "recover" a sparse signal representation. Typical conditions involve both basic properties of the representation itself (which should be sufficiently sparse or compressible) and of the dictionary used to represent the signal, which should satisfy some uniform uncertainty principle. Even though random dictionary models can be used to prove that strong uniform uncertainty principles are met by "most" dictionaries, checking them for a specific dictionary seems to remain a combinatorial problem, and estimates based on the coherence provide very pessimistic recovery conditions.

In parallel to these developments in sparse signal models, various application scenarios have motivated renewed interest in processing not just a single signal, but many signals or channels at the same time. A striking example is sensor networks, where signals are monitored by low-complexity devices whose observations are transferred to a central collector [70] . This central node thus faces the task of analyzing many, possibly high-dimensional, signals. Moreover, signals measured in sensor networks are typically not uncorrelated: there are global trends or components that appear in all signals, possibly in slightly altered forms.

We developed an analysis of the theoretical performance of two families of simultaneous sparse representation algorithms. First, we considered p-thresholding, a simple algorithm for recovering simultaneous sparse approximations of multichannel signals. Our analysis studies the average behaviour in addition to the worst-case one, and the spirit of our results is the following: given a not too coherent dictionary and signals with coefficients sufficiently large and balanced over the channels, p-thresholding can recover superpositions of atoms *with overwhelming probability* in dimension d, under conditions on the number of atoms that are much less restrictive than in the worst case, where far fewer atoms can be recovered. Numerical simulations confirm our theoretical findings and show that p-thresholding is an interesting low-complexity alternative to simultaneous greedy or convex relaxation algorithms for processing sparse multichannel signals with balanced coefficients.
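The p-thresholding algorithm itself is simple enough to sketch. The following numpy illustration (the function name and the toy data are our own, not taken from the paper) correlates each atom with every channel, aggregates the correlations across channels with an l_p norm, and keeps the top-scoring atoms in a single pass:

```python
import numpy as np

def p_thresholding(Y, D, k, p=2):
    """One-shot recovery of a common sparse support from multichannel data.

    Y : (d, n_channels) observation matrix, one signal per column.
    D : (d, n_atoms) dictionary with unit-norm columns.
    k : number of atoms to select.
    p : exponent of the norm aggregating correlations across channels.
    """
    C = D.T @ Y                                # (n_atoms, n_channels) correlations
    scores = np.linalg.norm(C, ord=p, axis=1)  # l_p norm over the channels
    return np.sort(np.argsort(scores)[-k:])    # indices of the k best atoms

# Toy usage: 3 atoms active in all channels, with balanced coefficients
rng = np.random.default_rng(2)
d, n_atoms, n_channels = 64, 128, 8
D = rng.standard_normal((d, n_atoms))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
support = np.array([5, 40, 100])
coeffs = rng.choice([-1.0, 1.0], size=(3, n_channels))  # balanced amplitudes
Y = D[:, support] @ coeffs + 0.01 * rng.standard_normal((d, n_channels))
found = p_thresholding(Y, D, k=3, p=2)
```

Note that the algorithm requires no iteration and only one matrix product, which is what makes it a low-complexity alternative to simultaneous greedy or convex relaxation methods; the averaging over channels is what allows it to succeed where single-channel thresholding would fail.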

This work was done in collaboration with Karin Schnass and Pierre Vandergheynst (EPFL) and Holger Rauhut (University of Vienna). A paper is in preparation, and a conference paper has been submitted for publication.