Section: New Results
Source separation using multichannel Matching Pursuit
Keywords : underdetermined blind source separation, multichannel, linear instantaneous, Matching Pursuit, sparse decomposition.
The source separation problem consists in retrieving unknown signals (the sources) from the sole knowledge of mixtures of these signals (the channels coming from each sensor). In the case we study, each channel is a linear combination of the sources, and there are at least two channels and more sources than channels. Because the problem is underdetermined, knowing all the parameters of the mixing process is not sufficient to retrieve the sources. Focusing on the estimation of the sources (assuming the mixing process is known), we have studied methods that perform the separation using a sparse decomposition of the mixture with Matching Pursuit. Methods for estimating the mixing parameters are developed separately by Simon Arberet and Rémi Gribonval.
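The linear instantaneous model described above can be written compactly as follows. The symbols (M channels, N sources, mixing matrix A) are notation introduced here for illustration and do not come from the original text.

```latex
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t),
\qquad \mathbf{A} \in \mathbb{R}^{M \times N},
\qquad N > M \ge 2 .
% Since N > M, the matrix A has a non-trivial null space:
% for any n(t) in ker(A),  A (s(t) + n(t)) = A s(t) = x(t).
```

Because any component of the sources lying in the null space of A leaves the mixture unchanged, knowing A alone does not determine the sources. An additional assumption, here sparsity in a dictionary, is needed to select one solution.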
The principle of the source estimation methods is to use the spatial localization of the sources to discriminate between them. To do this, we assume that the signals can be sparsely decomposed on a dictionary, e.g. local cosines or wavelets. This decomposition is computed with the Matching Pursuit algorithm. Two methods are proposed. The first one consists in decomposing the channels by Matching Pursuit, choosing one atom at each step and subtracting it from all the channels, with the corresponding coefficient on each channel. A direction, given by the vector of these coefficients, is then associated with each atom. In a second step, these directions are clustered so as to assign each atom to the nearest source. Each source is then reconstructed by adding the atoms assigned to it.
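The first method can be illustrated with a minimal sketch. It assumes a dictionary `D` with unit-norm rows and a known mixing matrix `A` with unit-norm columns. The clustering step is simplified to a nearest-direction assignment. The function names and the atom selection rule (largest energy summed over channels) are choices made for this illustration, not necessarily those of the actual implementation.

```python
import numpy as np

def multichannel_mp(X, D, n_iter):
    """Greedy multichannel Matching Pursuit.

    X: (channels, samples) mixture; D: (atoms, samples) with unit-norm rows.
    Returns the list of (atom index, per-channel coefficients) and the residual.
    """
    R = X.astype(float).copy()
    picks = []
    for _ in range(n_iter):
        C = R @ D.T                               # correlations, (channels, atoms)
        k = int(np.argmax(np.sum(C ** 2, axis=0)))  # atom with largest total energy
        c = C[:, k].copy()                        # its coefficient on each channel
        R -= np.outer(c, D[k])                    # subtract the atom from all channels
        picks.append((k, c))
    return picks, R

def assign_and_reconstruct(picks, D, A):
    """Assign each atom to the nearest known direction and rebuild the sources.

    A: (channels, sources) with unit-norm columns (assumed known here).
    """
    S = np.zeros((A.shape[1], D.shape[1]))
    for k, c in picks:
        scores = A.T @ c                  # projection of the atom direction on each source
        j = int(np.argmax(np.abs(scores)))  # nearest direction, up to sign
        S[j] += scores[j] * D[k]
    return S
```

When the sources have disjoint supports in the dictionary, each selected atom carries exactly one source direction and the assignment is unambiguous. When several sources share an atom, the nearest-direction rule gives the whole atom to a single source, which is one origin of separation errors.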
The second approach includes the knowledge of the source directions in the Matching Pursuit process itself. The channels are decomposed on multichannel atoms, each made of a single-channel waveform that is present on all the channels with different strengths, the ratios of these strengths being given by the direction of one source. In this way, the atoms are directly assigned to the sources.
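A minimal sketch of the second approach is given below, under the same assumptions as before: a dictionary `D` with unit-norm rows and known directions stored as the unit-norm columns of `A`. Each multichannel atom is the outer product of one direction and one waveform. The function name `directional_mp` is hypothetical.

```python
import numpy as np

def directional_mp(X, D, A, n_iter):
    """Matching Pursuit on multichannel atoms of the form outer(A[:, j], D[k]).

    X: (channels, samples) mixture.
    D: (atoms, samples) dictionary with unit-norm rows.
    A: (channels, sources) known directions with unit-norm columns.
    Returns the estimated sources (sources, samples) and the residual.
    """
    R = X.astype(float).copy()
    S = np.zeros((A.shape[1], D.shape[1]))
    for _ in range(n_iter):
        G = A.T @ R @ D.T                          # inner products, (sources, atoms)
        j, k = np.unravel_index(np.argmax(np.abs(G)), G.shape)
        g = G[j, k]
        R -= g * np.outer(A[:, j], D[k])           # remove the selected multichannel atom
        S[j] += g * D[k]                           # the atom belongs to source j by construction
    return S, R
```

Compared with the first method, no clustering step is needed, since the index `j` of the selected atom already designates a source.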
For the separation of audio data, these methods perform similarly to DUET and to the Bofill-Zibulevsky algorithm, two reference methods. The second method can be transposed directly to convolutive mixtures if the filters are known; this is work in progress. Using adapted dictionaries (learnt from training data) instead of analytically designed atoms (such as Gabor atoms) is also ongoing work.
This work has been presented in  .
DEMIX: a robust algorithm to estimate the number of sources in a spatial mixture
Keywords : underdetermined source separation, multichannel, linear instantaneous, clustering, source localisation.
One main problem in sound source separation is the estimation of the number of mixed sources. Another issue is the estimation of the source directions in a multisensor mixture.
To complement the separation methods based on Matching Pursuit, which we developed and evaluated assuming that the mixing matrix is known, we have proposed a new robust method to estimate both the number of audio sources and the mixing directions in a linear instantaneous mixture, even with more sources than sensors.
Our method is based on a multiscale Short Time Fourier Transform (STFT), and relies on the assumption that at some (unknown) scales and time-frequency points, only one source contributes to the mixture. Such points provide estimates of the corresponding directions. Our main contribution is a new method to detect points where this assumption is valid, along with a confidence measure. We also propose a new clustering algorithm called DEMIX to estimate the number of sources and their directions.
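To illustrate the idea of detecting single-source time-frequency regions, here is a minimal sketch based on a local principal component analysis. This is an assumption made for illustration: the text above does not specify how the confidence measure is computed. The function name and the eigenvalue-ratio confidence are hypothetical choices.

```python
import numpy as np

def local_direction_and_confidence(Z):
    """Estimate a direction and a confidence from one local time-frequency region.

    Z: (channels, points) array of STFT coefficients taken around one
    time-frequency point. If a single source is active, all columns of Z are
    proportional to the same direction, so the local covariance has rank one.
    """
    C = np.real(Z @ Z.conj().T) / Z.shape[1]   # local covariance (real mixing assumed)
    w, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
    direction = V[:, -1]                       # principal direction
    if direction[0] < 0:                       # fix the sign ambiguity
        direction = -direction
    residual = max(float(np.mean(w[:-1])), 0.0)
    confidence = w[-1] / (residual + 1e-12)    # large when one source dominates
    return direction, confidence
```

With such a measure, regions where several sources overlap receive a low confidence and contribute little to the estimated directions.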
In contrast to DUET or other similar sparsity-based algorithms, which rely on a global scatter plot, our algorithm exploits the new confidence measure to weight the influence of each time-frequency point in the estimated directions. The proposed DEMIX algorithm is inspired by work by Deville, based on a confidence measure using time-frequency local persistence of the activity/inactivity of each source. The performance of DEMIX is assessed for counting the sources and estimating the mixing directions on stereophonic mixtures.
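The effect of weighting each time-frequency point can be sketched in the stereophonic case, where a direction reduces to an angle defined modulo pi. The estimator below is a generic confidence-weighted average of angles. It is shown for illustration only and is not the DEMIX clustering rule itself. The function name is hypothetical.

```python
import numpy as np

def weighted_direction(angles, weights):
    """Confidence-weighted average of stereo directions.

    angles: direction angles in radians, defined modulo pi.
    weights: non-negative confidence attached to each angle.
    Angles are doubled before averaging so that theta and theta + pi,
    which describe the same direction, reinforce each other.
    """
    angles = np.asarray(angles, dtype=float)
    weights = np.asarray(weights, dtype=float)
    s = np.sum(weights * np.sin(2.0 * angles))
    c = np.sum(weights * np.cos(2.0 * angles))
    return (0.5 * np.arctan2(s, c)) % np.pi
```

In an unweighted scatter plot every point counts equally, so points where several sources overlap bias the estimate. With weights, such points can be given a negligible influence.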
In our experiments, DEMIX yields better results than the K-means and ELBG clustering algorithms for estimating the source directions. Moreover, DEMIX is, to our knowledge, the only algorithm that counts the number of sources. This work has been submitted for publication.
Single channel source separation
Keywords : Single channel source separation, Gaussian mixture model, Wiener filter, model adaptation.
We consider the problem of single-microphone source separation applied to singing voice extraction. The approach relies on a priori Gaussian Mixture Models (GMM) of the two sources. Instead of using general source models (i.e., models learned on sources taken from recordings different from those to be separated), we propose to use adapted models (i.e., models whose characteristics are mapped to those of the mixed sources). Assuming that the processed recording is segmented into vocal and non-vocal parts, the music model is learned on the non-vocal parts and the general voice model is adapted on the vocal parts. For voice model adaptation, we introduce two constrained adaptation techniques: filter adaptation and Power Spectral Density (PSD) gains adaptation. Joint filter and PSD gains adaptation is also possible and gives the best performance. Finally, we show that our singing voice extraction system can also be used for singing voice pitch estimation in polyphonic music.
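The way GMM source models lead to a Wiener filter can be sketched for a single STFT frame. The sketch assumes that each GMM state is a zero-mean complex Gaussian described by a PSD template, and that voice and music are independent. The adaptation steps are not shown. Function and parameter names are hypothetical.

```python
import numpy as np

def gmm_wiener_frame(X, psd_v, psd_m, w_v, w_m):
    """Estimate the voice in one STFT frame with GMM-based Wiener filtering.

    X: (F,) complex mixture frame.
    psd_v: (Kv, F) voice PSD templates; psd_m: (Km, F) music PSD templates.
    w_v: (Kv,) and w_m: (Km,) state prior weights.
    """
    power = np.abs(X) ** 2
    var = psd_v[:, None, :] + psd_m[None, :, :]            # mixture PSD per state pair
    loglik = -np.sum(np.log(var) + power / var, axis=-1)   # Gaussian log-likelihood
    logpost = loglik + np.log(w_v)[:, None] + np.log(w_m)[None, :]
    logpost -= logpost.max()                               # numerical stability
    gamma = np.exp(logpost)
    gamma /= gamma.sum()                                   # posterior of each state pair
    wiener = psd_v[:, None, :] / var                       # Wiener gain per state pair
    gain = np.sum(gamma[:, :, None] * wiener, axis=(0, 1))
    return gain * X
```

With a single state per source, the gain reduces to the classical Wiener filter, i.e. the voice PSD divided by the sum of the voice and music PSDs.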
Evaluation of source separation algorithms
Keywords : blind source separation, evaluation, performance measure, benchmark.
Participant : Rémi Gribonval.
Source separation of under-determined and/or convolutive mixtures is a difficult problem that has been addressed by many algorithms, which may include parametric source models, mixing models, linear or nonlinear separation systems, etc. Their separation performance is usually limited by several factors, including badly designed source models or local maxima of the function to be optimized. Performance may also be limited by constraints on the estimate, such as the length of the demixing filters or the number of frequency bins of the time-frequency masks. The best possible source estimate that can be obtained under these constraints (in the ideal case where source models and optimization algorithms are perfect) is called an oracle estimator of the source. In order to study the performance of some families of source separation algorithms in an evaluation framework where the reference sources are available, we expressed and implemented oracle estimators for two classes (stationary filtering separation algorithms and time-frequency masking separation algorithms) and studied their performance on a few audio mixture examples.
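As an illustration of an oracle estimator for the time-frequency masking class, the sketch below computes the binary masks that minimise the squared error in the time-frequency domain when the reference sources are known. It assumes a single-channel mixture and masks chosen independently for each source, which is a simplification of the general setting. The function name is hypothetical.

```python
import numpy as np

def oracle_binary_masks(S, X):
    """Best binary masks given the reference sources.

    S: (sources, F, T) STFT coefficients of the reference sources.
    X: (F, T) STFT coefficients of the mixture.
    At each point, keeping the mixture coefficient costs |S - X|^2 and
    discarding it costs |S|^2; the oracle picks the cheaper option.
    """
    keep = np.abs(S - X[None]) ** 2 < np.abs(S) ** 2
    return keep.astype(float)

def apply_masks(M, X):
    """Masked source estimates in the time-frequency domain."""
    return M * X[None]
```

No blind algorithm restricted to binary masks of this kind can achieve a lower time-frequency squared error than this estimator, which makes it a useful upper bound on performance for the class.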
This work has been published in  . It was done in collaboration with Emmanuel Vincent (Queen Mary University).