Section: New Results
Source separation via sparse and adaptive representations
Main collaboration: Andrew Nesbit (Queen Mary, University of London), Matthieu Puigt (Laboratoire d'Astrophysique de Toulouse-Tarbes)
Source separation is the task of retrieving the source signals underlying a multichannel mixture signal, where each channel is the sum of filtered versions of the sources. The state-of-the-art approach represents the signals in a given time-frequency basis and estimates the source coefficients by sparse decomposition in that basis. This decomposition is performed under an exact mixture reconstruction constraint that relies on a frequency-wise approximation of the mixing process. Performance is often limited for two reasons: the mixing process is poorly approximated in reverberant environments, and the sources overlap in the chosen time-frequency basis. Our previous work on adaptive stereo bases showed promising results. However, it also suggested that modeling the mixing process and choosing an adapted basis should be addressed separately to avoid over-fitting. We investigated replacing the mixture reconstruction constraint by a quadratic penalty term computed from the true mixing process, which improved separation performance in reverberant environments with large microphone spacing. We also studied a range of adaptive lapped orthogonal time-frequency bases originally designed for audio coding and explained how to estimate the best basis in a source separation context. Finally, we provided an experimental validation of the implicit source independence assumption underlying the above approaches.
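To illustrate how sparse decomposition trades mixture fidelity against sparsity when a quadratic penalty replaces the exact reconstruction constraint, the sketch below solves the generic ℓ1-penalized least-squares problem min_S 0.5 ||X − A S||² + λ ||S||₁ by iterative soft thresholding (ISTA) in a single frequency band. It assumes real-valued coefficients (as produced, for instance, by a real lapped orthogonal transform) and a known real mixing matrix A shared by all coefficients in the band. It is a standard textbook technique, not the penalty computed from the true mixing filters described above. The function names and the toy three-source, two-channel setup are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_separate(X, A, lam, n_iter=2000):
    """Hypothetical helper: sparse source coefficients S (N x K) from mixture X (M x K).

    Minimizes 0.5 * ||X - A S||_F^2 + lam * ||S||_1 with ISTA, i.e. a quadratic
    mixture-fidelity penalty instead of the exact constraint X = A S.
    """
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    S = np.zeros((A.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = A.T @ (A @ S - X)            # gradient of the quadratic term
        S = soft_threshold(S - grad / L, lam / L)
    return S

# Toy example (assumption): 2 channels, 3 sources with unit-norm mixing vectors,
# and a single active source per time-frequency coefficient.
rng = np.random.default_rng(0)
angles = np.array([0.0, np.pi / 3, 2 * np.pi / 3])
A = np.vstack([np.cos(angles), np.sin(angles)])
K = 8
active = rng.integers(0, 3, size=K)
S_true = np.zeros((3, K))
S_true[active, np.arange(K)] = rng.uniform(1.0, 2.0, size=K) * rng.choice([-1.0, 1.0], size=K)
X = A @ S_true
lam = 0.01
S_hat = ista_separate(X, A, lam)
```

In this toy case, the estimate recovers the correct active source in every coefficient, with its magnitude shrunk by λ. As λ tends to zero, the solution approaches the minimum-ℓ1 solution under the exact constraint X = A S. A larger λ tolerates more reconstruction error, which is the behavior one would want when A only approximates the true mixing process.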
A new probabilistic framework for source separation
Main collaboration: Alexey Ozerov (Telecom ParisTech)
In parallel with our work on sparse representations, we proposed a new framework for audio source separation in which each source is modeled as a zero-mean Gaussian variable in the neighborhood of each time-frequency bin. This framework was first applied to source counting and localization, where it increased robustness by selecting the time-frequency bins in which a single source is active. We subsequently investigated its use for source separation by defining two distinct models for the source variances. The first is a mild sparsity prior in each time-frequency bin. The second is a Gaussian mixture model (GMM) prior that introduces some dependencies between the variances in different frequency bins. Both approaches were tested on instantaneous mixtures. The sparsity prior significantly improved separation performance over all mixtures, and the GMM prior gave an even larger improvement over music mixtures.
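Once the source variances have been estimated, the Gaussian model yields the source estimates in closed form. The sketch below assumes an instantaneous mixture with known mixing matrix A and treats each time-frequency bin independently. Under these assumptions the mixture covariance in bin k is Σ_x = Σ_j v_j a_j a_jᴴ. The posterior mean of source j is then v_j a_jᴴ Σ_x⁻¹ x, and its spatial image is a_j times that value. The variances V are taken as given here (their estimation under the sparsity or GMM prior is not shown). The function name, the small regularization eps and the toy data are assumptions.

```python
import numpy as np

def wiener_images(X, A, V, eps=1e-12):
    """Hypothetical helper: posterior-mean spatial images under a Gaussian model.

    X : (M, K) complex mixture coefficients, one column per time-frequency bin.
    A : (M, N) instantaneous mixing matrix; column j is the mixing vector a_j.
    V : (N, K) nonnegative source variances v_j in each bin (assumed known).
    Returns C : (N, M, K) with C[j, :, k] = v_j a_j a_j^H Sigma_x^{-1} x_k.
    """
    M, K = X.shape
    N = A.shape[1]
    # Rank-1 spatial covariances R_j = a_j a_j^H of an instantaneous mixture.
    R = np.einsum('mj,nj->jmn', A, A.conj())
    C = np.zeros((N, M, K), dtype=complex)
    for k in range(K):
        # Mixture covariance in bin k, slightly regularized for invertibility.
        Sigma_x = np.einsum('j,jmn->mn', V[:, k], R) + eps * np.eye(M)
        y = np.linalg.solve(Sigma_x, X[:, k])
        for j in range(N):
            C[j, :, k] = V[j, k] * (R[j] @ y)   # Wiener estimate of image j
    return C

# Toy example (assumption): 2 channels, 3 sources, random mixture coefficients.
rng = np.random.default_rng(1)
angles = np.array([0.0, np.pi / 3, 2 * np.pi / 3])
A = np.vstack([np.cos(angles), np.sin(angles)])
K = 5
V = rng.uniform(0.1, 1.0, size=(3, K))
X = rng.standard_normal((2, K)) + 1j * rng.standard_normal((2, K))
C = wiener_images(X, A, V)
```

Because the source covariances sum to the mixture covariance, the estimated images add up to the observed mixture in every bin, up to the tiny regularization term. The sparsity or GMM prior therefore only affects how the mixture is distributed among the sources through V.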