Section: New Results
Source separation via sparse and adaptive representations
Main collaboration: Andrew Nesbit (Queen Mary, University of London), Matthieu Puigt (Laboratoire d'Astrophysique de Toulouse-Tarbes), Matthieu Kowalski (Laboratoire des Signaux et Systèmes, Supelec)
Source separation is the task of retrieving the source signals underlying a multichannel mixture signal, where each channel is the sum of filtered versions of the sources. The state-of-the-art approach, which we presented in a survey chapter, consists of representing the signals in a given time-frequency basis and estimating the source coefficients by sparse decomposition in that basis, based on a narrowband approximation of the mixing process. This approach often provides limited performance, due both to the poor approximation of the mixing process in reverberant environments and to the use of a time-frequency basis in which the sources overlap. We proposed a family of wideband source separation methods that circumvent the narrowband assumption and result in large performance improvements in reverberant environments. In parallel, we studied a range of adaptive lapped orthogonal time-frequency bases originally designed for audio coding and explained how to estimate the best basis in a source separation context. Finally, we provided an experimental validation of the implicit source independence assumption underlying the above approaches.
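The classical narrowband/sparsity approach can be illustrated on a toy instantaneous stereo mixture: represent both channels in an STFT basis, then assign each time-frequency bin to the source whose mixing direction best explains the inter-channel magnitude ratio. This is a minimal binary-masking sketch with synthetic signals and mixing gains of our own choosing, not the wideband method proposed above:

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Hanning-windowed short-time Fourier transform, shape (freq, time)."""
    w = np.hanning(win)
    frames = [np.fft.rfft(w * x[i:i + win])
              for i in range(0, len(x) - win + 1, hop)]
    return np.array(frames).T

def istft(X, win=256, hop=128):
    """Weighted overlap-add inverse of stft()."""
    w = np.hanning(win)
    n = hop * (X.shape[1] - 1) + win
    x, norm = np.zeros(n), np.zeros(n)
    for t in range(X.shape[1]):
        seg = np.fft.irfft(X[:, t], win)
        x[t * hop:t * hop + win] += w * seg
        norm[t * hop:t * hop + win] += w ** 2
    norm[norm == 0] = 1.0
    return x / norm

# Two synthetic sources with disjoint dominant frequencies
# (i.e. the sparsity / non-overlap assumption holds by construction).
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)
s2 = np.sin(2 * np.pi * 1760 * t)

# Instantaneous stereo mixing matrix: columns are source directions.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
x1 = A[0, 0] * s1 + A[0, 1] * s2
x2 = A[1, 0] * s1 + A[1, 1] * s2

X1, X2 = stft(x1), stft(x2)

# Assign each bin to the source whose inter-channel ratio is closest.
ratio = np.abs(X2) / (np.abs(X1) + 1e-12)
mask1 = np.abs(ratio - A[1, 0] / A[0, 0]) < np.abs(ratio - A[1, 1] / A[0, 1])
s1_hat = istft(np.where(mask1, X1, 0))
s2_hat = istft(np.where(~mask1, X1, 0))
```

Binary masking of this kind recovers the sources well only when at most one source is active per bin, which is exactly the overlap limitation that motivates searching for an adaptive basis.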
New paradigms and new evaluation metrics for source separation
Main collaboration: Volker Hohmann (University of Oldenburg, DE), Nobutaka Ono (University of Tokyo, JP), Jonathan Le Roux (NTT Communication Science Laboratories, JP)
In parallel with our work on sparse representations, we proposed a new generic probabilistic framework for audio source separation where each source is modeled as a zero-mean random variable whose parameters vary over the time-frequency plane. This framework makes it possible to combine a range of existing spectral and spatial source models as well as to design novel advanced models, such as models of reverberated or spatially diffuse sources. The benefits of this framework were demonstrated both for the separation of instantaneous and reverberant mixtures.
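Under this probabilistic framework, modeling each source coefficient as zero-mean Gaussian with a time-frequency-varying variance makes the minimum mean-square error estimate a Wiener filter: each mixture bin is split among the sources in proportion to their variances. A minimal single-channel sketch with synthetic, known variance fields (in practice the variances are estimated, e.g. by EM):

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 64, 100  # illustrative time-frequency grid size

# Time-frequency-varying variance fields (synthetic for this sketch).
v1 = rng.normal(size=(F, T)) ** 2 + 0.1
v2 = rng.normal(size=(F, T)) ** 2 + 0.1

# Zero-mean complex Gaussian sources with those variances.
S1 = np.sqrt(v1 / 2) * (rng.normal(size=(F, T)) + 1j * rng.normal(size=(F, T)))
S2 = np.sqrt(v2 / 2) * (rng.normal(size=(F, T)) + 1j * rng.normal(size=(F, T)))

# Single-channel mixture in the time-frequency domain.
X = S1 + S2

# MMSE (Wiener) estimates given the variances: E[S1 | X] = v1 / (v1 + v2) * X.
S1_hat = v1 / (v1 + v2) * X
S2_hat = v2 / (v1 + v2) * X

# The Wiener gains beat a variance-blind 50/50 split of the mixture.
err_wiener = np.mean(np.abs(S1 - S1_hat) ** 2)
err_naive = np.mean(np.abs(S1 - X / 2) ** 2)
```

By construction the two estimates sum back to the mixture, and the per-bin gains adapt wherever one source dominates, which is what lets a single generic framework host many different spectral variance models.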
In addition, we investigated ideas to model additional phase dependencies between neighboring time-frequency bins and to replace the usual maximum likelihood (ML) learning framework by discriminative learning. Finally, the state-of-the-art audio source separation evaluation metrics previously developed by METISS were further improved, using a computational auditory processing front-end and a neural network to fit subjective performance measurements. Theoretical performance bounds were also proposed for source separation methods based on Gaussian Mixture Models (GMM) of the source spectra.
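The evaluation metrics mentioned above build on energy-ratio criteria such as the signal-to-distortion ratio (SDR). A simplified, gain-invariant SDR can be sketched as follows: project the estimate onto the true source (allowing an arbitrary gain) and compare the energy of that target component to the energy of the residual. This is only a toy version of the idea, not the auditory-model-based metrics described in the text:

```python
import numpy as np

def sdr(s_hat, s):
    """Simplified signal-to-distortion ratio in dB, invariant to gain."""
    s = s - s.mean()
    s_hat = s_hat - s_hat.mean()
    alpha = (s @ s_hat) / (s @ s)   # least-squares gain of s in s_hat
    target = alpha * s              # part of the estimate explained by s
    distortion = s_hat - target     # everything else
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))

rng = np.random.default_rng(0)
s = rng.normal(size=1000)
noisy = s + 0.1 * rng.normal(size=1000)  # distortion at -20 dB per sample
```

With distortion of amplitude 0.1 relative to the source, this SDR lands around 20 dB; scaling the distortion up lowers the score monotonically, which is the basic behavior any energy-ratio metric must have before fitting it to subjective scores.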