Project : metiss
Section: Application Domains
Advanced audio signal processing
In many application areas, speech signals occur surrounded by, or superimposed on, other types of audio signals: they are often mixed with music or background noise. Moreover, audio signals frequently have a composite nature, in the sense that they were originally obtained by combining several audio tracks with a mixing device. Audio signals are also prone to all kinds of degradations (ranging from non-ideal recording conditions to transmission errors) after travelling through a complete signal processing chain.
Recent breakthroughs in the field of voice technology (speech and speaker recognition) are a strong motivation for studying how to adapt and apply this technology to a broader class of signals, such as musical signals.
The main themes discussed here are therefore those of source separation and audio signal representation.
Audio source separation
The general problem of ``source separation'' consists in recovering a set of unknown sources from the observation of one or several of their mixtures, which may correspond to as many microphones. In the special case of speaker separation, the goal is to recover two speech signals contributed by two different speakers recorded on the same medium. This extends to channel separation, which deals with isolating the various simultaneous components of an audio recording (speech, music, singing voice, individual instruments, etc.). In the case of noise removal, one tries to isolate the ``meaningful'' signal, which holds the relevant information, from interfering noise. It can even be appropriate to view audio compression as a special case of source separation, one source being the compressed signal and the other the residue of the compression process. These examples illustrate how the general source separation problem spans many different problems and many foreseeable applications.
While in some cases (such as multichannel audio recording and processing) the source separation problem arises with a number of mixtures at least equal to the number of unknown sources, the research on audio source separation within the METISS project-team focuses on the so-called under-determined case. More precisely, we consider the cases of one sensor (mono recording) for two or more sources, or two sensors (stereo recording) for three or more sources.
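To give a concrete (and deliberately simplistic) picture of single-channel separation, the sketch below separates a one-sensor mixture of two tones that happen to occupy disjoint frequency bands, using a binary mask in the Fourier domain. The signals, sampling rate, and 1 kHz cut-off are invented for the example; this is not the project-team's actual method, which must handle sources that overlap in time and frequency.

```python
import numpy as np

# one "microphone", two sources (toy example: disjoint frequency bands)
fs = 8000
t = np.arange(fs) / fs
source1 = np.sin(2 * np.pi * 440 * t)    # low-frequency source
source2 = np.sin(2 * np.pi * 2000 * t)   # high-frequency source
mixture = source1 + source2              # single-sensor observation

spectrum = np.fft.rfft(mixture)
freqs = np.fft.rfftfreq(len(mixture), 1 / fs)

# binary mask: bins below 1 kHz are assigned to source 1, the rest to source 2
mask = freqs < 1000
est1 = np.fft.irfft(spectrum * mask, n=len(mixture))
est2 = np.fft.irfft(spectrum * ~mask, n=len(mixture))
```

Because the two toy sources do not share any frequency bins, the masked estimates recover them almost exactly; real speech and music overlap in time-frequency, which is what makes the under-determined problem hard.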
Audio signal analysis and decomposition
The standards of the MPEG family, notably MPEG-4, introduce several sound description and transmission formats built on the notions of a ``score'', i.e. a high-level MIDI-like description, and an ``orchestra'', i.e. a set of ``instruments'' describing sonic textures. These formats promise very low bitrate coding, together with indexing and navigation facilities. However, it remains a challenge to design methods for transforming an arbitrary existing audio recording into such a representation.
Atomic decomposition methods are attracting rising interest in the fields of sound synthesis and sound compression. They attempt to provide such decompositions by representing audio signals as linear combinations of elementary signals (or ``atoms'') taken from a ``dictionary'', which can be seen as the set of instruments. In the classical model, ``sonic grains'' are deterministic functions (modulated sinusoids, chirps, harmonic molecules, or even arbitrary waveforms stored in a wavetable). The reconstructed signal is then the M-term adaptive approximation of the original signal over the dictionary D. Non-linear approximation theory and decomposition methods such as Matching Pursuit and its derivatives respectively provide a mathematical framework and powerful tools to tackle this kind of problem.
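The M-term approximation idea can be sketched with plain Matching Pursuit: at each step, pick the dictionary atom most correlated with the residual, subtract its contribution, and repeat M times. The toy dictionary of Hann-windowed cosines and the test signal below are invented for the illustration; they stand in for the richer Gabor-type dictionaries mentioned above.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_terms):
    """Greedy M-term approximation of `signal` over a `dictionary`
    of unit-norm atoms (one atom per row). Returns the atom
    coefficients and the final residual."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_terms):
        corr = dictionary @ residual          # correlate residual with every atom
        k = np.argmax(np.abs(corr))           # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[k]   # subtract its contribution
    return coeffs, residual

# toy dictionary: unit-norm Hann-windowed cosines at 8 frequencies
n = 64
t = np.arange(n)
atoms = []
for f in range(1, 9):
    a = np.hanning(n) * np.cos(2 * np.pi * f * t / n)
    atoms.append(a / np.linalg.norm(a))
dictionary = np.array(atoms)

# signal built from two atoms plus a little noise
rng = np.random.default_rng(0)
signal = 3.0 * dictionary[2] + 1.5 * dictionary[5] + 0.01 * rng.standard_normal(n)

coeffs, residual = matching_pursuit(signal, dictionary, n_terms=4)
```

After a few greedy iterations, the residual energy drops to roughly the noise floor, and the two dominant coefficients identify the atoms that generated the signal.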
Granular techniques work by decomposing an audio signal into a great many ``elementary'' signals of short duration. Analysis methods drawing upon the concept of Gabor atoms rely on local-cosine type signals, possibly with frequency modulation. Granular synthesis techniques make it possible to compute highly complex sonic textures, one notable drawback being the lack of user control over the final result. We are working on an adaptive analysis method based on non-deterministic signals, called prototypes or models in this context; these signals are stochastic equivalents of the basis vectors used in functional analysis. The prototypes are computed from the original signal and can subsequently be used to partially reconstruct (compress) it, to detect the boundaries between the notes of a melody, or to re-synthesize the sound with control parameters easily tuned by the user.
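The basic granular synthesis mechanism can be sketched as overlap-adding many short windowed grains, each slightly varied. The grain shape, hop size, and random playback-rate range below are hypothetical control parameters chosen for the example, not those of the prototype-based method described above.

```python
import numpy as np

def granular_texture(grain, n_grains, hop, rng):
    """Build a texture by overlap-adding copies of a short `grain`,
    each resampled at a random playback rate."""
    out = np.zeros(hop * n_grains + len(grain))
    for i in range(n_grains):
        # random playback rate between 0.8x and 1.25x (hypothetical range)
        rate = rng.uniform(0.8, 1.25)
        idx = np.arange(0, len(grain) - 1, rate)
        resampled = np.interp(idx, np.arange(len(grain)), grain)
        out[i * hop : i * hop + len(resampled)] += resampled
    return out

# a 20 ms Hann-windowed sine grain at 8 kHz
fs = 8000
n = int(0.02 * fs)
grain = np.hanning(n) * np.sin(2 * np.pi * 440 * np.arange(n) / fs)

rng = np.random.default_rng(1)
texture = granular_texture(grain, n_grains=50, hop=n // 2, rng=rng)
```

Even this minimal version shows the drawback noted above: the result is rich but hard to steer, since the user controls only statistical parameters of the grain stream rather than the final waveform.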