Section: New Results

Keywords : speaker characterisation, speaker adaptation, speaker selection, Gaussian Mixture Models (GMM), affective computing.

Speaker characterisation

Rapid Speaker Adaptation by Reference Model Interpolation

Participants : Wen Xuan Teng, Guillaume Gravier, Frédéric Bimbot.

Acoustic model adaptation techniques have become, in recent years, an important element of speech recognition systems, where they are used to tune the system to the user's voice. Moreover, in some application contexts, speaker adaptation must take place on-line and rapidly.

We have designed a novel algorithm for fast speaker adaptation using small amounts of adaptation data. The approach is based on a set of representative speakers which provide a priori knowledge to guide the estimation of a new speaker's model in the speaker space.

The proposed approach is based on an a posteriori selection of reference models, as opposed to conventional techniques (such as eigenvoices), which use a fixed set of reference speakers. It calls for a user-dependent linear interpolation of the parameters of the reference speaker models.
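The two steps named above (a posteriori selection of reference models, then user-dependent linear interpolation of their parameters) can be illustrated with a minimal sketch. It is not the algorithm evaluated in this work. It rests on several simplifying assumptions: each reference speaker is reduced to a single mean vector with unit variance, selection keeps the k models with the highest average log-likelihood on the adaptation data, and the interpolation weights are a softmax of those scores. All function names and the parameter k are hypothetical.

```python
import math


def avg_log_likelihood(frames, mean):
    """Average log-likelihood of the frames under a unit-variance Gaussian
    centred on `mean` (simplifying assumption: one Gaussian per speaker)."""
    const = -0.5 * len(mean) * math.log(2.0 * math.pi)
    total = 0.0
    for frame in frames:
        sq_dist = sum((x - m) ** 2 for x, m in zip(frame, mean))
        total += const - 0.5 * sq_dist
    return total / len(frames)


def select_references(frames, ref_means, k):
    """A posteriori selection: keep the k reference models that best fit
    the adaptation data (assumed criterion: average log-likelihood)."""
    scores = [avg_log_likelihood(frames, mean) for mean in ref_means]
    order = sorted(range(len(ref_means)), key=lambda i: scores[i], reverse=True)
    chosen = order[:k]
    return chosen, [scores[i] for i in chosen]


def interpolation_weights(scores):
    """User-dependent weights as a softmax of the selection scores.
    This is an illustrative choice, not the estimator used in the source."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def adapt(frames, ref_means, k=2):
    """Return selected indices, weights and the interpolated mean vector."""
    chosen, scores = select_references(frames, ref_means, k)
    weights = interpolation_weights(scores)
    dim = len(ref_means[0])
    adapted = [
        sum(w * ref_means[i][d] for i, w in zip(chosen, weights))
        for d in range(dim)
    ]
    return chosen, weights, adapted


# Toy data: three reference speakers and two adaptation frames.
ref_means = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
frames = [[0.2, 0.2], [0.4, 0.4]]
chosen, weights, adapted = adapt(frames, ref_means, k=2)
```

In this toy run, the distant third reference speaker is discarded by the selection step, and the adapted mean is a convex combination of the two remaining reference means. With full GMMs, the same interpolation would be applied to the model parameters, and the weights would typically be estimated from the adaptation data rather than set by a softmax.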

Comparisons on the IDIOLOGOS and PAIDIALOGOS corpora show that the proposed approach yields slightly better performance than eigenvoices on a phoneme recognition task, especially for atypical speakers such as children.

Voice characteristics modelling for emotion and cognitive state classification

Keywords : voice interaction, emotion, psychoacoustic, cognitive state.

Participants : Klara Trakas, Frédéric Bimbot.

This work is taking place in the context of an industrial PhD just starting with Orange FTR&D Labs.

In the field of speaker characterisation, there is growing interest in approaches able to describe and classify voice expressions such as emotion, cognitive state and, more generally, any type of information conveyed by a speaker's voice and indicative of his/her state of mind.

Joint work involving the Metiss Group is just starting to investigate descriptors and models for representing this type of speaker characteristics at several linguistic and para-linguistic levels, together with training algorithms and decision strategies which enable the fusion of multiple sources of information.


