Section: New Results
Keywords: speaker characterisation, speaker adaptation, speaker selection, Gaussian Mixture Models (GMM), model interpolation, affective computing, voice interaction, emotion, psychoacoustic, cognitive state.
Speaker modeling and characterisation
Rapid Speaker Adaptation by variable Reference Model Interpolation
This work has taken place in the context of an industrial PhD with TELISMA.
Rapid adaptation of acoustic models for automatic speech recognition requires some form of a priori knowledge to guide the estimation of a new speaker model. Most techniques are based on linear combinations in a speaker subspace derived from fixed, a priori speaker models (cf. the eigenvoice approach). In this context, the adaptation process may not provide robust solutions for a particular adaptation target, especially when the number of reference models is small.
The approach investigated in  uses variable subspaces at runtime for different adaptation targets. This yields a novel approach called variable RMI (Reference Model Interpolation), based on an a posteriori selection of reference models with various possible selection criteria.
The proposed technique has been applied and tested on phoneme decoding and LVCSR (Large Vocabulary Continuous Speech Recognition) tasks, and evaluated in both supervised and unsupervised adaptation modes. Experiments on three distinct databases (IDIOLOGOS, PAIDIALOGOS and ESTER) have shown the effectiveness of the variable RMI approach with utterance-by-utterance on-line adaptation.
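The selection-then-interpolation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual TELISMA implementation: reference models are reduced to mean supervectors, the a posteriori selection criterion is a plain Euclidean distance (a likelihood-based criterion would be used in practice), and the interpolation weights are fitted by unconstrained least squares followed by normalisation. All names (`variable_rmi_adapt`, `reference_means`, `target_stats`) are hypothetical.

```python
import numpy as np

def variable_rmi_adapt(reference_means, target_stats, k=3):
    """Variable RMI sketch: a posteriori selection of k reference models,
    then interpolation weights estimated against the target statistics.

    reference_means : (N, D) array, one mean supervector per reference speaker
    target_stats    : (D,) array, statistics estimated from the adaptation data
    """
    # a posteriori selection: keep the k references closest to the target
    # (Euclidean distance as an illustrative stand-in for a likelihood criterion)
    d = np.linalg.norm(reference_means - target_stats, axis=1)
    idx = np.argsort(d)[:k]
    R = reference_means[idx]                      # (k, D) selected references

    # least-squares fit of interpolation weights, then clip/normalise so the
    # adapted model stays a convex-like combination of the selected references
    w, *_ = np.linalg.lstsq(R.T, target_stats, rcond=None)
    w = np.clip(w, 0.0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(k, 1.0 / k)

    adapted_mean = w @ R                          # interpolated speaker model
    return adapted_mean, idx, w
```

Because the subspace is rebuilt from a different subset of references for each adaptation target, the interpolation basis varies at runtime, which is the point of the "variable" RMI scheme.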
Voice modelling for emotion and cognitive state classification
This work has taken place in the context of an industrial PhD with Orange FTR&D Labs.
A growing interest has emerged in the field of speaker characterisation for approaches able to describe and classify voice expressions such as emotion, cognitive state and, more generally, any type of information conveyed by a speaker's voice and indicative of his/her state of mind.
The first year of Klara Trakas's PhD focused on the analysis of a speech corpus composed of clients' calls expressing their opinion on a hotline service. Human auditors were asked to report their perception of the emotional state of each speaker, together with other impressions not related to emotions, so as to examine correlations between different classes of voice and speaker features.
This preliminary work was intended to lead to a robust system for emotion detection, investigating descriptors and models for representing speaker characteristics at several linguistic and para-linguistic levels, together with training algorithms and decision strategies enabling the fusion of multiple sources of information. However, the PhD was interrupted after one year (September 2008).
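The fusion of multiple information sources mentioned above can be sketched as a simple weighted late fusion of per-class scores. This is an illustrative assumption only: the actual descriptors, classifiers and decision strategies were still under investigation when the work stopped, and the source names and weights below are hypothetical.

```python
import numpy as np

def late_fusion(scores_per_source, weights=None):
    """Late fusion sketch: combine per-class scores from several sources
    (e.g. acoustic, prosodic, lexical levels) by a weighted sum.

    scores_per_source : (n_sources, n_classes) array of normalised scores
    weights           : optional (n_sources,) source reliabilities; uniform if None
    """
    S = np.asarray(scores_per_source, dtype=float)
    if weights is None:
        w = np.full(S.shape[0], 1.0 / S.shape[0])   # uniform fusion
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                             # normalise reliabilities
    fused = w @ S                                   # (n_classes,) fused scores
    return fused, int(np.argmax(fused))             # fused scores + decision
```

Weighting the sources differently lets a more reliable level (say, prosody for emotion) dominate the final decision without discarding the others.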