Section: New Results
Speech recognition for multimedia structuring and indexing
Speech based structuring and indexing of audio-visual documents
Work done in close collaboration with the Texmex project-team of IRISA, in particular with Pascale Sébillot and Fabienne Moreau.
Speech can be used to structure and index large collections of spoken documents (videos, audio streams, etc.) based on semantics. This is typically achieved by first transforming speech into text using automatic speech recognition (ASR), before applying natural language processing (NLP) techniques to the transcriptions. Our research focuses on the integration of ASR and NLP techniques in the framework of large-scale analysis of multimedia document collections.
Topic segmentation and adaptation
We improved our former extension of the text-based topic segmentation method of Utiyama and Isahara to take into account additional knowledge such as semantic relations between words, discourse markers (such as "and now" or "thank you"), and acoustic cues. Results obtained on radio broadcast news make it possible to apply the method to large-scale TV streams, possibly in conjunction with image-based features, as considered in C. Guinaudeau's Ph.D. thesis.
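The Utiyama and Isahara method casts topic segmentation as a search for the boundary sequence minimising a probabilistic cost, solved by dynamic programming over sentence boundaries. The following is a minimal sketch of that text-based baseline only (without the semantic relations, discourse markers, or acoustic cues of our extension); the smoothing constant `alpha` and per-boundary `penalty` are illustrative values, not the published ones:

```python
import math
from collections import Counter

def segment_cost(words, alpha=0.5):
    """Negative log-likelihood of a candidate segment under its own
    Laplace-smoothed unigram distribution: homogeneous segments are cheap."""
    counts = Counter(words)
    n, v = len(words), len(counts)
    return -sum(math.log((counts[w] + alpha) / (n + alpha * v)) for w in words)

def segment(sentences, penalty=5.0):
    """Dynamic programming over sentence boundaries: choose the segmentation
    minimising total segment cost plus a per-segment penalty.
    `sentences` is a list of word lists; returns sorted boundary indices."""
    n = len(sentences)
    best = [0.0] + [float("inf")] * n   # best[i]: cost of segmenting sentences[:i]
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            words = [w for s in sentences[j:i] for w in s]
            c = best[j] + segment_cost(words) + penalty
            if c < best[i]:
                best[i], back[i] = c, j
    bounds, i = [], n                   # trace back the optimal boundaries
    while i > 0:
        bounds.append(i)
        i = back[i]
    return sorted(bounds)
```

On a toy stream whose first four sentences share one vocabulary and last four another, the minimum-cost segmentation places a single boundary at the topic change.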
We also investigated efficient methods for extracting keywords that characterize thematic segments, in order to adapt the language model of the ASR system with related texts retrieved from the Internet. Experiments have shown that topic adaptation is more effective when included in the early recognition stages. We therefore focused on keyword extraction at the very beginning of the transcription process, using confusion networks rather than the single best hypothesis.
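As an illustration, keyword extraction from a confusion network can be sketched as posterior-weighted scoring over the alternatives in each slot, so that plausible competing hypotheses also contribute candidate keywords. The `(word, posterior)` slot structure and the `idf` table below are hypothetical stand-ins for the real ASR lattice and corpus statistics:

```python
from collections import defaultdict

def extract_keywords(confusion_network, idf, stopwords=frozenset(), top_k=10):
    """Score candidate keywords from an ASR confusion network.

    `confusion_network` is a list of slots, each slot a list of
    (word, posterior) alternatives; `idf` maps words to inverse document
    frequencies. A word's score accumulates posterior * idf over all slots,
    so confident, discriminative words rank first."""
    scores = defaultdict(float)
    for slot in confusion_network:
        for word, posterior in slot:
            if word not in stopwords and word in idf:
                scores[word] += posterior * idf[word]
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Unlike extraction from the single best hypothesis, a low-posterior but high-IDF alternative can still surface as a keyword for retrieving adaptation texts.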
Semantic verification of TV programmes
We investigated the use of automatic transcriptions of TV programs to validate labels automatically obtained from an electronic program guide (EPG). Given an online TV program guide, we associate the phonetic or textual transcription of the soundtrack with program descriptions extracted from the guide, using techniques inspired by the information retrieval field. Program names obtained from the TV guide are then compared with the corresponding labels obtained from the EPG alignment. The phonetic and textual methods implemented make it possible to reach a decision for 40 % of the segments and to decrease the labeling error rate by 3.5 %.
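A minimal sketch of such an IR-style matching step, assuming a plain bag-of-words cosine similarity on the textual transcription and an illustrative abstention threshold (the actual system also exploits phonetic transcriptions):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def validate_label(transcript_words, guide_descriptions, epg_label, threshold=0.2):
    """Match a segment transcript against TV-guide descriptions and compare
    the best match with the EPG label. Returns True/False for a validation
    decision, or None to abstain when no description matches confidently
    (the threshold value is illustrative, not the tuned one)."""
    seg = Counter(transcript_words)
    best_title, best_sim = None, 0.0
    for title, description in guide_descriptions.items():
        sim = cosine(seg, Counter(description))
        if sim > best_sim:
            best_title, best_sim = title, sim
    if best_sim < threshold:
        return None                       # no decision for this segment
    return best_title == epg_label
```

Abstaining on low-similarity segments is what yields a decision on only part of the segments while lowering the labeling error rate on those decided.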
Audio information retrieval in multilingual audiovisual contents
Participant: Guillaume Gravier.
Work done in close collaboration with the Texmex project-team of IRISA.
In the framework of our participation in The Star Challenge, we developed a system for phonetic-based information retrieval in multilingual video collections. Phonetic recognition is performed with French phoneme models, and a classical Boolean information retrieval model is used to index sequences of 2, 3, and 4 phonemes, respectively. The resulting rankings are then merged using rank aggregation methods. Experiments on a database covering 4 languages demonstrated the effectiveness of the method across languages using relatively simple models (context-independent 3-state HMMs with 32 Gaussians per state). More complex models were ineffective unless trained on the target language. Finally, promising results were obtained using query expansion based on phonetic confusions.
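The indexing and merging steps can be sketched as follows, assuming set-based phoneme n-gram matching for the Boolean retrieval and Borda counts as one simple rank aggregation choice (the aggregation method actually used may differ):

```python
from collections import defaultdict

def ngrams(phones, n):
    """Set of phoneme n-grams occurring in a phoneme sequence."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def rank_documents(query_phones, doc_phones, n):
    """Boolean-style ranking: order documents by how many of the query's
    phoneme n-grams they contain (ties broken by document id)."""
    q = ngrams(query_phones, n)
    scored = [(len(q & ngrams(p, n)), d) for d, p in doc_phones.items()]
    return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1]))]

def borda_merge(rankings):
    """Borda-count aggregation: each ranking awards a document
    len(ranking) - position points; documents are sorted by total points."""
    points = defaultdict(int)
    for ranking in rankings:
        for pos, doc in enumerate(ranking):
            points[doc] += len(ranking) - pos
    return sorted(points, key=lambda d: (-points[d], d))

def search(query_phones, doc_phones, orders=(2, 3, 4)):
    """Rank with 2-, 3- and 4-phoneme indexes, then merge the rankings."""
    return borda_merge([rank_documents(query_phones, doc_phones, n)
                        for n in orders])
```

Because matching operates on recognized phoneme sequences rather than words, the same French-trained recognizer can index and retrieve content in other languages, which is what the cross-language experiments exploited.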