Section: New Results
Discovering acoustic words in unsegmented speech with no initial phonetic knowledge
In developmental robotics, one aims at building robots capable of progressively and continuously learning new skills and new knowledge. One important challenge relates to language acquisition: how can a robot learn its first words and their associated meanings? This entails many interrelated problems. One of them is: how can a robot learn adequate acoustic/auditory representations of words? The technical challenge amounts to finding invariant features in sentences that contain words associated with concrete predefined meanings, but in which words are not initially segmented, and for which one does not possess detectors of high-level phonological representations such as phonemes (consonants and vowels). We have developed an approach to this problem based on a transposition of the notion of bags of features recently developed in computer vision. Bags of acoustic features are unstructured collections of features that characterize local properties of the signal, discard relative timing information, and support massive but fast statistical computations. This transposition required, in particular, developing methods for building and quickly searching dictionaries of short sound sequences using a dynamic time warping similarity measure. Using a large database provided by the ACORNS European consortium, which focuses on the very problem of word discovery in unsegmented speech with no initial phonetic knowledge, we have shown that the bag-of-acoustic-features approach achieves performance on this task comparable to the best methods identified by the ACORNS consortium. An article presenting these results is in preparation.
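
To make the idea concrete, here is a minimal sketch of how an utterance might be turned into a bag of acoustic features using a dictionary of short sound sequences and a dynamic time warping (DTW) distance. It is an illustration under simplifying assumptions, not the implementation used in this work: frames are single numbers rather than spectral vectors, windows have a fixed width, the dictionary is given rather than learned, and the nearest entry is found by exhaustive search instead of the fast search methods mentioned above. The names `dtw_distance`, `bag_of_acoustic_features`, `codebook`, `width` and `step` are hypothetical.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b):
    """DTW distance between two sequences of scalar frames.

    Assumption: the local cost is the absolute difference between frames;
    real acoustic frames would be vectors with a suitable vector distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def bag_of_acoustic_features(signal, codebook, width, step):
    """Count, for each dictionary entry, how many windows of the signal
    are closest to it under DTW. Window positions are discarded, so the
    result carries no relative timing information.
    """
    bag = Counter()
    for start in range(0, len(signal) - width + 1, step):
        segment = signal[start:start + width]
        # Exhaustive nearest-neighbour search (assumption: small dictionary).
        nearest = min(
            range(len(codebook)),
            key=lambda k: dtw_distance(segment, codebook[k]),
        )
        bag[nearest] += 1
    return bag


# Toy dictionary with two short "sound sequences" (hypothetical values).
codebook = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
utterance_a = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
utterance_b = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # same units, reversed order

bag_a = bag_of_acoustic_features(utterance_a, codebook, width=3, step=3)
bag_b = bag_of_acoustic_features(utterance_b, codebook, width=3, step=3)
```

In this toy example the two utterances contain the same units in a different order and therefore yield the same bag, which illustrates why the representation can be used without first segmenting words. Statistics linking bags to meanings would then be computed on these counts.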