Team METISS

Section: New Results

Audio motif and structure discovery

Keywords : data mining, pattern discovery, Bayesian networks.

Participants : Frédéric Bimbot, Guillaume Gravier, Armando Muscariello.

Audio motif discovery

Audio motif discovery aims at finding repeating patterns in large audio streams in an unsupervised manner. Extending the segmentation framework defined in [72], we proposed a motif discovery method tolerant to variations in both the spectral and temporal domains. Our method relies on a dynamic time warping algorithm with relaxed boundary constraints to search for repetitions of a seed block of signal in the near future. Repeating motifs are found by extending the seed whenever a match is found in the near future; they are then iteratively stored in a library of motifs for long-term matching. The algorithm has been used in a word-discovery task, which demonstrates the effectiveness of the approach in retrieving repeating motifs (fillers, words, locutions) in radio broadcast news data.
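
As an illustration of the seed search step, the following is a minimal sketch of a dynamic time warping variant in which the match may start and end anywhere in the search buffer. It is not the implementation used in this work: the plain subsequence formulation, the Euclidean frame distance, the normalisation by seed length, and the names frame_distance and subsequence_dtw are all assumptions made for the example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(seed, stream):
    """Align `seed` with the best-matching contiguous part of `stream`.

    Both arguments are lists of feature vectors. The start and end points
    of the match inside `stream` are left free, which is one simple way to
    relax the usual DTW boundary constraints.
    Returns (cost normalised by seed length, start index, end index).
    """
    n, m = len(seed), len(stream)
    # cost[i][j]: best accumulated cost of aligning seed[0..i] with a
    # stream segment that ends at frame j.
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: stream index where that best alignment begins.
    start = [[0] * m for _ in range(n)]

    # Free start: the first seed frame may align with any stream frame.
    for j in range(m):
        cost[0][j] = frame_distance(seed[0], stream[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + frame_distance(seed[i], stream[j])
            start[i][j] = best_start

    # Free end: pick the stream frame where the full seed aligns best.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


# Toy example: the seed reappears, time-stretched, at frames 2 to 5.
seed = [[0.0], [1.0], [2.0]]
stream = [[5.0], [5.0], [0.0], [1.0], [1.0], [2.0], [5.0]]
score, first, last = subsequence_dtw(seed, stream)
```

In a discovery loop, a match would typically be accepted when the normalised cost falls below a threshold; the choice of that threshold is not specified here.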

Discovering audiovisual structuring events in videos

Work carried out in collaboration with M. Ben and S. Campion from the Texmex project-team.

We have developed a cross-modal technique for the automatic discovery of audiovisual structuring events in TV programs, using only limited prior knowledge for the definition of the targeted events. The algorithm is based on two separate hierarchical clustering processes, one for audio segments and one for video shots. The two resulting clustering trees are then correlated by measuring the mutual information between each pair of audio/video (A/V) clusters. The most correlated pair of clusters provides an initial segmentation into structuring events whose content is coherent from both the audio and visual viewpoints. Experiments on several kinds of TV programs have shown that the technique is able to extract the most relevant parts of the video from a structuring point of view: anchorperson shots for TV news and report programs, and audio/video jingles separating the reports for flash news programs.
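
The correlation step can be sketched as follows, under simplifying assumptions that are not taken from this work: each cluster is represented by a binary indicator sequence over a common time grid (1 when the cluster is active in that time slot), and the clusters are given as flat dictionaries rather than as nodes of the two clustering trees. The names mutual_information and most_correlated_pair are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_x[a] * count_y[b]).
        mi += p_ab * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def most_correlated_pair(audio_clusters, video_clusters):
    """Return (mi, audio_name, video_name) for the pair with the highest MI.

    Each argument maps a cluster name to its indicator sequence on a
    shared time grid (an assumption of this sketch).
    """
    best = None
    for a_name, a_ind in audio_clusters.items():
        for v_name, v_ind in video_clusters.items():
            mi = mutual_information(a_ind, v_ind)
            if best is None or mi > best[0]:
                best = (mi, a_name, v_name)
    return best


# Toy example: audio cluster "a1" and video cluster "v1" are active together.
audio_clusters = {
    "a1": [1, 1, 0, 0, 1, 1, 0, 0],
    "a2": [1, 0, 1, 0, 1, 0, 1, 0],
}
video_clusters = {
    "v1": [1, 1, 0, 0, 1, 1, 0, 0],
    "v2": [0, 0, 0, 0, 1, 1, 1, 1],
}
best_mi, best_audio, best_video = most_correlated_pair(audio_clusters, video_clusters)
```

Here the time slots where both selected clusters are active would form the initial segmentation into candidate structuring events.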

