Team METISS

Bibliography

Major publications by the team in recent years

[1]
M. Ben.
Approches robustes pour la vérification automatique du locuteur par normalisation et adaptation hiérarchique, Doctoral thesis, Université de Rennes 1, IRISA, Rennes (France), November 2004.
[2]
L. Benaroya, F. Bimbot, R. Gribonval.
Audio Source Separation With a Single Sensor, in: IEEE Trans. Audio, Speech and Language Processing, January 2006, vol. 14, no 1, p. 191–199.
[3]
F. Bimbot, J.-F. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. A. Reynolds.
A tutorial on text-independent speaker verification, in: EURASIP Journal on Applied Signal Processing, April 2004, vol. 2004, no 4, p. 430–451.
[4]
F. Bimbot, G. Gravier.
Evaluation des systèmes de reconnaissance de la parole, in: Evaluation des systèmes de traitement de l'information, Traité des Sciences et Techniques de l'Information, Hermes Science Publications, 2004, chap. 8, p. 189–213.
[5]
L. Borup, R. Gribonval, M. Nielsen.
Bi-framelet systems with few vanishing moments characterize Besov spaces, in: Appl. Comp. Harmonic Anal. (special issue on frames in harmonic analysis), 2004, vol. 17, no 1–2.
[6]
R. Gribonval, R. M. Figueras i Ventura, P. Vandergheynst.
A simple test to check the optimality of sparse signal approximations, in: EURASIP Signal Processing, special issue on Sparse Approximations in Signal and Image Processing, March 2006, vol. 86, no 3, p. 496–510.
[7]
R. Gribonval, M. Nielsen.
Nonlinear approximation with dictionaries. I. Direct estimates, in: J. of Fourier Anal. and Appl., 2004, vol. 10, no 1.
[8]
R. Gribonval, M. Nielsen.
On approximation with spline generated framelets, in: Constructive Approx., January 2004, vol. 20, no 2, p. 207–232.
[9]
R. Gribonval, P. Vandergheynst.
On the exponential convergence of Matching Pursuits in quasi-incoherent dictionaries, in: IEEE Trans. Information Theory, January 2006, vol. 52, no 1, p. 255–261.
[10]
S. Huet, G. Gravier, P. Sébillot.
Are morpho-syntactic taggers suitable to improve automatic transcription?, in: Intl. Workshop on Text, Speech and Dialogue, 2006.
[11]
E. Kijak, G. Gravier, L. Oisel, P. Gros.
Audiovisual integration for tennis broadcast structuring, in: Multimedia Tools and Applications, 2006, vol. 30, no 3, p. 289–312.
[12]
E. Vincent, R. Gribonval, C. Févotte.
Performance measurement in Blind Audio Source Separation, in: IEEE Trans. Audio, Speech and Language Processing, 2006, vol. 14, no 4, p. 1462–1469.

Publications of the year

Doctoral dissertations and Habilitation theses

[13]
M. Collet.
Mesures de similarité robustes dans un espace de locuteurs de référence. Application pour l'indexation de documents audio, Doctoral thesis, Université de Rennes 1, IRISA, Rennes (France), September 2006.
[14]
M. Delakis.
Multimodal Tennis Video Structure Analysis with Segment Models, Ph. D. Thesis, University of Rennes 1, France, 2006.
[15]
A. Ozerov.
Adaptation de modèles statistiques pour la séparation de sources mono-capteur. Application à la séparation voix/musique dans les chansons, Doctoral thesis, Université de Rennes 1, IRISA, Rennes (France), December 2006.

Articles in refereed journals and book chapters

[16]
L. Benaroya, F. Bimbot, G. Gravier, R. Gribonval.
Experiments in audio source separation with one sensor for robust speech recognition, in: Speech Communication, 2006, vol. 48, no 7, p. 848–854.
[17]
L. Benaroya, F. Bimbot, R. Gribonval.
Audio Source Separation With a Single Sensor, in: IEEE Trans. Audio, Speech and Language Processing, January 2006, vol. 14, no 1, p. 191–199.
[18]
F. Bimbot, M. Faundez-Zanuy, R. D. Mori.
Editorial of the Special Issue on Non-Linear and Non-Conventional Speech Processing (NOLISP'03), in: Speech Communication, July 2006, vol. 48, no 7, 759 p.
[19]
R. Gribonval, R. M. Figueras i Ventura, P. Vandergheynst.
A simple test to check the optimality of sparse signal approximations, in: EURASIP Signal Processing, special issue on Sparse Approximations in Signal and Image Processing, March 2006, vol. 86, no 3, p. 496–510.
[20]
R. Gribonval, M. Nielsen.
Beyond sparsity: recovering structured representations by L1-minimization and greedy algorithms, in: Advances in Computational Mathematics, to appear, 2006.
[21]
R. Gribonval, M. Nielsen.
Nonlinear approximation with dictionaries. II. Inverse estimates, in: Constructive Approximation, September 2006, vol. 24, no 2, p. 157–173.
[22]
R. Gribonval, M. Nielsen.
Sparse Approximations in Signal and Image Processing - Editorial, in: EURASIP Signal Processing, special issue on Sparse Approximations in Signal and Image Processing, March 2006, vol. 86, no 3, p. 415–416.
[23]
R. Gribonval, P. Vandergheynst.
On the exponential convergence of Matching Pursuits in quasi-incoherent dictionaries, in: IEEE Trans. Information Theory, January 2006, vol. 52, no 1, p. 255–261.
[24]
E. Kijak, G. Gravier, L. Oisel, P. Gros.
Audiovisual integration for tennis broadcast structuring, in: Multimedia Tools and Applications, 2006, vol. 30, no 3, p. 289–312.
[25]
S. Krstulovic, F. Bimbot, O. Boëffard, D. Charlet, D. Fohr, O. Mella.
Optimizing the coverage of a speech database through a selection of representative speaker recordings, in: Speech Communication, October 2006, vol. 48, no 10, p. 1319–1348.
[26]
Z. Luo, M. Gaspar, J. Liu, A. Swami.
Distributed signal processing in sensor networks, in: IEEE Signal Processing Magazine, July 2006, vol. 23, no 4, p. 14–15.
[27]
A. Ozerov, P. Philippe, R. Gribonval, F. Bimbot.
Adaptation des modèles pour la séparation de voix chantée à partir d'un seul microphone, in: Traitement du Signal, to appear, 2006.
[28]
E. Vincent, R. Gribonval, C. Févotte.
Performance measurement in Blind Audio Source Separation, in: IEEE Trans. Audio, Speech and Language Processing, 2006, vol. 14, no 4, p. 1462–1469.

Publications in Conferences and Workshops

[29]
S. Arberet, R. Gribonval, F. Bimbot.
A Robust Method to Count and Locate Audio Sources in a Stereophonic Linear Instantaneous Mixture, in: Proc. ICA'06, Springer-Verlag LNCS series, March 2006, p. 536–543.
[30]
S. Arberet, R. Gribonval, F. Bimbot.
A Robust Method to Count and Locate Audio Sources in a Stereophonic Linear Instantaneous Mixture, in: Proc. of the Int'l. Workshop on Independent Component Analysis and Blind Signal Separation (ICA 2006), Charleston, South Carolina, USA, J. Rosca, D. Erdogmus, J. Príncipe, S. Haykin (editors), LNCS Series, Springer, March 2006, vol. 3889, p. 536–543.
[31]
E. Camberlein, P. Philippe, F. Bimbot.
Adaptive Filter Banks Using Fixed Size MDCT and Subband Merging for Audio Coding - Comparison with the MPEG AAC Filter Banks, in: 121st AES Convention, San Francisco, USA, 2006.
[32]
M. Collet, D. Charlet, F. Bimbot.
A Weighted Measure of Similarity for Speaker Tracking, in: Proc. IEEE Odyssey Workshop 2006, Puerto-Rico, USA, June 2006.
[33]
M. Collet, D. Charlet, F. Bimbot.
Représentation du locuteur par modèles d'ancrage pour l'indexation de documents audio, in: Journées d'Étude sur la Parole, Dinard, France, 2006.
[34]
M. Collet, D. Charlet, F. Bimbot.
Speaker Tracking by anchor models using speaker segment cluster information, in: Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP'06), Toulouse, France, May 2006, vol. 1, p. I-1009 – I-1012.
[35]
M. Delakis, G. Gravier, P. Gros.
Score oriented Viterbi search in sport video structuring using HMM and segment models, in: IEEE Conf. on Multimedia Signal Processing, 2006.
[36]
S. Galliano, E. Geoffrois, J.-F. Bonastre, G. Gravier, D. Mostefa, K. Choukri.
Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News, in: Language Resources and Evaluation Conference, 2006.
[37]
S. Huet, G. Gravier, P. Sébillot.
Are morpho-syntactic taggers suitable to improve automatic transcription?, in: Intl. Workshop on Text, Speech and Dialogue, 2006.
[38]
S. Huet, G. Gravier, P. Sébillot.
Peut-on utiliser les étiquetteurs morpho-syntaxique pour la transcription automatique?, in: Journées d'Étude sur la Parole, Dinard, France, 2006.
[39]
P. Jost, P. Vandergheynst, S. Lesage, R. Gribonval.
MoTIF : an Efficient Algorithm for Learning Translation Invariant Dictionaries, in: Int. Conf. Acoust. Speech Signal Process. (ICASSP'06), Toulouse, France, May 2006.
[40]
S. Krstulovic, R. Gribonval.
MPTK: Matching Pursuit made Tractable, in: Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP'06), Toulouse, France, May 2006, vol. 3, p. III-496 – III-499.
[41]
S. Lesage, S. Krstulovic, R. Gribonval.
Under-determined source separation: comparison of two approaches based on sparse decompositions, in: Proc. of the Int'l. Workshop on Independent Component Analysis and Blind Signal Separation (ICA 2006), Charleston, South Carolina, USA, J. Rosca, D. Erdogmus, J. Príncipe, S. Haykin (editors), LNCS Series, Springer, March 2006, vol. 3889, p. 633–640.
[42]
Y. Mami, F. Bimbot.
Etude comparative de modélisation de langage par bigrams et par multigrams pour la reconnaissance de la parole, in: Journées d'Étude sur la Parole, Dinard, France, 2006.
[43]
G. Monaci, P. Jost, P. Vandergheynst, B. Mailhe, S. Lesage, R. Gribonval.
Learning Multi-Modal Dictionaries: Application to Audiovisual Data, in: Proc. of International Workshop on Multimedia Content Representation, Classification and Security (MCRCS'06), LNCS, Springer-Verlag, September 2006, vol. 4105, p. 538–545.
[44]
G. Monaci, P. Jost, P. Vandergheynst, B. Mailhe, S. Lesage, R. Gribonval.
Learning Multi-Modal Dictionaries: Application to Audiovisual Data, in: Proc. of International Workshop on Multimedia Content Representation, Classification and Security (MCRCS'06), LNCS, Springer-Verlag, September 2006, vol. 4105, p. 538–545.
[45]
D. Moraru, G. Gravier.
Ancres macrophonétiques pour la transcription automatique, in: Journées d'Étude sur la Parole, Dinard, France, 2006.
[46]
X. Naturel, G. Gravier, P. Gros.
Fast structuring of large television streams using program guides, in: Intl. Workshop on Adaptive Multimedia Retrieval, 2006.

Internal Reports

[47]
S. Huet, G. Gravier, P. Sébillot.
Utilisation de la linguistique en reconnaissance de la parole : un état de l'art, Technical report, IRISA, May 2006, no PI 1804
http://www.irisa.fr/doccenter/publis/PI/2006/irisapublication.2006-05-30.9598024893.
[48]
E. Vincent, R. Gribonval, M. D. Plumbley.
Oracle Estimators for the Benchmarking of Source Separation Algorithms, Technical report, Centre for Digital Music, Queen Mary, University of London, 28 July 2006, no C4DM-TR-06-03.

References in notes

[49]
J. Bobin, Y. Moudden, J.-L. Starck, M. Elad.
Morphological Diversity and Source Separation, in: IEEE Signal Processing Letters, to appear, 2006.
[50]
R. Boite, H. Bourlard, T. Dutoit, J. Hancq, H. Leich.
Traitement de la Parole, Presses Polytechniques et Universitaires Romandes, 2000.
[51]
J.-F. Bonastre, F. Bimbot, L.-J. Boë, J. Campbell, D. Reynolds, I. Magrin-Chagnolleau.
Person Authentication by Voice: A Need For Caution, in: Proc. Eurospeech'03, Geneva, 2003.
[52]
M. Collet, D. Charlet, F. Bimbot.
A Correlation metric for speaker tracking using anchor models, in: Proc. IEEE-ICASSP (International Conference on Acoustics, Speech and Signal Processing), 2005, vol. I, p. 713–716.
[53]
M. Collet, Y. Mami, D. Charlet, F. Bimbot.
Probabilistic Anchor Models Approach for Speaker Verification, in: Proc. Interspeech (Eurospeech, Lisbon), September 2005, p. 2005–2008.
[54]
M. Delakis, G. Gravier, P. Gros.
Audiovisual fusion with segment models for video structure analysis, in: 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, 2005.
[55]
M. Delakis, G. Gravier, P. Gros.
Multimodal segmental-based modeling of tennis video broadcasts, in: Intl. Conf. on Multimedia and Exhibition, 2005.
[56]
ELDA.
ELDA - Evaluations and Language resources Distribution Agency, see http://www.elda.org/ for the specifications of the currently available SpeechDat databases, 2005
http://www.elda.org/.
[57]
C. Févotte, R. Gribonval, E. Vincent.
BSS_EVAL Toolbox User Guide – Revision 2.0, Technical report, IRISA, Rennes (France), April 2005, no 1706
http://www.irisa.fr/bibli/publi/pi/2005/1706/1706.html.
[58]
G. Gonon, F. Bimbot, et al.
Security requirements for TPD (Deliverable) – Chapter 8 : Enhanced User Authentication / Biometry for TPD, Technical report, Inspired Consortium, IST-2003-507894, June 2005, no D8.
[59]
G. Gonon, R. Gribonval, F. Bimbot.
Decision Trees with Improved Efficiency for Fast Speaker Verification, in: Proc. Interspeech'05 (Eurospeech, Lisbon), September 2005, p. 3077–3080.
[60]
G. Gravier, F. Yvon, B. Jacob, F. Bimbot.
Sirocco, un système ouvert de reconnaissance de la parole, in: Journées d'Étude sur la Parole, Nancy, June 2002, p. 273–276.
[61]
R. Gribonval, E. Bacry.
Harmonic Decomposition of Audio Signals with Matching Pursuit, in: IEEE Trans. Signal Proc., January 2003, vol. 51, no 1, p. 101–111.
[62]
R. Gribonval, L. Benaroya, E. Vincent, C. Févotte.
Proposals for Performance Measurement in Source Separation, in: Proc. 4th Int. Symp. on Independent Component Anal. and Blind Signal Separation (ICA2003), Nara, Japan, April 2003, p. 763–768.
[63]
R. Gribonval.
Fast Matching Pursuit with a multiscale dictionary of Gaussian Chirps, in: IEEE Trans. Signal Proc., May 2001, vol. 49, no 5, p. 994–1001.
[64]
R. Gribonval.
Sparse decomposition of stereo signals with Matching Pursuit and application to blind separation of more than two sources from a stereo mixture, in: Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP'02), Orlando, Florida, May 2002.
[65]
R. Gribonval.
Approximations non-linéaires pour l'analyse de signaux sonores, Ph. D. Thesis, Université Paris IX Dauphine, September 1999.
[66]
F. Jelinek.
Statistical Methods for Speech Recognition, MIT Press, Cambridge, Massachusetts, 1998.
[67]
P. Jost, P. Vandergheynst, S. Lesage, R. Gribonval.
Learning redundant dictionaries with translation invariance property : the MoTIF algorithm, in: SPARS, Rennes, 2005.
[68]
S. Krstulovic, F. Bimbot, D. Charlet, O. Boëffard.
Focal speakers: a speaker selection method able to deal with heterogeneous similarity criteria, in: Proc. Interspeech'05 (Eurospeech, Lisbon), September 2005, p. 3057–3060.
[69]
S. Lesage, S. Krstulovic, R. Gribonval.
Séparation de sources dans le cas sous-déterminé : comparaison de deux approches basées sur des décompositions parcimonieuses, in: Proc. GRETSI, 2005.
[70]
Z. Luo, M. Gaspar, J. Liu, A. Swami.
Distributed signal processing in sensor networks, in: IEEE Signal Processing Magazine, July 2006, vol. 23, no 4, p. 14–15.
[71]
S. Mallat.
A Wavelet Tour of Signal Processing, 2nd edition, Academic Press, San Diego, 1999.
[72]
Y. Mami, D. Charlet.
Speaker identification by location in an optimal space of anchor models, in: ICSLP, 2002, vol. 2, p. 1333–1336.
[73]
Nagorski, Boves, Steeneken.
Optimal Selection of Speech Data for Automatic Speech Recognition Systems, in: ICSLP, 2002, p. 2473–2476.
[74]
A. Ozerov, R. Gribonval, P. Philippe, F. Bimbot.
Séparation voix / musique à partir d'enregistrements mono : quelques remarques sur le choix et l'adaptation des modèles, in: Proc. GRETSI, 2005.
[75]
A. Ozerov, P. Philippe, R. Gribonval, F. Bimbot.
One microphone singing voice separation using source-adapted model, in: Proc. WASPAA, 2005.
[76]
D. Sturim, D. Reynolds, E. Singer, J. Campbell.
Speaker indexing in large audio databases using anchor models, in: IEEE-ICASSP, 2001, p. 429–432.
[77]
E. Vincent, R. Gribonval.
Construction d'estimateurs oracles pour la séparation de sources, in: Proc. GRETSI, 2005.
