Bibliography

Publications of the year

Doctoral Dissertations and Habilitation Theses

[1]
M. Fontaine.
Alpha-stable processes for signal processing, Université de Lorraine, June 2019.
https://tel.archives-ouvertes.fr/tel-02188304
[2]
L. Perotin.
Localization and enhancement of speech from the Ambisonics format : analyse de scènes sonores pour faciliter la commande vocale, Université de Lorraine, October 2019.
https://hal.univ-lorraine.fr/tel-02393258
[3]
A. Tsukanova.
Articulatory speech synthesis, Université de Lorraine, December 2019.
https://hal.archives-ouvertes.fr/tel-02433528

Articles in International Peer-Reviewed Journals

[4]
N. Bertin, E. Camberlein, R. Lebarbenchon, E. Vincent, S. Sivasankaran, I. Illina, F. Bimbot.
VoiceHome-2, an extended corpus for multichannel speech processing in real homes, in: Speech Communication, January 2019, vol. 106, pp. 68-78. [ DOI : 10.1016/j.specom.2018.11.002 ]
https://hal.inria.fr/hal-01923108
[5]
A. Deleforge, D. Di Carlo, M. Strauss, R. Serizel, L. Marcenaro.
Audio-Based Search and Rescue with a Drone: Highlights from the IEEE Signal Processing Cup 2019 Student Competition, in: IEEE Signal Processing Magazine, September 2019, vol. 36, no 5, pp. 138-144, https://arxiv.org/abs/1907.04655. [ DOI : 10.1109/MSP.2019.2924687 ]
https://hal.archives-ouvertes.fr/hal-02161897
[6]
K. Déguernel, E. Vincent, J. Nika, G. Assayag, K. Smaïli.
Learning of Hierarchical Temporal Structures for Guided Improvisation, in: Computer Music Journal, 2019, vol. 43, no 2.
https://hal.inria.fr/hal-02378273
[7]
A. Mesaros, A. Diment, B. Elizalde, T. Heittola, E. Vincent, B. Raj, T. Virtanen.
Sound event detection in the DCASE 2017 Challenge, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, June 2019, vol. 27, no 6, pp. 992-1006. [ DOI : 10.1109/TASLP.2019.2907016 ]
https://hal.inria.fr/hal-02067935
[8]
Q. V. Nguyen, F. Colas, E. Vincent, F. Charpillet.
Motion planning for robot audition, in: Autonomous Robots, December 2019, vol. 43, no 8, pp. 2293-2317. [ DOI : 10.1007/s10514-019-09880-1 ]
https://hal.inria.fr/hal-02188342
[9]
L. Perotin, R. Serizel, E. Vincent, A. Guérin.
CRNN-based multiple DoA estimation using acoustic intensity features for Ambisonics recordings, in: IEEE Journal of Selected Topics in Signal Processing, February 2019, vol. 13, no 1, pp. 22-33. [ DOI : 10.1109/jstsp.2019.2900164 ]
https://hal.inria.fr/hal-01839883
[10]
A. Poddar, M. Sahidullah, G. Saha.
Quality Measures for Speaker Verification with Short Utterances, in: Digital Signal Processing, January 2019, vol. 88, pp. 66-79. [ DOI : 10.1016/j.dsp.2019.01.023 ]
https://hal.inria.fr/hal-01998376
[11]
K. Smaïli, D. Fohr, C.-E. González-Gallardo, M. L. Grega, L. Janowski, D. Jouvet, A. Koźbiał, D. Langlois, M. Leszczuk, O. Mella, M.-A. Menacer, A. Mendez, E. L. L. Pontes, E. Sanjuan, J.-M. Torres-Moreno, B. Garcia-Zapirain.
Summarizing videos into a target language: Methodology, architectures and evaluation, in: Journal of Intelligent and Fuzzy Systems, July 2019, vol. 1, pp. 1-12. [ DOI : 10.3233/JIFS-179350 ]
https://hal.archives-ouvertes.fr/hal-02271287
[12]
V. Vestman, T. Kinnunen, R. G. Hautamäki, M. Sahidullah.
Voice Mimicry Attacks Assisted by Automatic Speaker Verification, in: Computer Speech and Language, June 2019, vol. 59, pp. 36-54. [ DOI : 10.1016/j.csl.2019.05.005 ]
https://hal.archives-ouvertes.fr/hal-02161773

Invited Conferences

[13]
C. Dodane, D. Boutet, F. Hirsch, S. Ouni, A. Morgenstern.
MODALISA une plateforme intégrative pour capturer l’orchestration des gestes et de la parole, in: Défi Instrumentation aux Limites, Colloque de restitution, Paris, France, CNRS, September 2019.
https://hal.archives-ouvertes.fr/hal-02375011
[14]
F. Forbes, A. Deleforge, R. Horaud, E. Perthame.
Robust non-linear regression approach for generalized inverse problems in a high dimensional setting, in: AIP 2019 - Applied Inverse Problem conference, Grenoble, France, July 2019.
https://hal.archives-ouvertes.fr/hal-02415115
[15]
D. Jouvet.
Speech Processing and Prosody, in: TSD 2019 - 22nd International Conference of Text, Speech and Dialogue, Ljubljana, Slovenia, September 2019.
https://hal.inria.fr/hal-02177210
[16]
R. Serizel, N. Turpault.
Sound Event Detection from Partially Annotated Data: Trends and Challenges, in: IcETRAN conference, Srebrno Jezero, Serbia, June 2019.
https://hal.inria.fr/hal-02114652
[17]
E. Vincent.
COMPRISE, in: META-FORUM, Brussels, Belgium, October 2019.
https://hal.inria.fr/hal-02377051
[18]
E. Vincent.
Grands défis scientifiques et technologiques en traitement de la parole: quelles initiatives chez Inria et au niveau européen?, in: Voice Tech Paris 2019, Paris, France, November 2019.
https://hal.inria.fr/hal-02377036
[19]
E. Vincent.
Parole & deep learning : succès et grands défis, in: Journée IA, Langage et Citoyens, Nancy, France, March 2019.
https://hal.inria.fr/hal-02090623

International Conferences with Proceedings

[20]
K. Abidi, D. Fohr, D. Jouvet, D. Langlois, O. Mella, K. Smaïli.
A Fine-grained Multilingual Analysis Based on the Appraisal Theory: Application to Arabic and English Videos, in: ICALP: International Conference on Arabic Language Processing, Nancy, France, Springer, August 2019, Communications in Computer and Information Science (CCIS), vol. 1108, pp. 49-61. [ DOI : 10.1007/978-3-030-32959-4_4 ]
https://hal.archives-ouvertes.fr/hal-02314244
[21]
T. Biasutto–Lervat, S. Dahmani, S. Ouni.
Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.inria.fr/hal-02175780
[22]
A. Bonneau.
German obstruent sequences by French L2 learners, in: ICPhS 2019 - International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02143360
[23]
S. Dahmani, V. Colotte, V. Girard, S. Ouni.
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.inria.fr/hal-02175776
[24]
D. Di Carlo, A. Deleforge, N. Bertin.
Mirage: 2D Source Localization Using Microphone Pair Augmentation with Echoes, in: ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, United Kingdom, IEEE, May 2019, pp. 775-779, https://arxiv.org/abs/1906.08968. [ DOI : 10.1109/ICASSP.2019.8683534 ]
https://hal.archives-ouvertes.fr/hal-02160940
[25]
C. Dodane, D. Boutet, I. Didirkova, F. Hirsch, S. Ouni, A. Morgenstern.
An integrative platform to capture the orchestration of gesture and speech, in: GeSpIn 2019 - Gesture and Speech in Interaction, Paderborn, Germany, September 2019.
https://hal.inria.fr/hal-02278345
[26]
I. K. Douros, J. Felblinger, J. Frahm, K. Isaieva, A. Joseph, Y. Laprie, F. Odille, A. Tsukanova, D. Voit, P.-A. Vuissoz.
A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.inria.fr/hal-02167756
[27]
I. K. Douros, Y. Laprie, P.-A. Vuissoz, B. Elie.
Acoustic Evaluation of Simplifying Hypotheses Used in Articulatory Synthesis, in: ICA 2019 - 23rd International Congress on Acoustics, Aachen, Germany, September 2019.
https://hal.inria.fr/hal-02180617
[28]
I. K. Douros, A. Tsukanova, K. Isaieva, P.-A. Vuissoz, Y. Laprie.
Towards a method of dynamic vocal tract shapes generation by combining static 3D and dynamic 2D MRI speech data, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.inria.fr/hal-02181333
[29]
I. K. Douros, P.-A. Vuissoz, Y. Laprie.
Acoustic impacts of geometric approximation at the level of velum and epiglottis on French vowels, in: ICPhS 2019 - International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02180566
[30]
I. K. Douros, P.-A. Vuissoz, Y. Laprie.
Comparison between 2D and 3D models for speech production: a study of French vowels, in: ICPhS 2019 - International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02180606
[31]
I. K. Douros, P.-A. Vuissoz, Y. Laprie.
Effect of head posture on phonation of French vowels, in: ICPhS 2019 - Proceedings of International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02180486
[32]
A. Dufraux, E. Vincent, A. Hannun, A. Brun, M. Douze.
Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition, in: ASRU 2019 - IEEE Automatic Speech Recognition and Understanding Workshop, Singapore, Singapore, December 2019.
https://hal.inria.fr/hal-02316572
[33]
B. Elie, A. Amelot, Y. Laprie, S. Maeda.
Glottal Opening Measurements in VCV and VCCV Sequences, in: ICA 2019 - 23rd International Congress on Acoustics, Aachen, Germany, September 2019.
https://hal.inria.fr/hal-02180626
[34]
M. Fontaine, A. A. Nugraha, R. Badeau, K. Yoshii, A. Liutkus.
Cauchy Multichannel Speech Enhancement with a Deep Speech Prior, in: EUSIPCO 2019 - 27th European Signal Processing Conference, A Coruña, Spain, September 2019.
https://hal.telecom-paristech.fr/hal-02288063
[35]
T. Kinnunen, R. G. Hautamäki, V. Vestman, M. Sahidullah.
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection, in: ICASSP 2019 - 44th International Conference on Acoustics, Speech, and Signal Processing, Brighton, United Kingdom, May 2019.
https://hal.inria.fr/hal-02051701
[36]
A. Kulkarni, V. Colotte, D. Jouvet.
Layer adaptation for transfer of expressivity in speech synthesis, in: LTC'19 - 9th Language & Technology Conference, Poznan, Poland, May 2019.
https://hal.inria.fr/hal-02177945
[37]
L. Lee, K. Bartkova, D. Jouvet, M. Dargnat, Y. Keromnes.
Can prosody meet pragmatics? Case of discourse particles in French, in: ICPhS 2019 - International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02177202
[38]
K. A. Lee, V. Hautamäki, T. Kinnunen, H. Yamamoto, K. Okabe, V. Vestman, J. Huang, G. Ding, H. Sun, A. Larcher, R. K. Das, H. Li, M. Rouvier, P.-M. B. Bousquet, W. Rao, Q. Wang, C. Zhang, F. Bahmaninezhad, H. Delgado, J. Patino, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda, T. Ngo Trong, M. Sahidullah, F. Lu, Y. Tang, M. Tu, K. Kuan Teh, H. Dat Tran, K. K. George, I. Kukanov, F. Desnous, J. Yang, E. Yılmaz, L. Xu, J.-F. Bonastre, C. Xu, Z. H. Lim, S. Chng, S. Ranjan, J. H. L. Hansen, M. Todisco, N. Evans.
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.archives-ouvertes.fr/hal-02280151
[39]
T. Léonova, G. Coffe, A. Tarasconi, A. Piquard-Kipffer, D. Sardin, A. Gosse, J. Boré.
L'impact du trouble du spectre de l'autisme sur le bien-être psychologique des parents, in: XVIIIème Congrès de l'Association Internationale de Formation et de Recherche en Éducation Familiale, Schoelcher, Martinique, France, May 2019.
https://hal.inria.fr/hal-02179616
[40]
M. A. Menacer, C. E. González-Gallardo, K. Abidi, D. Fohr, D. Jouvet, D. Langlois, O. Mella, F. Sadat, J. M. Torres-Moreno, K. Smaïli.
Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations, in: ICALP: International Conference on Arabic Language Processing, Nancy, France, Springer, August 2019, Communications in Computer and Information Science (CCIS), vol. 1108, pp. 65-78. [ DOI : 10.1007/978-3-030-32959-4_5 ]
https://hal.archives-ouvertes.fr/hal-02314238
[41]
M. Menacer, D. Langlois, D. Jouvet, D. Fohr, O. Mella, K. Smaïli.
Machine Translation on a parallel Code-Switched Corpus, in: Canadian AI 2019 - 32nd Conference on Canadian Artificial Intelligence, Ontario, Canada, Lecture Notes in Artificial Intelligence, May 2019.
https://hal.archives-ouvertes.fr/hal-02106010
[42]
M. Pariente, A. Deleforge, E. Vincent.
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders, in: INTERSPEECH, Graz, Austria, September 2019, https://arxiv.org/abs/1905.01209.
https://hal.inria.fr/hal-02116165
[43]
Best Paper
L. Perotin, A. Défossez, E. Vincent, R. Serizel, A. Guérin.
Regression versus classification for neural network based audio source localization, in: WASPAA 2019 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, United States, IEEE, October 2019.
https://hal.inria.fr/hal-02125985
[44]
D. Ribas, E. Vincent.
An improved uncertainty propagation method for robust i-vector based speaker recognition, in: ICASSP 2019 - 44th International Conference on Acoustics, Speech, and Signal Processing, Brighton, United Kingdom, May 2019, https://arxiv.org/abs/1902.05761.
https://hal.inria.fr/hal-02010199
[45]
B. M. L. Srivastava, A. Bellet, M. Tommasi, E. Vincent.
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.inria.fr/hal-02166434
[46]
M. Todisco, X. Wang, V. Vestman, M. Sahidullah, H. Delgado, A. Nautsch, J. Yamagishi, N. Evans, T. Kinnunen, K. A. Lee.
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection, in: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Graz, Austria, September 2019.
https://hal.archives-ouvertes.fr/hal-02172099
[47]
A. Tsukanova, I. K. Douros, A. Shimorina, Y. Laprie.
Can static vocal tract positions represent articulatory targets in continuous speech? Matching static MRI captures against real-time MRI for the French language, in: ICPhS 2019 - International Congress of Phonetic Sciences, Melbourne, Australia, August 2019.
https://hal.inria.fr/hal-02181314
[48]
N. Turpault, R. Serizel, A. Parag Shah, J. Salamon.
Sound event detection in domestic environments with weakly labeled data and soundscape synthesis, in: Workshop on Detection and Classification of Acoustic Scenes and Events, New York City, United States, October 2019.
https://hal.inria.fr/hal-02160855
[49]
N. Turpault, R. Serizel, E. Vincent.
Semi-supervised triplet loss based learning of ambient audio embeddings, in: ICASSP, Brighton, United Kingdom, 2019.
https://hal.archives-ouvertes.fr/hal-02025824
[50]
I. Zangar, Z. Mnasri, V. Colotte, D. Jouvet.
F0 modeling using DNN for Arabic parametric speech synthesis, in: INNSBDDL 2019 - INNS Big Data and Deep Learning, Sestri Levante, Italy, April 2019.
https://hal.inria.fr/hal-02177496

Scientific Books (or Scientific Book chapters)

[51]
M. Sahidullah, H. Delgado, M. Todisco, T. Kinnunen, N. Evans, J. Yamagishi, K. A. Lee.
Introduction to Voice Presentation Attack Detection and Recent Advances, in: Handbook of Biometric Anti-Spoofing: Presentation Attack Detection, S. Marcel, M. S. Nixon, J. Fierrez, N. Evans (editors), Advances in Computer Vision and Pattern Recognition, Springer, 2019, pp. 321-361. [ DOI : 10.1007/978-3-319-92627-8_15 ]
https://hal.inria.fr/hal-01974528

Internal Reports

[52]
B. Caramiaux, F. Lotte, J. Geurts, G. Amato, M. Behrmann, F. Bimbot, F. Falchi, A. Garcia, J. Gibert, G. Gravier, H. Holken, H. Koenitz, S. Lefebvre, A. Liutkus, A. Perkis, R. Redondo, E. Turrin, T. Viéville, E. Vincent.
AI in the media and creative industries, New European Media (NEM), April 2019, pp. 1-35, https://arxiv.org/abs/1905.04175.
https://hal.inria.fr/hal-02125504
[53]
G. Carbajal, R. Serizel, E. Vincent, E. Humbert.
Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise: Supporting Document, Inria Nancy, équipe Multispeech ; Invoxia SAS, November 2019, no RR-9303.
https://hal.inria.fr/hal-02372431
[54]
K. A. Lee, V. Hautamäki, T. Kinnunen, H. Yamamoto, K. Okabe, V. Vestman, J. Huang, G. Ding, H. Sun, A. Larcher, R. K. Das, H. Li, M. Rouvier, P.-M. B. Bousquet, W. Rao, Q. Wang, C. Zhang, F. Bahmaninezhad, H. Delgado, J. Patino, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda, T. Ngo Trong, M. Sahidullah, F. Lu, Y. Tang, M. Tu, K. Kuan Teh, H. Dat Tran, K. K. George, I. Kukanov, F. Desnous, J. Yang, E. Yılmaz, L. Xu, J.-F. Bonastre, C. Xu, Z. H. Lim, S. Chng, S. Ranjan, J. H. L. Hansen, M. Todisco, N. Evans.
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences, I4U Consortium, April 2019.
https://hal.archives-ouvertes.fr/hal-02174317
[55]
M. Pariente, A. Deleforge, E. Vincent.
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document, Inria, April 2019, no RR-9268, pp. 1-8.
https://hal.inria.fr/hal-02089062

Software

[56]
M. Kowalski, E. Vincent, R. Gribonval.
Underdetermined Reverberant Source Separation, October 2019, Software. [ SWH-ID : swh:1:dir:ec4ae097465d9ea51589537ea94b2ea50e8d134d ]
https://hal.archives-ouvertes.fr/hal-02309043

Other Publications

[57]
G. Carbajal, R. Serizel, E. Vincent, E. Humbert.
Joint DNN-Based Multichannel Reduction of Acoustic Echo, Reverberation and Noise, December 2019, working paper or preprint.
https://hal.inria.fr/hal-02372579
[58]
N. Furnon, R. Serizel, I. Illina, S. Essid.
DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays, October 2019, Submitted to ICASSP 2020.
https://hal.archives-ouvertes.fr/hal-02389159
[59]
M. Pariente, S. Cornell, A. Deleforge, E. Vincent.
Filterbank design for end-to-end speech separation, October 2019, Submitted to ICASSP 2020.
https://hal.archives-ouvertes.fr/hal-02355623
[60]
M. Sahidullah, J. Patino, S. Cornell, R. Yin, S. Sivasankaran, H. Bredin, P. Korshunov, A. Brutti, R. Serizel, E. Vincent, N. Evans, S. Marcel, S. Squartini, C. Barras.
The Speed Submission to DIHARD II: Contributions & Lessons Learned, November 2019, working paper or preprint.
https://hal.inria.fr/hal-02352840
[61]
R. Serizel, N. Turpault, A. Shah, J. Salamon.
Sound event detection in synthetic domestic environments, November 2019, working paper or preprint.
https://hal.inria.fr/hal-02355573
[62]
S. Sivasankaran, E. Vincent, D. Fohr.
Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition, November 2019, Submitted to ICASSP 2020.
https://hal.inria.fr/hal-02355669
[63]
S. Sivasankaran, E. Vincent, D. Fohr.
SLOGD: Speaker Location Guided Deflation Approach to Speech Separation, November 2019, Submitted to ICASSP 2020.
https://hal.inria.fr/hal-02355613
[64]
B. M. L. Srivastava, N. Vauquier, M. Sahidullah, A. Bellet, M. Tommasi, E. Vincent.
Evaluating Voice Conversion-based Privacy Protection against Informed Attackers, November 2019, working paper or preprint.
https://hal.inria.fr/hal-02355115