Bibliography

Major publications by the team in recent years
[1]
X. Alameda-Pineda, R. Horaud.
A Geometric Approach to Sound Source Localization from Time-Delay Estimates, in: IEEE Transactions on Audio, Speech and Language Processing, June 2014, vol. 22, no 6, pp. 1082–1095. [ DOI : 10.1109/TASLP.2014.2317989 ]
https://hal.inria.fr/hal-00975293
[2]
X. Alameda-Pineda, R. Horaud.
Vision-Guided Robot Hearing, in: International Journal of Robotics Research, April 2015, vol. 34, no 4–5, pp. 437–456. [ DOI : 10.1177/0278364914548050 ]
https://hal.inria.fr/hal-00990766
[3]
X. Alameda-Pineda, E. Ricci, N. Sebe.
Multimodal behavior analysis in the wild: Advances and challenges, Academic Press (Elsevier), December 2018.
https://hal.inria.fr/hal-01858395
[4]
N. Andreff, B. Espiau, R. Horaud.
Visual Servoing from Lines, in: International Journal of Robotics Research, 2002, vol. 21, no 8, pp. 679–700.
http://hal.inria.fr/hal-00520167
[5]
S. Ba, X. Alameda-Pineda, A. Xompero, R. Horaud.
An On-line Variational Bayesian Model for Multi-Person Tracking from Cluttered Scenes, in: Computer Vision and Image Understanding, December 2016, vol. 153, pp. 64–76. [ DOI : 10.1016/j.cviu.2016.07.006 ]
https://hal.inria.fr/hal-01349763
[6]
Y. Ban, X. Alameda-Pineda, F. Badeig, S. Ba, R. Horaud.
Tracking a Varying Number of People with a Visually-Controlled Robotic Head, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, Canada, September 2017.
https://hal.inria.fr/hal-01542987
[7]
F. Cuzzolin, D. Mateus, R. Horaud.
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies, in: International Journal of Computer Vision, March 2015, vol. 112, no 1, pp. 43–70. [ DOI : 10.1007/s11263-014-0754-0 ]
https://hal.archives-ouvertes.fr/hal-01053737
[8]
A. Deleforge, F. Forbes, R. Horaud.
Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds, in: International Journal of Neural Systems, February 2015, vol. 25, no 1, 21 p. [ DOI : 10.1142/S0129065714400036 ]
https://hal.inria.fr/hal-00960796
[9]
A. Deleforge, F. Forbes, R. Horaud.
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables, in: Statistics and Computing, September 2015, vol. 25, no 5, pp. 893–911. [ DOI : 10.1007/s11222-014-9461-5 ]
https://hal.inria.fr/hal-00863468
[10]
A. Deleforge, R. Horaud, Y. Y. Schechner, L. Girin.
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression, in: IEEE Transactions on Audio, Speech and Language Processing, April 2015, vol. 23, no 4, pp. 718–731. [ DOI : 10.1109/TASLP.2015.2405475 ]
https://hal.inria.fr/hal-01112834
[11]
V. Drouard, R. Horaud, A. Deleforge, S. Ba, G. Evangelidis.
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions, in: IEEE Transactions on Image Processing, March 2017, vol. 26, no 3, pp. 1428–1440. [ DOI : 10.1109/TIP.2017.2654165 ]
https://hal.inria.fr/hal-01413406
[12]
G. Evangelidis, M. Hansard, R. Horaud.
Fusion of Range and Stereo Data for High-Resolution Scene-Modeling, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2015, vol. 37, no 11, pp. 2178–2192. [ DOI : 10.1109/TPAMI.2015.2400465 ]
https://hal.archives-ouvertes.fr/hal-01110031
[13]
G. Evangelidis, R. Horaud.
Joint Alignment of Multiple Point Sets with Batch and Incremental Expectation-Maximization, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, June 2018, vol. 40, no 6, pp. 1397–1410, https://arxiv.org/abs/1609.01466. [ DOI : 10.1109/TPAMI.2017.2717829 ]
https://hal.inria.fr/hal-01413414
[14]
I. D. Gebru, X. Alameda-Pineda, F. Forbes, R. Horaud.
EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2016, vol. 38, no 12, pp. 2402–2415. [ DOI : 10.1109/TPAMI.2016.2522425 ]
https://hal.inria.fr/hal-01261374
[15]
I. Gebru, S. Ba, X. Li, R. Horaud.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2018, vol. 40, no 5, pp. 1086–1099, https://arxiv.org/abs/1603.09725. [ DOI : 10.1109/TPAMI.2017.2648793 ]
https://hal.inria.fr/hal-01413403
[16]
M. Hansard, G. Evangelidis, Q. Pelorson, R. Horaud.
Cross-Calibration of Time-of-flight and Colour Cameras, in: Computer Vision and Image Understanding, April 2015, vol. 134, pp. 105–115. [ DOI : 10.1016/j.cviu.2014.09.001 ]
https://hal.inria.fr/hal-01059891
[17]
M. Hansard, R. Horaud, M. Amat, G. Evangelidis.
Automatic Detection of Calibration Grids in Time-of-Flight Images, in: Computer Vision and Image Understanding, April 2014, vol. 121, pp. 108–118. [ DOI : 10.1016/j.cviu.2014.01.007 ]
https://hal.inria.fr/hal-00936333
[18]
M. Hansard, R. Horaud.
Cyclopean geometry of binocular vision, in: Journal of the Optical Society of America A, September 2008, vol. 25, no 9, pp. 2357–2369. [ DOI : 10.1364/JOSAA.25.002357 ]
http://hal.inria.fr/inria-00435548
[19]
M. Hansard, R. Horaud.
Cyclorotation Models for Eyes and Cameras, in: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, March 2010, vol. 40, no 1, pp. 151–161. [ DOI : 10.1109/TSMCB.2009.2024211 ]
http://hal.inria.fr/inria-00435549
[20]
M. Hansard, R. Horaud.
A Differential Model of the Complex Cell, in: Neural Computation, September 2011, vol. 23, no 9, pp. 2324–2357. [ DOI : 10.1162/NECO_a_00163 ]
http://hal.inria.fr/inria-00590266
[21]
M. Hansard, S. Lee, O. Choi, R. Horaud.
Time of Flight Cameras: Principles, Methods, and Applications, Springer Briefs in Computer Science, Springer, October 2012, 95 p.
http://hal.inria.fr/hal-00725654
[22]
R. Horaud, G. Csurka, D. Demirdjian.
Stereo Calibration from Rigid Motions, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000, vol. 22, no 12, pp. 1446–1452. [ DOI : 10.1109/34.895977 ]
http://hal.inria.fr/inria-00590127
[23]
R. Horaud, F. Forbes, M. Yguel, G. Dewaele, J. Zhang.
Rigid and Articulated Point Registration with Expectation Conditional Maximization, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2011, vol. 33, no 3, pp. 587–602. [ DOI : 10.1109/TPAMI.2010.94 ]
http://hal.inria.fr/inria-00590265
[24]
R. Horaud, M. Niskanen, G. Dewaele, E. Boyer.
Human Motion Tracking by Registering an Articulated Surface to 3-D Points and Normals, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, January 2009, vol. 31, no 1, pp. 158–163. [ DOI : 10.1109/TPAMI.2008.108 ]
http://hal.inria.fr/inria-00446898
[25]
V. Khalidov, F. Forbes, R. Horaud.
Conjugate Mixture Models for Clustering Multimodal Data, in: Neural Computation, February 2011, vol. 23, no 2, pp. 517–557. [ DOI : 10.1162/NECO_a_00074 ]
http://hal.inria.fr/inria-00590267
[26]
D. Knossow, R. Ronfard, R. Horaud.
Human Motion Tracking with a Kinematic Parameterization of Extremal Contours, in: International Journal of Computer Vision, September 2008, vol. 79, no 3, pp. 247–269. [ DOI : 10.1007/s11263-007-0116-2 ]
http://hal.inria.fr/inria-00590247
[27]
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, R. Horaud.
A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, August 2016, vol. 24, no 8, pp. 1408–1423. [ DOI : 10.1109/TASLP.2016.2554286 ]
https://hal.inria.fr/hal-01301762
[28]
X. Li, L. Girin, F. Badeig, R. Horaud.
Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, Daejeon, South Korea, IEEE, October 2016, pp. 2819–2826. [ DOI : 10.1109/IROS.2016.7759437 ]
https://hal.inria.fr/hal-01349771
[29]
X. Li, L. Girin, R. Horaud, S. Gannot.
Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, November 2016, vol. 24, no 11, pp. 2171–2186. [ DOI : 10.1109/TASLP.2016.2598319 ]
https://hal.inria.fr/hal-01349691
[30]
X. Li, L. Girin, R. Horaud, S. Gannot.
Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, October 2017, vol. 25, no 10, pp. 1997–2012. [ DOI : 10.1109/TASLP.2017.2740001 ]
https://hal.inria.fr/hal-01413417
[31]
B. Massé, S. Ba, R. Horaud.
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2018, vol. 40, no 11, pp. 2711–2724, https://arxiv.org/abs/1703.04727. [ DOI : 10.1109/TPAMI.2017.2782819 ]
https://hal.inria.fr/hal-01511414
[32]
M. Sapienza, M. Hansard, R. Horaud.
Real-time Visuomotor Update of an Active Binocular Head, in: Autonomous Robots, January 2013, vol. 34, no 1, pp. 33–45. [ DOI : 10.1007/s10514-012-9311-2 ]
http://hal.inria.fr/hal-00768615
[33]
A. Zaharescu, E. Boyer, R. Horaud.
Topology-Adaptive Mesh Deformation for Surface Evolution, Morphing, and Multi-View Reconstruction, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, April 2011, vol. 33, no 4, pp. 823–837. [ DOI : 10.1109/TPAMI.2010.116 ]
http://hal.inria.fr/inria-00590271
[34]
A. Zaharescu, E. Boyer, R. Horaud.
Keypoints and Local Descriptors of Scalar Functions on 2D Manifolds, in: International Journal of Computer Vision, October 2012, vol. 100, no 1, pp. 78–98. [ DOI : 10.1007/s11263-012-0528-5 ]
http://hal.inria.fr/hal-00699620
[35]
A. Zaharescu, R. Horaud.
Robust Factorization Methods Using A Gaussian/Uniform Mixture Model, in: International Journal of Computer Vision, March 2009, vol. 81, no 3, pp. 240–258. [ DOI : 10.1007/s11263-008-0169-x ]
http://hal.inria.fr/inria-00446987
Publications of the year

Doctoral Dissertations and Habilitation Theses

[36]
Y. Ban.
Audio-visual multiple-speaker tracking for robot perception, Université Grenoble Alpes, May 2019.
https://tel.archives-ouvertes.fr/tel-02163418

Articles in International Peer-Reviewed Journals

[37]
Y. Ban, X. Alameda-Pineda, C. Evers, R. Horaud.
Tracking Multiple Audio Sources with the Von Mises Distribution and Variational EM, in: IEEE Signal Processing Letters, June 2019, vol. 26, no 6, pp. 798–802, https://arxiv.org/abs/1812.08246. [ DOI : 10.1109/LSP.2019.2908376 ]
https://hal.inria.fr/hal-01969050
[38]
Y. Ban, X. Alameda-Pineda, L. Girin, R. Horaud.
Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2019, vol. 42, pp. 1–17, https://arxiv.org/abs/1809.10961. [ DOI : 10.1109/TPAMI.2019.2953020 ]
https://hal.inria.fr/hal-01950866
[39]
S. Lathuilière, B. Massé, P. Mesejo, R. Horaud.
Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction, in: Pattern Recognition Letters, February 2019, vol. 118, pp. 61–71, https://arxiv.org/abs/1711.06834. [ DOI : 10.1016/j.patrec.2018.05.023 ]
https://hal.inria.fr/hal-01643775
[40]
S. Lathuilière, P. Mesejo, X. Alameda-Pineda, R. Horaud.
A Comprehensive Analysis of Deep Regression, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, April 2019, vol. 41, pp. 1–17, https://arxiv.org/abs/1803.08450. [ DOI : 10.1109/TPAMI.2019.2910523 ]
https://hal.inria.fr/hal-01754839
[41]
X. Li, Y. Ban, L. Girin, X. Alameda-Pineda, R. Horaud.
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments, in: IEEE Journal of Selected Topics in Signal Processing, March 2019, vol. 13, no 1, pp. 88–103, https://arxiv.org/abs/1809.10936. [ DOI : 10.1109/JSTSP.2019.2903472 ]
https://hal.inria.fr/hal-01851985
[42]
X. Li, L. Girin, S. Gannot, R. Horaud.
Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, May 2019, vol. 27, no 9, pp. 1365–1377, https://arxiv.org/abs/1812.08471. [ DOI : 10.1109/TASLP.2019.2919183 ]
https://hal.inria.fr/hal-01969041
[43]
X. Li, L. Girin, S. Gannot, R. Horaud.
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function, in: IEEE/ACM Transactions on Audio, Speech and Language Processing, March 2019, vol. 27, no 3, pp. 645–659, https://arxiv.org/abs/1711.07911. [ DOI : 10.1109/TASLP.2019.2892412 ]
https://hal.inria.fr/hal-01799809
[44]
X. Li, L. Girin, R. Horaud.
Expectation-Maximization for Speech Source Separation using Convolutive Transfer Function, in: CAAI Transactions on Intelligent Technologies, March 2019, vol. 4, no 1, pp. 47–53. [ DOI : 10.1049/trit.2018.1061 ]
https://hal.inria.fr/hal-01982250
[45]
X. Li, S. Leglaive, L. Girin, R. Horaud.
Audio-noise Power Spectral Density Estimation Using Long Short-term Memory, in: IEEE Signal Processing Letters, June 2019, vol. 26, no 6, pp. 918–922, https://arxiv.org/abs/1904.05166. [ DOI : 10.1109/LSP.2019.2911879 ]
https://hal.inria.fr/hal-02100059

Invited Conferences

[46]
F. Forbes, A. Deleforge, R. Horaud, E. Perthame.
Robust non-linear regression approach for generalized inverse problems in a high dimensional setting, in: AIP 2019 - Applied Inverse Problems Conference, Grenoble, France, July 2019.
https://hal.archives-ouvertes.fr/hal-02415115

International Conferences with Proceedings

[47]
X. Alameda-Pineda, S. Arias, Y. Ban, G. Delorme, L. Girin, R. Horaud, X. Li, B. Mourgue, G. Sarrazin.
Audio-Visual Variational Fusion for Multi-Person Tracking with Robots, in: ACMMM 2019 - 27th ACM International Conference on Multimedia, Nice, France, ACM Press, October 2019, pp. 1059–1061. [ DOI : 10.1145/3343031.3350590 ]
https://hal.inria.fr/hal-02354514
[48]
L. Girin, F. Roche, T. Hueber, S. Leglaive.
Notes on the use of variational autoencoders for speech and audio spectrogram modeling, in: DAFx 2019 - 22nd International Conference on Digital Audio Effects, Birmingham, United Kingdom, February 2019, pp. 1–8.
https://hal.archives-ouvertes.fr/hal-02349385
[49]
S. Leglaive, L. Girin, R. Horaud.
Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization, in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, United Kingdom, IEEE, May 2019, pp. 101–105. [ DOI : 10.1109/ICASSP.2019.8683704 ]
https://hal.inria.fr/hal-02005102
[50]
S. Leglaive, U. Simsekli, A. Liutkus, L. Girin, R. Horaud.
Speech enhancement with variational autoencoders and alpha-stable distributions, in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, United Kingdom, IEEE, May 2019, pp. 541–545, https://arxiv.org/abs/1902.03926. [ DOI : 10.1109/ICASSP.2019.8682546 ]
https://hal.inria.fr/hal-02005106
[51]
X. Li, R. Horaud.
Multichannel Speech Enhancement Based on Time-frequency Masking Using Subband Long Short-Term Memory, in: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, United States, October 2019, pp. 1–5.
https://hal.inria.fr/hal-02264247
[52]
B. Massé, S. Lathuilière, P. Mesejo, R. Horaud.
Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View, in: FG 2019 - 14th IEEE International Conference on Automatic Face and Gesture Recognition, Lille, France, IEEE, May 2019, pp. 1–8. [ DOI : 10.1109/FG.2019.8756555 ]
https://hal.inria.fr/hal-02054236

Other Publications

[53]
S. Leglaive, X. Alameda-Pineda, L. Girin, R. Horaud.
A Recurrent Variational Autoencoder for Speech Enhancement, October 2019, https://arxiv.org/abs/1910.10942 - working paper or preprint.
https://hal.archives-ouvertes.fr/hal-02329000
[54]
X. Li, R. Horaud.
Narrow-band Deep Filtering for Multichannel Speech Enhancement, November 2019, https://arxiv.org/abs/1911.10791 - working paper or preprint.
https://hal.inria.fr/hal-02378413
[55]
M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, R. Horaud.
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoder, November 2019, https://arxiv.org/abs/1908.02590 - Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.
https://hal.inria.fr/hal-02364900