
## Section: New Results

### Low-level content description and indexing

#### Scalability of the NV-tree: Three Experiments

Participants : Laurent Amsaleg, Björn Þór Jónsson [Univ. Copenhagen] , Herwig Lejsek [Videntifier Tech.] .

The NV-tree is a scalable approximate high-dimensional indexing method specifically designed for large-scale visual instance search. We report in [10] on three experiments designed to evaluate the performance of the NV-tree. Two of these experiments embed standard benchmarks within collections of up to 28.5 billion features, representing the largest single-server collection ever reported in the literature. The results show that the NV-tree indeed performs very well for visual instance search applications over large-scale collections.

#### Prototyping a Web-Scale Multimedia Retrieval Service Using Spark

Participants : Laurent Amsaleg, Gylfi Þór Gudmundsson [School of Computer Science, Reykjavik] , Björn Þór Jónsson [Univ. Copenhagen] , Michael Franklin [Computer Science Division, Berkeley] .

The world has experienced phenomenal growth in data production and storage in recent years, much of which has taken the form of media files. At the same time, computing power has become abundant with multi-core machines, grids, and clouds. Yet it remains a challenge to harness the available power and move toward gracefully searching and retrieving from web-scale media collections. Several researchers have experimented with automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small computing clusters. In [3] we describe a prototype of a (near) web-scale, throughput-oriented multimedia retrieval service using the Spark framework running on the AWS cloud service. We present retrieval results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. We also present a publicly available demonstration retrieval system, running on our own servers, where the implementation of the Spark pipelines can be observed in practice using standard image benchmarks, and downloaded for research purposes. Finally, we describe a method to evaluate the retrieval quality of the ever-growing high-dimensional index of the prototype without actually indexing a web-scale media collection.

#### Extreme-value-theoretic estimation of local intrinsic dimensionality

Participants : Laurent Amsaleg, Teddy Furon, Oussama Chelly [National Institute of Informatics] , Stéphane Girard [MISTIS, Inria Grenoble] , Michael Houle [National Institute of Informatics] , Ken-Ichi Kawarabayashi [National Institute of Informatics] , Michael Nett [Google] .

#### Intrinsic dimensionality for Information Retrieval

Participant : Vincent Claveau.

Examining the properties of representation spaces for documents or words in Information Retrieval (IR) yields valuable insights that help the retrieval process. Following the work on local intrinsic dimensionality presented in the previous section, it has been shown that intrinsic dimensionality is chiefly tied to the notion of indiscriminateness among the neighbors of a query point in the vector space. In this work [13], we revisit this notion in the specific case of IR. More precisely, we show how to estimate indiscriminateness from IR similarities in order to use it in the representation spaces used for documents and words. We show that indiscriminateness may be used to characterize difficult queries; moreover, we show that this notion, applied to word embeddings, can help choose the terms to use for query expansion.
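As an illustration of what estimating local intrinsic dimensionality can look like in practice, the following minimal sketch implements the widely used maximum-likelihood (Hill-type) estimator computed from the distances between a query point and its k nearest neighbors. It is not necessarily the estimator used in [13], which works from IR similarities; the function name `lid_mle` and the assumption that neighbor distances are already available are ours.

```python
import math

def lid_mle(distances):
    """Maximum-likelihood (Hill-type) estimate of local intrinsic dimensionality.

    `distances` holds the distances from a query point to its k nearest
    neighbours (hypothetical input; how they are obtained is up to the caller).
    Estimate: -1 / mean(ln(r_i / r_k)), where r_k is the largest distance.
    """
    r = sorted(d for d in distances if d > 0)  # zero distances carry no information
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    r_k = r[-1]
    mean_log = sum(math.log(d / r_k) for d in r) / len(r)
    if mean_log == 0.0:
        raise ValueError("all distances are equal; estimate is undefined")
    return -1.0 / mean_log

# Synthetic check: distances whose i-th value grows as (i/k)^(1/d) mimic
# a neighbourhood of dimension d, so the estimate should be close to d.
k, d = 1000, 5
synthetic = [(i / k) ** (1 / d) for i in range(1, k + 1)]
estimate = lid_mle(synthetic)
```

A high estimate indicates that the neighbors of the query lie at nearly the same distance, which is the indiscriminateness discussed above.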

#### Heat Map Based Feature Ranker

Participants : Christian Raymond, Carlos Huertas [Autonomous University of Baja California, Mexico] , Reyes Uarez-Ramirez [Autonomous University of Baja California, Mexico] .

In [6], we present Heat Map Based Feature Ranker, an algorithm that estimates the importance of a feature purely from its interactions with other variables. A compression mechanism reduces the evaluation space by up to 66% without compromising efficacy. Our experiments show that our proposal is very competitive against popular algorithms, producing stable results across different types of data. We also show how noise reduction through feature selection aids data visualization using emergent self-organizing maps.

#### Time series retrieval and indexing using DTW-preserving shapelets

Participants : Laurent Amsaleg, Ricardo Carlini Sperandio, Simon Malinowski, Romain Tavenard [Univ. Rennes 2] .

Dynamic Time Warping (DTW) is a very popular similarity measure used for time series classification, retrieval or clustering. DTW is, however, a costly measure, and its application to numerous and/or very long time series is difficult in practice. We have proposed a new approach for time series retrieval: time series are embedded into another space where the search procedure is less computationally demanding while remaining accurate. This approach is based on transforming time series into high-dimensional vectors using DTW-preserving shapelets. The transform is such that the relative distance between vectors in the Euclidean transformed space closely reflects the corresponding DTW measurements in the original space. We have also proposed in [12] strategies for selecting a subset of shapelets in the transformed space, resulting in a trade-off between the complexity of the transformation and the accuracy of the retrieval. Experimental results on well-known time series datasets demonstrate the importance of this trade-off. This transformation can then be used to build efficient time series indexing schemes.
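To make the cost argument concrete, here is a minimal sketch of the classic quadratic-time DTW recurrence, together with a generic shapelet transform that maps a series to a vector of distances. The transform shown is the usual sliding-window minimum Euclidean distance to each shapelet, which is an assumption about the general form of the embedding; how DTW-preserving shapelets are obtained in [12] is outside this sketch, and the shapelets below are hand-picked placeholders.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference point cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeated
                                 cost[i][j - 1],      # b[j-1] repeated
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]

def shapelet_transform(series, shapelets):
    """Embed a series as its minimum sliding-window Euclidean distance
    to each shapelet (placeholder shapelets, not learned ones)."""
    features = []
    for s in shapelets:
        windows = (series[i:i + len(s)] for i in range(len(series) - len(s) + 1))
        features.append(min(math.dist(w, s) for w in windows))
    return features

# DTW absorbs the repeated value, so these two series are at distance 0.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# Once embedded, series are compared with a plain Euclidean distance.
vec = shapelet_transform([0, 1, 2, 3], [[1, 2], [5, 5]])
```

The DTW call costs time proportional to the product of the two series lengths, whereas comparing two embedded vectors costs time proportional to the number of shapelets, which is what makes indexing in the transformed space attractive.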

#### Fast Spectral Ranking for Similarity Search

Participants : Yannis Avrithis, Teddy Furon, Ahmet Iscen [Univ. Prague] , Giorgos Tolias [Univ. Prague] , Ondra Chum [Univ. Prague] .

#### Mining on Manifolds: Metric Learning without Labels

Participants : Yannis Avrithis, Ahmet Iscen [Univ. Prague] , Giorgos Tolias [Univ. Prague] , Ondra Chum [Univ. Prague] .

In this work we present a novel unsupervised framework for hard training example mining [17]. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided e.g. by a pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par with or outperform prior models that are fully or partially supervised.

#### Hybrid Diffusion: Spectral-Temporal Graph Filtering for Manifold Ranking

Participants : Yannis Avrithis, Teddy Furon, Ahmet Iscen [Univ. Prague] , Giorgos Tolias [Univ. Prague] , Ondra Chum [Univ. Prague] .

#### Transactional Support for Visual Instance Search

Participants : Laurent Amsaleg, Björn Þór Jónsson [Univ. Copenhagen] , Herwig Lejsek [Videntifier Tech.] .

#### Time-series prediction for capacity planning

Participants : Simon Malinowski, Colin Leverger [Orange Labs] , Thomas Guyet [AgroCampus Ouest] , Vincent Lemaire [Orange Labs] .

In collaboration with Orange Labs, we have worked on KPI time series prediction in order to improve capacity planning. We have developed software, detailed in [32], for visualizing and comparing different time series prediction techniques on user-defined input data. We have also developed a novel prediction algorithm that focuses on time series with a seasonality [21]. It combines a clustering algorithm with Markov models to produce day-ahead forecasts. Our experiments on real datasets show that, in the case study, our method outperforms classical approaches (AR, Holt-Winters).
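One way to combine clustering with a Markov model for day-ahead forecasting can be sketched as follows. This is a simplified illustration under our own assumptions, not the algorithm of [21]: the series is assumed to be pre-split into equal-length daily profiles, clustering is plain k-means initialised from the first k profiles, the Markov chain is first-order over cluster labels, and the forecast is the centroid of the most probable next cluster. All function names are hypothetical.

```python
import math

def kmeans(profiles, k, iters=20):
    """Plain Lloyd's k-means on equal-length daily profiles.
    Centroids start from the first k profiles to keep the sketch deterministic."""
    centroids = [list(p) for p in profiles[:k]]

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in profiles]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, l in zip(profiles, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()

def transition_matrix(labels, k):
    """First-order Markov transition probabilities between day clusters."""
    counts = [[0.0] * k for _ in range(k)]
    for prev, cur in zip(labels, labels[1:]):
        counts[prev][cur] += 1.0
    probs = []
    for row in counts:
        total = sum(row)
        # Unseen states fall back to a uniform distribution.
        probs.append([c / total for c in row] if total else [1.0 / k] * k)
    return probs

def forecast_next_day(profiles, k=2):
    """Forecast tomorrow as the centroid of the most likely next cluster."""
    centroids, labels = kmeans(profiles, k)
    probs = transition_matrix(labels, k)
    last = labels[-1]
    nxt = max(range(k), key=lambda c: probs[last][c])
    return centroids[nxt]

# Toy history alternating between a quiet day and a busy day.
history = [[1, 1, 1], [10, 10, 10]] * 3
prediction = forecast_next_day(history, k=2)
```

In the toy history the last observed day is a busy one and busy days have always been followed by quiet ones, so the sketch forecasts the quiet-day centroid.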

#### Scale-adaptive CNN for Crowd counting

Participants : Miaojing Shi, Lu Zhang [Fudan Univ.] , Qiaobo Chen [Shanghai Jiaotong Univ.] .

#### Revisiting Perspective information for Efficient Crowd counting

Participants : Miaojing Shi, Zhaohui Yang [Peking Univ.] , Chao Xu [Peking Univ.] , Qijun Chen [Tongji Univ.] .

#### Phone-Level Embeddings for Unit Selection Speech Synthesis

Participants : Laurent Amsaleg, Antoine Perquin [EXPRESSION team, IRISA] , Gwénolé Lecorvé [EXPRESSION team, IRISA] , Damien Lolive [EXPRESSION team, IRISA] .

Deep neural networks have become the state of the art in speech synthesis. They have been used to directly predict signal parameters or to provide unsupervised speech segment descriptions through embeddings. In [25] we present four models, two of which enable us to extract phone-level embeddings for unit selection speech synthesis. Three of the models rely on a feed-forward DNN, the last one on an LSTM. The resulting embeddings make it possible to replace the usual expert-based target costs by a Euclidean distance in the embedding space. This work is conducted on a French corpus consisting of an 11-hour audiobook. Perceptual tests show that the produced speech is preferred over a unit selection method where the target cost is defined by an expert. They also show that the embeddings are general enough to be used for different speech styles without quality loss. Furthermore, objective measures and a perceptual test on statistical parametric speech synthesis show that our models perform comparably to state-of-the-art models for parametric signal generation, in spite of necessary simplifications, namely late time integration and information compression.
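The idea of using a Euclidean distance in the embedding space as a target cost can be sketched as follows. This is a deliberately reduced illustration: a full unit selection system also weighs concatenation costs along the whole utterance, which is omitted here, and the function names, unit identifiers and embedding values are hypothetical.

```python
import math

def target_cost(target_embedding, unit_embedding):
    """Target cost as the Euclidean distance between two phone-level embeddings."""
    return math.dist(target_embedding, unit_embedding)

def select_unit(target_embedding, candidates):
    """Return the id of the candidate unit closest to the target embedding.

    `candidates` is a list of (unit_id, embedding) pairs; concatenation
    costs between consecutive units are ignored in this sketch.
    """
    best_id, _ = min(candidates, key=lambda c: target_cost(target_embedding, c[1]))
    return best_id

# Made-up 2-D embeddings for three candidate units of the same phone.
candidates = [("unit_a", [1.0, 1.0]), ("unit_b", [0.1, 0.0]), ("unit_c", [3.0, 4.0])]
chosen = select_unit([0.0, 0.0], candidates)
```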

#### Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept

Participants : Pascale Sébillot, Raheel Qader [EXPRESSION team, IRISA] , Gwénolé Lecorvé [EXPRESSION team, IRISA] , Damien Lolive [EXPRESSION team, IRISA] .

#### Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition

Participants : Christian Raymond, Naima Zerari [University of Batna 2, Algeria] , Hassen Bouzgou [University of Batna 2, Algeria] .

In [30], we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to deal with the non-uniform sequence length of speech utterances. The neural architecture can recognize the Arabic spoken digit spelling of an isolated Arabic word using a classification methodology, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal using Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to deal with the non-uniform length of the sequences. A recurrent LSTM or GRU architecture is used to encode sequences of MFCC features as a fixed-size vector.

#### Are Deep Neural Networks good for blind image watermarking?

Participants : Teddy Furon, Vedran Vukotić [Lamark, France] , Vivien Chappelier [Lamark, France] .