Section: New Results

Large-scale image search

Burstiness of visual elements

Participants : Matthijs Douze, Hervé Jégou, Cordelia Schmid.

Burstiness, a phenomenon initially observed in text retrieval, is the property that a given element appears more often in a document than a statistically independent model would predict. We have shown that burstiness also applies to visual words in images [18]; see Figure 1 for an illustration. One can observe that many detected regions are assigned to the same visual word. The examples include man-made objects such as buildings, church windows and playing cards, as well as textures such as a brick wall and corals. In both cases the repetitiveness stems from the scene itself: for example, the windows of a building are very similar to one another, and the bricks of a wall are repeated.

In the context of image search, burstiness corrupts the visual similarity measure, i.e., the scores used to rank the images. We therefore proposed a strategy for handling visual bursts in bag-of-features based image search systems. Experimental results on three reference datasets show that handling burstiness with the proposed method significantly and consistently outperforms the state of the art.
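To make the effect concrete, the following Python sketch counts how often each visual word occurs in one image and damps the counts before comparing two images. The square-root damping used here is only an illustrative assumption borrowed from text retrieval practice; it is not claimed to be the strategy of [18]. The function names (`bursty_word`, `damped_bof`, `similarity`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def bursty_word(visual_words):
    """Return (word, count) for the most frequent visual word of one image."""
    return Counter(visual_words).most_common(1)[0]


def damped_bof(visual_words):
    """Sparse bag-of-features vector with square-root damped counts.

    Assumption: sqrt damping stands in for a burstiness treatment; a word
    seen 9 times weighs 3 times (not 9 times) as much as a word seen once.
    The result is L2-normalised.
    """
    counts = Counter(visual_words)
    damped = {word: sqrt(count) for word, count in counts.items()}
    norm = sqrt(sum(value * value for value in damped.values()))
    return {word: value / norm for word, value in damped.items()}


def similarity(vec_a, vec_b):
    """Dot product of two sparse vectors stored as dictionaries."""
    return sum(value * vec_b[word] for word, value in vec_a.items() if word in vec_b)
```

With damping, a single repeated structure such as a brick pattern contributes less to the score, so the ranking depends more on how many distinct visual words two images share.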

Figure 1. Illustration of burstiness. Features assigned to the most “bursty” visual word of each image are displayed.

Compact representation of bag-of-features

Participants : Matthijs Douze, Hervé Jégou, Cordelia Schmid.

One of the main limitations of image search based on bag-of-features is the memory usage per image. Only a few million images can be accessed on a single machine in quasi real-time. In [19], [26] we first evaluated how much the memory usage is reduced by using lossless index compression. We then proposed an approximate representation of bag-of-features obtained by projecting the corresponding histogram onto a set of pre-defined sparse projection functions, producing several image descriptors. Coupled with an appropriate indexing structure, an image is represented by a few hundred bytes. A distance expectation criterion is then used for ranking images. Our method is at least one order of magnitude faster than standard bag-of-features while providing excellent search quality.
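The projection step can be pictured with a small Python sketch. It assumes that each sparse projection function simply sums disjoint groups of histogram bins, with one pseudo-random grouping per projection; the actual projection functions of [19], [26] are not specified here, and the names `make_sparse_projection`, `project` and `mini_descriptors` are hypothetical.

```python
import random


def make_sparse_projection(dim, out_dim, seed):
    """Assign each of the `dim` histogram bins to one of `out_dim` components.

    Assumption: a fixed pseudo-random grouping, chosen once and reused for
    every image, stands in for a "pre-defined sparse projection function".
    """
    rng = random.Random(seed)
    bins = list(range(dim))
    rng.shuffle(bins)
    assignment = [0] * dim
    for position, bin_index in enumerate(bins):
        assignment[bin_index] = position % out_dim
    return assignment


def project(histogram, assignment, out_dim):
    """Sum the histogram bins that share the same output component."""
    out = [0.0] * out_dim
    for bin_index, value in enumerate(histogram):
        out[assignment[bin_index]] += value
    return out


def mini_descriptors(histogram, projections, out_dim):
    """Apply several projections, giving several small descriptors per image."""
    return [project(histogram, assignment, out_dim) for assignment in projections]
```

Each reduced descriptor is much shorter than the original histogram, and using several different groupings limits the chance that two unrelated images collide in all of them.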

Approximate nearest neighbor search with quantization

Participants : Matthijs Douze, Hervé Jégou, Harsimrat Sandhawalia, Cordelia Schmid.

We have proposed two approaches for nearest neighbor search in the presence of severe memory constraints. The key idea is to see the problem of search as a distance estimation problem. Our first approach [22] mimics a source coding approach, and formulates the problem of generating compact signatures as a rate-distortion problem. In the spirit of source coding algorithms, we aim at minimizing the reconstruction error on the squared distances with a constraint on the memory usage. The vectors are ranked based on their distance expectations to the query vector.

The idea is pushed further in [29], where vector quantization based on a product quantizer is used to obtain a distance estimation. The method is advantageously used in an asymmetric manner, by computing the distance between a vector and a code. This is in contrast to competing techniques such as spectral hashing, which only compare codes. The method is shown to outperform two state-of-the-art approaches from the literature. Timings measured when searching a set of 2 billion vectors are shown to be excellent given the high accuracy of the method.
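The asymmetric use of a product quantizer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [29]: the sub-quantizers are trained with a plain k-means, the vector dimension is assumed to be divisible by the number of sub-vectors `m`, and all function names are hypothetical.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain k-means; an empty cluster keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def split(vector, m):
    """Cut a vector into m contiguous sub-vectors (length must divide by m)."""
    step = len(vector) // m
    return [vector[i * step:(i + 1) * step] for i in range(m)]


def train_pq(vectors, m, k):
    """Learn one codebook of k centroids for each of the m sub-spaces."""
    subs = [split(v, m) for v in vectors]
    return [kmeans([s[i] for s in subs], k, seed=i) for i in range(m)]


def encode(vector, codebooks):
    """Code of a vector: index of the nearest centroid in each sub-space."""
    return [min(range(len(cb)), key=lambda c: sqdist(sv, cb[c]))
            for sv, cb in zip(split(vector, len(codebooks)), codebooks)]


def adc_search(query, codes, codebooks, topk=1):
    """Asymmetric search: the query stays uncompressed, the database is coded.

    One table of query-to-centroid squared distances is built per sub-space,
    and each database distance is estimated by summing m table entries.
    """
    tables = [[sqdist(sq, c) for c in cb]
              for sq, cb in zip(split(query, len(codebooks)), codebooks)]
    dists = [sum(t[c] for t, c in zip(tables, code)) for code in codes]
    return sorted(range(len(codes)), key=lambda i: dists[i])[:topk]
```

Because the query is never quantized, only the database side contributes quantization error to the estimate, which is the intuition behind comparing a vector with a code rather than two codes.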

Evaluation of GIST descriptors for web-scale image search

Participants : Laurent Amsaleg [ CNRS - IRISA ] , Matthijs Douze, Hervé Jégou, Harsimrat Sandhawalia, Cordelia Schmid.

We have evaluated the search accuracy and complexity of the global GIST descriptor [13] for two applications in which a local description is generally preferred: same location/object recognition and copy detection. We also proposed an indexing strategy for global descriptors that optimizes the trade-off between memory usage and precision. Our scheme provides reasonable accuracy in some widespread application cases together with very high efficiency: in our experiments, querying an image database of 110 million images takes 0.18 seconds per image on a single machine. For common copyright attacks, this efficiency is obtained without noticeably sacrificing the search accuracy compared with state-of-the-art approaches. See Figure 2 for example queries and search results.

Figure 2. We search images in a 110 million-image dataset using the GIST descriptor and our large-scale indexing approach. Query images (right) are degraded more or less severely. The numbers indicate the rank of the original image (left) in the resulting response list. Results are excellent for JPEG3 and CROP20 and very good for CROP50. For “strong” transformations, two out of three examples were not found (plain circle).

Aggregating local descriptors into a compact image representation

Participants : Matthijs Douze, Hervé Jégou, Patrick Pérez [ INRIA Rennes ] , Cordelia Schmid.

We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. To address this problem, we first propose a simplification of the Fisher kernel image representation, which is a way of aggregating local image descriptors into a vector of limited dimension. We then present an approach for coding and indexing such vectors that preserves the accuracy of the vectorial Euclidean comparison well. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to that of the bag-of-features approach for an image representation requiring 20 bytes of memory.
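As an illustration of what aggregating local descriptors into a single vector can look like, the Python sketch below assigns each local descriptor to its nearest centroid and accumulates the residuals per centroid. Accumulating residuals is an assumption made for this example; the text above states only that a simplification of the Fisher kernel is used. The function name `aggregate` and the toy centroids in the usage note are hypothetical.

```python
from math import sqrt


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aggregate(descriptors, centroids):
    """Aggregate local descriptors into one vector of size k * d.

    Assumption: each descriptor adds its residual (descriptor minus nearest
    centroid) to the block of that centroid; the concatenated blocks are
    then L2-normalised. An all-zero result is returned unchanged.
    """
    k, d = len(centroids), len(centroids[0])
    blocks = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        nearest = min(range(k), key=lambda c: sqdist(x, centroids[c]))
        for t in range(d):
            blocks[nearest][t] += x[t] - centroids[nearest][t]
    flat = [value for block in blocks for value in block]
    norm = sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat
```

For example, `aggregate([[1, 0], [0, 1], [10, 12]], [[0, 0], [10, 10]])` yields a 4-dimensional vector whatever the number of local descriptors, which is what makes the result suitable for subsequent coding and indexing.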

