
Section: Software


Participants : Moray Allan, Matthijs Douze, Matthieu Guillaumin, Hervé Jégou, Cordelia Schmid, Jakob Verbeek.

Relevant datasets are important to assess recognition methods. They allow us to point out the weaknesses of existing methods and to push forward the state of the art. Datasets should capture a large variety of situations and conditions. Benchmarking procedures allow us to compare the strengths of different approaches and provide clear, broadly understood performance measures.
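One broadly used performance measure for ranked retrieval and classification benchmarks of this kind is average precision. As an illustration only (this function is a minimal sketch, not part of any released code), it can be computed from a ranked list of binary relevance labels:

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list.

    ranked_labels: sequence of 0/1 relevance labels, ordered by
    decreasing classifier score (best-ranked item first).
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / max(hits, 1)
```

Averaging this score over all classes yields the mean average precision often reported on such benchmarks.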

In addition to the datasets we previously created, we released several new datasets this year as well as pre-processed image descriptors. Our publicly accessible datasets are available at .

Annotated Flickr image data set

This new data set contains Flickr images and user annotations for 20 object categories and 20 combinations of object categories. For each object category, images were downloaded from the Flickr website and ranked using our method described in [11]. For evaluation, the images top-ranked by our method were manually annotated to indicate whether they contain the object category. In total the data set contains about 100,000 images.

Image features for image annotation data sets

A fair comparison of image annotation methods, or machine learning methods in general, requires that the same feature set is used. In this way we can separate the contributions of good image features from those of good learning methods. In recent work [15] we presented state-of-the-art performance on three benchmark datasets for image annotation. We released the image features computed for these datasets (45,000 images in total) to allow direct comparison with our results.

Hollywood-2 Human Actions and Scenes Dataset

The Hollywood-2 Human Actions and Scenes Dataset [21] contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video. The dataset provides a benchmark for human action recognition in realistic and challenging settings; it is composed of video clips from 69 movies, divided into 33 training and 36 test movies. There are two training sets: an automatic one and a clean one. The automatic one is obtained using automatic script-based action annotation and contains 810 video samples with approximately 60% correct labels. The clean set contains 823 video samples with manual labels. The action test set contains 884 manually annotated samples. Scene classes are selected automatically from scripts so as to maximize co-occurrence with the given action classes and to capture action context, as described in [21]. Scene video samples are then generated using script-to-video alignment. The labels of the test scene samples are manually verified to be correct.


Copydays dataset

Copydays is an image dataset designed to evaluate copy detection systems. It comprises a reference set of 157 personal holiday photos. These images were transformed using three kinds of artificial attacks: JPEG compression, cropping (extracting subparts) and "strong" attacks (print and scan, paint, change in contrast, perspective effect, blur, very strong crop, etc.). This dataset was merged in [13] with a very large image set (up to 110 million images) to evaluate the behavior of different indexing schemes for copy detection on a large scale. Sample images and transformations are illustrated in Fig. 2.
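Two of the simpler attacks, cropping and contrast change, amount to elementary pixel-array operations. The following is a minimal illustrative sketch, assuming images are stored as NumPy arrays (the function names are hypothetical; JPEG recompression would additionally require an image codec such as Pillow):

```python
import numpy as np

def center_crop(img, keep_ratio):
    """Crop attack: keep a centered subpart covering keep_ratio of the area."""
    h, w = img.shape[:2]
    scale = keep_ratio ** 0.5  # side scale so kept area == keep_ratio
    nh, nw = int(h * scale), int(w * scale)
    top, left = (h - nh) // 2, (w - nw) // 2
    return img[top:top + nh, left:left + nw]

def change_contrast(img, factor):
    """Contrast attack: rescale pixel values around the image mean."""
    mean = img.mean()
    out = (img.astype(np.float64) - mean) * factor + mean
    return np.clip(out, 0, 255).astype(np.uint8)
```

Running the reference images through such transformations at several strengths yields query sets of increasing difficulty for the copy detection benchmark.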

