Inria / Raweb 2003
Project: TEXMEX


Section: New Results

Image Retrieval in Large Databases

Our work on image description does not aim at finding new general descriptors: the IMEDIA and LEAR teams are very active in this field, and we use their results. The originality of our work comes from the size of the databases we want to handle. In large databases, most images will be compressed: is it possible to describe an image without decompressing it? In many databases, images will also be watermarked, and the influence of watermarking (and of the systems for breaking watermarks) on content-based description techniques is unclear. This is our first direction of research.

A second direction concerns the combination of descriptors: when documents are described by many descriptors, how should a query be processed to obtain an answer as fast as possible? To answer this question, we study the information that each descriptor provides about the others, in order to determine the order in which the descriptors should be considered.

The third direction is the indexing and retrieval of descriptions. With local description, 1 million images can give rise to 600 million descriptors, and retrieving any information from such an amount of data requires very fast access techniques, whatever the purpose of the access.

A fourth direction stems from our collaboration with the roboticists of the VISTA team. They work on visual servoing, and using a database is a good way to extend the applicability of their techniques to large displacements. Our description technique appears particularly well suited to such an application, which requires matching between images and not only a global similarity link between them.

Image Description, Compression and Watermarking

Keywords : image indexing, image description, image compression.

Participants : Patrick Gros, François Tonnin.

This is a joint work with the TEMICS team (S. Pateux).

Image authentication is becoming very important for certifying the integrity of image data. A key issue in image authentication is the design of a compact signature that is robust to allowable manipulations. Watermarking has mostly been investigated to detect illegal copies, but it provides only a presumption, not a proof, of illegality. We believe that content-based image description techniques can provide robust detection of illegal copies. Large databases are mostly made of compressed images, so to speed up matching it is worth computing signatures directly from the compressed images. Thanks to its wavelet analysis, the JPEG2000 compression standard allows the design of multiresolution signatures. Inspired by classical content-based local description techniques, we have developed a robust point extractor in the wavelet space. On average, its robustness is 10% lower than that of the reference multiresolution Harris point extractor. We will investigate how to describe (in the wavelet space) the neighborhood of these points by means of vectors invariant to allowable image manipulations. We will also compare classical local signatures and wavelet signatures in terms of robustness and speed.
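
The general idea of detecting points directly from wavelet coefficients can be illustrated with a minimal sketch. It is not the extractor developed in this work: as an assumption, it uses one level of a plain 2-D Haar transform (JPEG2000 itself uses the 5/3 and 9/7 wavelet filters) and keeps the positions where the energy of the three detail sub-bands is a local maximum above a threshold. The function names and the threshold are hypothetical; a multiresolution version would apply `haar_level` again to the approximation band.

```python
import math


def haar_level(img):
    """One level of a 2-D Haar transform on an image (list of rows) with even dimensions.

    Returns the approximation band and three detail bands, each at half resolution.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4  # local average
            lh[i // 2][j // 2] = (a + b - c - d) / 4  # top rows minus bottom rows
            hl[i // 2][j // 2] = (a - b + c - d) / 4  # left column minus right column
            hh[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal difference
    return ll, lh, hl, hh


def wavelet_points(img, threshold):
    """Hypothetical detector: local maxima of detail energy above threshold."""
    _, lh, hl, hh = haar_level(img)
    h, w = len(lh), len(lh[0])
    energy = [[math.sqrt(lh[i][j] ** 2 + hl[i][j] ** 2 + hh[i][j] ** 2)
               for j in range(w)] for i in range(h)]
    points = []
    for i in range(h):
        for j in range(w):
            e = energy[i][j]
            if e <= threshold:
                continue
            neighbours = [energy[y][x]
                          for y in range(max(0, i - 1), min(h, i + 2))
                          for x in range(max(0, j - 1), min(w, j + 2))
                          if (y, x) != (i, j)]
            if all(e >= n for n in neighbours):
                points.append((2 * i, 2 * j))  # back to full-resolution coordinates
    return points
```

For example, an 8×8 black image with a single bright pixel at (4, 4) yields the single point (4, 4), while a uniform image yields no point, since all its detail coefficients are zero.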

Combination of Descriptors by Association Rules and Multiple Correspondence Analysis

Participants : Laure Berti, Anicet Kouomou-Choupo, Annie Morin.

Content-based image retrieval becomes difficult when image databases grow very large. A still-image database can be described in several ways, by global visual descriptors of color, texture, or shape (at the pixel level). The most frequent queries combine the results of several types of descriptors, for example: "retrieve all images whose color and texture are similar to those of the given example image". To retrieve images from a large database more efficiently and more effectively, we exploited combinations of descriptors. We first surveyed the state of the art in image mining and content-based image retrieval. We then studied whether association rules between descriptors could reduce query response time on large still-image databases. We used 5 MPEG-7 descriptors to describe several thousand still images. We first used a K-means-based algorithm to compute clusters of images for each descriptor, and then generated relations between the different clusters in the form of association rules. Multiple correspondence analysis was used to study the relevance of the associations found and to validate our approach. We are now exploiting association rules between clusters of descriptors to optimize content-based retrieval.
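
The rule-generation step can be sketched as follows, assuming each image has already been assigned one cluster per descriptor (for instance by K-means). The function name, the thresholds, and the use of plain support/confidence measures are illustrative assumptions; the multiple correspondence analysis used to validate the rules is not shown.

```python
from collections import Counter


def cluster_rules(labels_a, labels_b, min_support, min_confidence):
    """Rules 'cluster i of descriptor A => cluster j of descriptor B'.

    labels_a[n] and labels_b[n] are the cluster indices of image n under the
    two descriptors. Returns (i, j, support, confidence) tuples, most confident first.
    """
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))  # images in cluster i AND cluster j
    a_counts = Counter(labels_a)                    # images in cluster i
    rules = []
    for (i, j), count in pair_counts.items():
        support = count / n               # fraction of all images covered by the rule
        confidence = count / a_counts[i]  # P(cluster j of B | cluster i of A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((i, j, support, confidence))
    return sorted(rules, key=lambda r: -r[3])
```

With color labels [0, 0, 0, 1, 1] and texture labels [2, 2, 3, 3, 3], thresholds of 0.3 (support) and 0.6 (confidence) keep the rules 1 => 3 (confidence 1.0) and 0 => 2 (confidence 2/3). A query whose color descriptor falls in cluster 1 could then examine texture cluster 3 first.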

Approximate Searches: k-Neighbors + Precision

Keywords : Multidimensional Indexing Techniques, Databases, Curse of Dimensionality, Approximate Searches, Nearest-Neighbors.

Participants : Laurent Amsaleg, Sid-Ahmed Berrani, Patrick Gros.


This is a joint work with Thomson R&D France (cf. 7.1.1).

[11] [12] [18] [19] [20]

We designed an approximate search scheme for high-dimensional databases in which the precision of the search can be stochastically controlled and which retrieves the k nearest neighbors of query points. Precision is controlled finely and intuitively by setting, at run time, the maximum probability that a vector belonging to the exact answer set is missing from the approximate answer set. Off-line, the scheme computes controlled approximations that shrink each of the clusters enclosing the feature vectors. These approximations are approximate radii of the clusters, computed for every level of precision defined beforehand. To answer a query, the search process uses the approximations corresponding to the desired level of precision. This may cause some actual nearest neighbors of the query point to be ignored; our method, however, bounds the probability of this happening. A performance study of the implementation on real datasets shows, for example, that our method is 6.72 times faster than a sequential scan on more than 5 million (5 × 10^6) 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.

This approach first clusters the vectors and encloses each cluster in a minimum bounding hypersphere in a Euclidean space. Not all vectors necessarily belong to a cluster, because the clustering isolates outliers; outliers are stored in a specific file that is processed separately. The clustering algorithm we use is derived from the first phase of Birch [80], with a couple of crucial differences. Birch ends its first phase when all the created micro-clusters fit in the allowed main memory; instead, we stop our clustering when the number of micro-clusters falls below the maximum number of clusters allowed to exist. The radius of Birch's clusters is driven by the variance of the data points; in our implementation, instead, cluster radii are exact, in the sense that each one defines a minimum bounding hypersphere.
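
A simplified sketch of this clustering phase follows. As an assumption, it replaces Birch's CF-tree with a single-pass leader clustering driven by a distance threshold, and reruns it with a larger threshold until there are at most `max_clusters` clusters, which mirrors the stopping rule above. Clusters smaller than a hypothetical `min_size` are treated as outliers. The radius is the largest distance from the centroid to a member, so the sphere encloses every point exactly; it is not guaranteed to be the minimum bounding hypersphere, which would require an algorithm such as Welzl's.

```python
import math


def leader_clustering(points, threshold):
    """Single pass: each point joins the first cluster whose centroid is within threshold."""
    members, centroids = [], []
    for p in points:
        for i, c in enumerate(centroids):
            if math.dist(p, c) <= threshold:
                members[i].append(p)
                n = len(members[i])
                # Incremental update of the centroid (running mean).
                centroids[i] = tuple(ci + (pi - ci) / n for ci, pi in zip(c, p))
                break
        else:
            members.append([p])
            centroids.append(tuple(p))
    return members, centroids


def cluster_with_cap(points, max_clusters, min_size, threshold=0.1, growth=1.5):
    """Grow the threshold until at most max_clusters (>= 1) remain, then split off outliers."""
    while True:
        members, centroids = leader_clustering(points, threshold)
        if len(members) <= max_clusters:
            break
        threshold *= growth
    clusters, outliers = [], []
    for pts, center in zip(members, centroids):
        if len(pts) < min_size:
            outliers.extend(pts)  # small groups go to the separate outlier file
        else:
            # Enclosing (not necessarily minimal) sphere centered on the centroid.
            radius = max(math.dist(center, p) for p in pts)
            clusters.append({"center": center, "radius": radius, "points": pts})
    return clusters, outliers
```

With two tight groups of three points and one isolated point, `max_clusters=3` and `min_size=2` give two clusters and one outlier.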

The output of the clustering phase is a set of minimum bounding hyperspheres, each defined by its center and its exact radius. As in Birch, clusters may overlap, and outliers are treated separately. Data points are stored sequentially on disk, cluster by cluster; no specific data structure is used to index the clusters. Outliers are stored sequentially in a separate data file.

Each cluster is analyzed off-line to derive several approximate radii given the exact radius, the volume and the distribution of vectors within each cluster. For each cluster, several approximate radii are determined, each corresponding to a predetermined level of precision. All the approximate radii of one cluster are always smaller than the exact radius of the same cluster. Approximate radii will ultimately be considered during the approximate NN-searches.
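
The method described above derives the approximate radii from the exact radius, the volume and the distribution of vectors in each cluster; that model is not reproduced here. As a much simpler stand-in (an assumption), the sketch below takes, for each predefined level α, the empirical (1 − α) quantile of the distances from the cluster center to its points, so that at most a fraction α of the cluster lies outside the shrunken sphere. Every approximate radius is therefore at most the exact radius, as required. The function name is hypothetical.

```python
import math


def approximate_radii(center, points, alphas):
    """Map each predefined precision level alpha to an approximate radius (stand-in model)."""
    dists = sorted(math.dist(center, p) for p in points)
    n = len(dists)
    radii = {}
    for alpha in alphas:
        # Smallest radius enclosing at least a fraction (1 - alpha) of the points.
        k = max(0, math.ceil((1 - alpha) * n) - 1)
        radii[alpha] = dists[k]
    return radii
```

With α = 0 the radius equals the exact radius (the largest distance); larger values of α shrink it, which lets the search filter out more clusters at the cost of a higher chance of missing a true neighbor.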

At query submission time, a user provides, along with the query, an imprecision level called α controlling the quality of the approximate NN-search. α is chosen among the set of predefined values, and it corresponds to the maximum probability for a vector that belongs to the exact answer set to be actually missing in the approximate set of answers eventually returned.

This imprecision level then determines which approximate radii the filtering rules must use during the NN-search. Irrelevant clusters are thus filtered out, and the remaining clusters are ranked by the distance of their centers to the query point. Clusters are then accessed one after the other. When a cluster is accessed, all the data points it contains (all points enclosed within its exact bounding hypersphere) are fetched into memory. The search then computes the distances between all these points and the query vector, which may update the current set of neighbors and may also filter out more clusters. The search stops when k neighbors have been found and the approximate minimum distance to the next cluster is greater than the current distance to the k-th neighbor.

Before returning the result to the user, a sequential scan of the file where outliers are stored is performed. This might also update the current set of neighbors.
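
The filtering, ranking and outlier scan of the search can be sketched as follows, assuming clusters are dictionaries holding a center, their points, and a map `approx` from each predefined α to its approximate radius (hypothetical names). A cluster is skipped when the distance from the query to its center, minus the approximate radius for α, already exceeds the current k-th distance. As a conservative variant of the stopping rule, the loop skips such a cluster and keeps examining the remaining ones instead of stopping. In-memory lists stand in for the on-disk files.

```python
import heapq
import math


def approx_knn(query, clusters, outliers, k, alpha):
    """Approximate k-NN search at precision level alpha (sketch)."""
    heap = []     # max-heap through negated distances: (-dist, tie_breaker, point)
    counter = 0   # tie breaker so points themselves are never compared

    def offer(p):
        nonlocal counter
        d = math.dist(query, p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, p))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, p))
        counter += 1

    # Rank clusters by the distance from the query to their centers.
    ranked = sorted(clusters, key=lambda c: math.dist(query, c["center"]))
    for c in ranked:
        # Approximate lower bound on the distance to any point of the cluster.
        bound = max(0.0, math.dist(query, c["center"]) - c["approx"][alpha])
        if len(heap) == k and bound > -heap[0][0]:
            continue  # filtered out at this precision level
        for p in c["points"]:  # the whole cluster is fetched and scanned
            offer(p)

    # Outliers belong to no cluster: scan them sequentially before returning.
    for p in outliers:
        offer(p)

    return [p for _, _, p in sorted(heap, key=lambda t: -t[0])]
```

With clusters {(0, 0), (1, 0)} and {(10, 0), (11, 0)}, the outlier (2, 0), a query at (2.1, 0) and k = 1, the only answer is the outlier, which shows why the outlier file must be scanned before returning.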

Coupling Action and Perception by Image Indexing and Visual Servoing

Participants : Patrick Gros, Anthony Remazeilles.

This is a joint work with the VISTA team (F. Chaumette).

We are working on automatic robot motion control using visual information provided by an on-board camera and an image database of the navigation space. The image base describes the environment in which the robotic system moves; more precisely, it describes the features that the robot's camera can observe. Thanks to this base, localizing the robot amounts to a k-nearest-neighbor search [18] for the initial image given by the camera before the motion. The localization stage therefore avoids reconstructing the entire scene, which is a time-consuming and complex process.
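
The localization step can be sketched as a voting scheme over local descriptors: each descriptor of the camera image looks up its k nearest neighbors in the base, and each neighbor votes for the image it came from. The voting rule, the brute-force search and the names are assumptions for illustration; the text only states that localization is a k-nearest-neighbor search. A real system would replace the brute-force loop with an index such as the approximate search scheme of the previous section.

```python
import math
from collections import Counter


def localize(query_descriptors, database, k=1):
    """Return the database image that collects the most k-NN votes, or None.

    database: list of (descriptor, image_id) pairs (hypothetical layout).
    """
    votes = Counter()
    for q in query_descriptors:
        # Brute-force k nearest neighbors of this query descriptor.
        nearest = sorted(database, key=lambda entry: math.dist(q, entry[0]))[:k]
        for _, image_id in nearest:
            votes[image_id] += 1
    return votes.most_common(1)[0][0] if votes else None
```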

The path the robot has to follow is also defined in terms of images: the desired position corresponds to the image the camera should obtain at the end of the motion. The same image retrieval method makes it possible to localize the desired position. By translating the image base into a weighted graph (whose edge values reflect the feasibility of moving from one image to another) and using graph theory, the shortest image path between the initial image and the desired one can easily be found. The images extracted from the database along this path describe, in a continuous way, the space the robot has to pass through to reach the desired position.
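
A minimal sketch of the shortest-path step, assuming the graph is stored as an adjacency map from each image identifier to (neighbor, cost) pairs, where a lower non-negative cost means an easier transition. Dijkstra's algorithm is used here as one standard choice; the text does not name the algorithm, and the identifiers and costs are hypothetical.

```python
import heapq
import math


def shortest_image_path(graph, start, goal):
    """Dijkstra's algorithm over an image graph.

    graph maps an image id to a list of (neighbor_id, cost) pairs with cost >= 0.
    Returns (list of image ids from start to goal, total cost), or (None, inf).
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, image = heapq.heappop(heap)
        if image in done:
            continue  # stale queue entry
        done.add(image)
        if image == goal:
            break
        for neighbor, cost in graph.get(image, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                prev[neighbor] = image
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return None, math.inf
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]
```

In a graph where I0 -> I1 -> I2 -> I3 costs 1 per edge but the direct edges I0 -> I2 and I1 -> I3 cost 4 and 5, the returned path is the chain of small steps, with total cost 3.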

This year, we have defined a formalism for controlling robot motions given this image sequence, the features matched between each pair of consecutive images, and the images acquired by the camera. 3D reconstruction is still not required. Furthermore, robot motion is not defined during an off-line stage; motions are determined for each image acquired by the camera. This will make it easy to take unexpected external events, such as occlusions or obstacles, into account within our scheme.

Our method is based on potential field theory. The robot moves so as to make the features defined on the image path, initially outside the camera's field of view, become visible. Furthermore, the resulting trajectory is independent of the positions of the intermediate images. This work has been validated through experiments with an articulated arm, in a planar environment with planar motions. We are trying to relax these constraints in order to deal with more general motions and 3D scenes. Non-holonomic constraints will then be added to manage mobile robots in real environments.
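
The visibility idea can be illustrated in the image plane only. In this sketch, which is an assumption and not the control law developed in this work, each feature contributes half its squared distance to a slightly shrunken image rectangle, and one gradient step moves out-of-view features toward it. Converting these image-plane displacements into robot velocities (for instance through the interaction matrix used in visual servoing) is omitted, and the margin and gain are hypothetical parameters.

```python
def visibility_step(features, width, height, margin, gain):
    """One gradient-descent step on a visibility potential in the image plane.

    V = 0.5 * sum of squared distances from each feature (u, v) to the rectangle
    [margin, width - margin] x [margin, height - margin]; the gradient of V with
    respect to a feature is that distance vector, and it is zero inside.
    """
    def excess(x, lo, hi):
        # Signed distance to the interval [lo, hi], zero when x is inside.
        if x < lo:
            return x - lo
        if x > hi:
            return x - hi
        return 0.0

    new = []
    for u, v in features:
        gu = excess(u, margin, width - margin)
        gv = excess(v, margin, height - margin)
        new.append((u - gain * gu, v - gain * gv))
    return new
```

A feature at (-10, 50) in a 100 × 100 image enters the field of view after two steps with margin 5 and gain 0.5, while a feature already inside the rectangle does not move.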

Furthermore, we want to improve the management of the database, which could speed up the retrieval process. For example, acquisition conditions (time of day, weather conditions, ...) are criteria that can be extracted automatically from the image signal. This information could help categorize the images of the base, and also provide the robot with the images that best correspond to the current external conditions (which is very useful insofar as these images are used in the feature tracking stage). Finally, a protocol for the autonomous acquisition of the image base should be defined, so that experiments can be carried out with the Cycab robot owned by Irisa.