Section: Overall Objectives
The explosion in the quantity of digital documents raises the problem of managing them. Beyond storage, we are interested in the problems linked to the management of their contents: how to exploit large document databases, how to classify documents, how to index them in order to search their contents efficiently, and how to visualize those contents. To tackle these problems, we propose a multidisciplinary effort that gathers, within a single team, specialists in the various media (image, video, text) and specialists in techniques for exploiting data and the related metadata, such as databases, statistics, and information retrieval. Our work lies at the intersection of these fields and focuses on three points: i) searching in large image databases, ii) adding semantics to search engines, and iii) coupling media for multimedia document description.
Exploiting the content of large databases of digital multimedia documents is a multifaceted problem. Building a system that exploits such databases calls upon many techniques: study and description of documents, organization of the databases, search algorithms, classification, and visualization. It also requires efficient management of primary and secondary memory, as well as well-designed interfaces and interactions with the user.
The five major challenges of the field that TexMex aims to tackle are the following:
it is necessary, first of all, to be able to process large sets of documents: it is important to develop techniques that scale up gracefully with respect to the quantity of documents taken into account (millions of images, months of videos), and to evaluate their results in terms of both quality and speed;
multimedia documents are not a simple juxtaposition of independent media, and it is important to better exploit the existing links between the various media composing a single document;
multimedia document databases evolve over time: it is important to take into account that the sets of documents change, as do the document description techniques and the ways of querying, which in turn modifies the way the databases are used;
whereas most queries are semantic in nature, description techniques only have access to the document syntax; it is thus necessary to find ways of reducing this gap between semantic needs and syntactic description tools;
the user-system interaction is a central point: the user must be able to express his/her needs efficiently, simply, and yet very precisely, to guide the system, and to evaluate the results; he/she must be the one who controls the system.
We have adopted a matrix organization for laying out our research. On the one hand, we have expertise in two main fields, automatic document description and exploitation of these descriptions; on the other hand, we have defined three transverse axes of research. The underlying idea is to focus our work on the questions where the team's multidisciplinarity appears to be an asset for obtaining original results.
- Our First Field of Competence: Document Description
Documents are generally not directly exploitable for search or indexing tasks: it is necessary to use intermediate descriptions, which must carry as much information as possible about document semantics but must also be automatically computable. To the documents and their descriptors, one can add metadata, which we define here as all additional information that informs, supplements, or qualifies the data with which it is associated.
- Our Second Field of Competence: Description Exploitation
The question is to define the techniques that make it possible to grasp, handle, and exploit the large volumes of data, metadata, and descriptors extracted from the documents: i) organization and management of multimedia databases, including the control of logical and temporal consistency, and strategies for computing and selecting descriptors and metadata; ii) statistical techniques for the exploration of large volumes of data; iii) indexing techniques that aim to confine the exploitation of the data to the smallest possible area, thus avoiding exhaustive processing, whose cost is predictable but prohibitive; iv) system problems related to the physical organization of large volumes of data, such as disk access management or cache memory management, which require new techniques adapted to the characteristics of the descriptors and to the way they are used.
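To illustrate point iii), here is a minimal sketch of an index that confines a nearest-neighbor search to a small region of the descriptor space instead of scanning every descriptor. It is not the team's method: the `GridIndex` class, its `cell_size` parameter, and the toy 2-D descriptors are assumptions made for the example, and a uniform grid is only practical in very low dimensions. The search is also approximate, since a true nearest neighbor lying outside the adjacent cells would be missed.

```python
import itertools
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical toy index: descriptors are bucketed into uniform grid cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell coordinates -> [(doc_id, descriptor)]

    def _cell(self, vec):
        # Map a descriptor to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in vec)

    def add(self, doc_id, vec):
        self.cells[self._cell(vec)].append((doc_id, vec))

    def search(self, query):
        """Return (best_id, n_compared), looking only at the query's cell
        and the cells adjacent to it. Returns (None, 0) if they are empty."""
        home = self._cell(query)
        candidates = []
        for offset in itertools.product((-1, 0, 1), repeat=len(home)):
            cell = tuple(h + o for h, o in zip(home, offset))
            candidates.extend(self.cells.get(cell, []))
        if not candidates:
            return None, 0
        best_id, _ = min(candidates, key=lambda item: math.dist(query, item[1]))
        return best_id, len(candidates)


index = GridIndex(cell_size=1.0)
index.add("a", (0.1, 0.1))
index.add("b", (0.2, 0.15))
index.add("c", (5.0, 5.0))
index.add("d", (9.0, 1.0))

best_id, n_compared = index.search((0.18, 0.12))
print(best_id, n_compared)
```

In this toy run only the two descriptors near the query are compared, while an exhaustive scan would compare all four. The price is the approximation noted above, which is the usual trade-off between speed and quality of results.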
- First Axis of Research: Searching in Large Image Bases
Going from corpora of a few thousand images to corpora containing a few million remains a research challenge today. The solution cannot come solely from new description schemes or from new indexing schemes; it requires taking into account all the components of the system and the way they fit together. Thus, we work on:
data description, especially in the case of compressed or watermarked images,
indexing and search algorithms,
database organization and use of the metadata,
system and hardware support,
and on the merging of these various techniques to improve the performance of current systems in terms of both speed and quality of recognition.
- Second Axis of Research: Towards More Semantic Search Engines
Search engines are extensively used tools, but they often prove disappointing because of their syntactic approach based on keyword search. Natural language processing (NLP) tools could however offer more semantic capabilities, by allowing word sense disambiguation and by making it possible to recognize the various formulations of the same concept. It is thus advisable to merge NLP and traditional keyword-based approaches.
However, this union is not so simple. On the one hand, it requires providing search engines with query and document extension strategies and then translating these extensions in terms of similarity. On the other hand, natural language processing tools must work in much broader environments than the ones in which they are usually used. The contribution of such a modification of the engines must also be established, which requires careful work on the evaluation of information retrieval systems.
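As a rough illustration of what "translating extensions in terms of similarity" can mean, the sketch below expands a keyword query with related terms and gives the added terms a lower weight in the matching score. Everything in it is an assumption made for the example: the `SYNONYMS` table is a hand-built toy thesaurus standing in for the output of real NLP tools, and `expand`, `score`, and the 0.5 weight are hypothetical choices, not the team's approach.

```python
# Hypothetical toy thesaurus; a real system would derive this from NLP resources.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "price": {"cost"},
}


def expand(terms, synonyms, weight=0.5):
    """Return {term: weight}: original terms weigh 1.0, added terms weigh `weight`."""
    weighted = {t: 1.0 for t in terms}
    for t in terms:
        for s in synonyms.get(t, ()):
            # setdefault keeps the full weight of a term the user typed explicitly.
            weighted.setdefault(s, weight)
    return weighted


def score(weighted_query, doc_terms):
    """Similarity = sum of the weights of the query terms present in the document."""
    doc = set(doc_terms)
    return sum(w for t, w in weighted_query.items() if t in doc)


doc = "the automobile price list".split()
plain = {t: 1.0 for t in ["car", "price"]}
expanded = expand(["car", "price"], SYNONYMS)

print(score(plain, doc), score(expanded, doc))
```

With the plain keyword query, only "price" matches the document; with the expanded query, "automobile" also contributes, at a reduced weight. Whether such an expansion actually improves results is precisely the kind of question that requires a proper evaluation of the retrieval system.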
- Third Axis of Research: Multimedia and Cross-Media
We study media coupling along three directions. Within the framework of video, we are interested in descriptions that jointly use the sound and image tracks. Such techniques can be applied to automatic video structuring, but also to improving techniques for detecting and recognizing people, whether by their face or by their voice. Another interesting direction consists of applying NLP techniques to texts produced by speech transcription. Speech carries a lot of semantic information, and NLP techniques are among the most efficient ones for extracting semantics from textual data.
In addition, we study the interactions between text and image in documents where these two media are tightly coupled, a common case in scientific bibliographical databases, on the web, in newspapers, in art books, and in technical documents. One goal is to connect, within the same document, each image with the text associated with it. This could help in obtaining an automatic, semantic description of the images and in linking different documents, either by searching for visually similar images or by searching for texts about the same subject. It could thus both improve the description of the images and remove possible ambiguities in the comprehension of the text.
Moreover, we have begun studying the interactions between speech and text together with the METISS team. This work aims at adapting methods from the text analysis domain and inserting them into speech recognition models, in order to improve recognition performance and thereby give indexing methods better access to the information that speech may contain.