Section: Scientific Foundations
The team's work requires two kinds of competencies. To exploit the content of documents, one must first be able to access this content, i.e., to characterize or describe it. One must then be able to use this description to perform the tasks related to these documents. Finally, both the descriptors and the exploitation techniques must satisfy the user's needs (and demonstrating that they do is far from trivial).
Finding a solution requires document description techniques based on text, image or video processing (sound and speech processing are studied by the METISS team, with which we collaborate closely). It is also necessary to exploit the correlation and complementarity between the different media, since they do not convey the same information and do not suffer from the same limitations.
After this description stage, the descriptions must be exploited to answer the user's query. This second stage requires sorting, indexing and retrieval algorithms that return results that are both good and fast, two constraints that usually conflict.
These two aspects are not independent, and a solution that addresses only one of them cannot solve any real problem. Combining the two in the context of large databases raises many difficult but interesting questions, and answering them requires bringing together people and ideas from both sides.