Team GRAVITÉ

Section: Contracts and Grants with Industry

Semantic networks and knowledge representation

Participants : David Auber, Ludwig Fiolka, Antoine Lambert, Frédéric Gilbert, Guy Melançon, Arnaud Sallaberry, Faraz Zaidi.

Working in close collaboration with industrial partners is one of our concerns. We were involved in a three-year national project with two SMEs, PIKKO (http://www.pikko-software.com) and AMI Software (http://www.amisw.com). The ANR FIVE project (2006-2009) focused on graph analysis and visualization, since most of the processed data can naturally be equipped with relations. We designed graph statistics and adaptive algorithms that can adjust to a rapidly changing environment. We revisited most of the existing work on text mining and document clustering, trying to exploit the scale-free nature of the collected data.

The project's goal was to propose incremental statistics and self-adjusting visualizations to support competitive and strategic watch. Typically, analysts want to identify pieces of information that first appear as outliers and later confirm general trends. These pieces of information are what Ansoff called weak signals [22]. This lies beyond the reach of classical statistics: analysts need to inject their knowledge and intuition into the system to help assess anecdotal situations and to place pieces of information under surveillance.

Building on that experience, we became involved in a new project with two industrial partners, Thalès Communications (http://www.thalescomminc.com) and the Xerox XRCE Parsing and Semantics group (www.xrce.xerox.com/Research-Development/Document-Content-Laboratory/Parsing-Semantics), and a third partner acting as "end user", the FIDAL Law Firm.

This new ANR CONTINT project, TANGUY, aims at providing technological solutions to users confronted with ever-growing amounts of information. Today, information overload is mainly handled by indexing data sources in search engines. To increase the relevance of the results, indexing is enriched by natural language processing (NLP) tools, especially information extraction tools. Clustering tools, which give synthetic views of the results, are another aid.

Recently, semantic web research has been using knowledge representation tools to organize information along the lines of ontologies or semantic networks. Some research has been carried out to integrate information extraction tools and knowledge representation tools. In practice, the knowledge representation approach leads to a static, "engineering"-centered architecture, i.e. all the knowledge needed is fixed in advance by technical experts who are not in contact with the end users. The TANGUY project tackles the problem quite differently. It intends to give users as much information and independence as possible in handling their individual problems, while reducing processing time and engineering costs. The TANGUY approach consists in the symmetrical and dynamic cooperation among three "poles":

  1. Extraction of "micro knowledge elements" from the texts with the help of NLP tools. This extraction structures each text into elementary pieces that are local to a particular document and not specialized to a predefined domain.

  2. Knowledge representation with the help of semantic networks. This pole constructs an overall view of inter-document relationships in the corpus. It supports navigation, query and reasoning operators that lead users towards their goal. It also allows the creation of a network of high-level concepts that represent the users' understanding.

  3. Interaction with the semantic network. This pole makes it possible to explore, at various levels, the continuum between the extracted information and the concepts that synthesize it. It is the tool that guides the user's actions (incorporating new texts, exploring and creating new concepts) so that the user can construct and argue the solution to the current problem.
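
The cooperation among the three poles can be illustrated with a minimal sketch. It is not the TANGUY implementation: the names `MicroElement`, `extract_micro_elements` and `SemanticNetwork` are hypothetical, the sample sentences are invented, and a naive "subject relation object" sentence splitter stands in for real NLP extraction tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroElement:
    """Hypothetical micro knowledge element, local to one document."""
    doc_id: str
    subject: str
    relation: str
    obj: str


def extract_micro_elements(doc_id, text):
    """Pole 1 (toy stand-in for NLP extraction): keep three-word sentences."""
    elements = []
    for sentence in text.split("."):
        words = sentence.split()
        if len(words) == 3:
            subject, relation, obj = words
            elements.append(MicroElement(doc_id, subject, relation, obj))
    return elements


class SemanticNetwork:
    """Pole 2: a corpus-wide network built from per-document elements."""

    def __init__(self):
        self.edges = defaultdict(set)  # subject -> {(relation, object, doc_id)}
        self.concepts = {}             # concept name -> set of member nodes

    def add_document(self, doc_id, text):
        # Incorporating a new text extends the network incrementally.
        for e in extract_micro_elements(doc_id, text):
            self.edges[e.subject].add((e.relation, e.obj, e.doc_id))

    def neighbours(self, node):
        # Navigation operator: outgoing links of a node, in sorted order.
        return sorted(self.edges.get(node, set()))

    def define_concept(self, name, members):
        # Pole 3: the user groups nodes into a high-level concept.
        self.concepts[name] = set(members)

    def concept_documents(self, name):
        # Trace a user-defined concept back to the documents that mention it.
        members = self.concepts[name]
        docs = set()
        for subject, links in self.edges.items():
            for _relation, obj, doc_id in links:
                if subject in members or obj in members:
                    docs.add(doc_id)
        return sorted(docs)


network = SemanticNetwork()
network.add_document("doc1", "AcmeCorp acquires BetaSoft. BetaSoft sues GammaLtd.")
network.add_document("doc2", "GammaLtd licenses DeltaTech.")
network.define_concept("litigation_parties", ["BetaSoft", "GammaLtd"])
print(network.neighbours("BetaSoft"))
print(network.concept_documents("litigation_parties"))
```

In this sketch the user-defined concept is stored separately from the extracted links, so concepts can be created or revised without re-running extraction. That mirrors the separation between extracted information and the concepts that synthesize it.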

