Section: Contracts and Grants with Industry
Semantic networks and knowledge representation
Participants: David Auber, Ludwig Fiolka, Antoine Lambert, Frédéric Gilbert, Guy Melançon, Arnaud Sallaberry, Faraz Zaidi.
- Project: FIVE (Fouille Interactive, Veille, Visualisation et Exploration)
- Call: ANR Software Technology (RNTL)
- Start/end: April 2007 – April 2009
- Budget: 964 970 euros (total) / 390 360 euros (grant) / 25 970 euros (INRIA GRAVITÉ)

- Project: TANGUY (From Text to Arguments through Networks with Goals and User Initiative)
- Call: ANR CONTINT (Content and Interaction)
- Start/end: January 2009 – December 2011
- Budget: 872 000 euros (total) / 349 000 euros (grant) / 261 000 euros (INRIA GRAVITÉ)
Close collaboration with industrial partners is one of our priorities. We were involved in a three-year national project with two SMEs, PIKKO (http://www.pikko-software.com) and AMI Software (http://www.amisw.com). The ANR FIVE project (2006-2009) focused on graph analysis and visualization, since most of the processed data can naturally be equipped with relations. We designed astute graph statistics and adaptive algorithms that can adjust to a rapidly changing environment. We revisited most of the existing work on text mining and document clustering, trying to exploit the scale-free nature of the collected data.
The project's goal was to propose incremental statistics and self-adjusting visualizations to support competitive and strategic watch. Typically, analysts want to identify pieces of information that first appear as outliers and later confirm general trends. These pieces of information are what Ansoff called weak signals [22]. This is beyond the reach of classical statistics: analysts need to inject their knowledge and intuition into the system to help assess anecdotal situations and place pieces of information under surveillance.
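As a rough illustration of what an incremental statistic combined with analyst input might look like, the sketch below keeps a running mean and variance (Welford's online algorithm) for a series of per-period counts, flags values that deviate strongly from the history seen so far, and merges the result with items pinned by hand. It is not the statistic developed in FIVE, which this section does not specify: the names `RunningStat`, `flag_candidates` and `build_watchlist`, the z-score criterion and the threshold value are all assumptions made for the example.

```python
import math


class RunningStat:
    """Welford's online mean/variance: one pass, constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


def flag_candidates(series, threshold=3.0, min_history=3):
    """Return the positions whose value is unusually high given the past only."""
    stat = RunningStat()
    flagged = []
    for t, x in enumerate(series):
        # Compare against history *before* absorbing the new value.
        if stat.n >= min_history and stat.zscore(x) > threshold:
            flagged.append(t)
        stat.update(x)
    return flagged


def build_watchlist(series_by_term, analyst_pins=(), threshold=3.0):
    """Union of statistically flagged terms and terms pinned by the analyst."""
    watch = set(analyst_pins)  # analyst judgment is always kept (assumption)
    for term, series in series_by_term.items():
        if flag_candidates(series, threshold):
            watch.add(term)
    return watch
```

Because the statistics are updated one value at a time, new data can be absorbed without reprocessing the history, and the hand-pinned items stand in for the knowledge the analyst injects when the numbers alone are inconclusive.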
Building on that experience, we became involved in a new project with two industrial partners, Thalès Communications (http://www.thalescomminc.com) and the Xerox XRCE Parsing and Semantics group (www.xrce.xerox.com/Research-Development/Document-Content-Laboratory/Parsing-Semantics), and a third partner acting as "end user", the FIDAL Law Firm.
This new ANR CONTINT project, TANGUY, aims to provide technological solutions to users confronted with ever-growing amounts of information. Today, information overload is mainly handled by indexing data sources in search engines. To increase the relevance of the results, indexing is enriched by natural language processing (NLP) tools, especially information extraction tools. Clustering tools that give synthetic views of the results provide further help.
Recently, semantic web research has been using knowledge representation tools to organize information along the lines of ontologies or semantic networks. Some research has been carried out to integrate information extraction tools and knowledge representation tools. In practice, the knowledge representation approach leads to a static, "engineering"-centered architecture, i.e. all the knowledge needed is fixed in advance by technical experts who are not connected to the end users. The TANGUY project tackles the problem in a quite different way. It intends to give users as much information and independence as possible in handling their individual issues, while also reducing processing time and engineering costs. The TANGUY approach consists of a symmetrical and dynamic cooperation among three "poles":
- Extraction of "micro knowledge elements" from the texts with the help of NLP tools. This extraction structures the texts into elementary pieces that are local to a particular document and not specialized to a predefined domain.
- Knowledge representation with the help of semantic networks. This pole builds an overall view of inter-document relationships in the corpus. It supports navigation, query and reasoning operators that lead users towards their goal. It also allows the creation of a network of high-level concepts that represent the users' understanding.
- Interaction with the semantic network. This pole makes it possible to explore, at various levels, the continuum between the extracted information and the concepts that synthesize it. It is the tool that guides the actions (incorporation of new texts, exploration and creation of new concepts) so that users can construct and argue the solution to their current problem.
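To make the cooperation among the three poles more concrete, here is a minimal sketch of how they could fit together as data structures. It is an illustration only, not the TANGUY architecture: the class names `MicroElement` and `SemanticNetwork`, the triple-like shape of an extracted element and every method shown are assumptions, and the NLP extraction step is replaced by hand-written elements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroElement:
    """Pole 1 (assumed shape): one elementary fact, local to a document."""
    doc_id: str
    subject: str
    relation: str
    obj: str


class SemanticNetwork:
    """Pole 2 (assumed shape): links elements across documents."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> {(relation, node, doc_id)}
        self.concepts = {}             # user concept -> set of member nodes

    def add_elements(self, elements):
        # Incorporating new texts only adds edges; nothing is fixed in advance.
        for e in elements:
            self.edges[e.subject].add((e.relation, e.obj, e.doc_id))
            self.edges[e.obj].add(("inv:" + e.relation, e.subject, e.doc_id))

    def neighbours(self, node):
        """Navigation operator: nodes directly related to `node`."""
        return sorted({target for _, target, _ in self.edges.get(node, ())})

    def documents(self, node):
        """Documents in which `node` was mentioned."""
        return sorted({doc for _, _, doc in self.edges.get(node, ())})

    # Pole 3 (assumed shape): the user creates and explores concepts.
    def define_concept(self, name, members):
        self.concepts[name] = set(members)

    def concept_documents(self, name):
        """Trace a high-level concept back to its supporting documents."""
        docs = set()
        for member in self.concepts[name]:
            docs.update(self.documents(member))
        return sorted(docs)


# Hand-written stand-ins for the output of an NLP extraction tool.
net = SemanticNetwork()
net.add_elements([
    MicroElement("doc1", "Company A", "acquires", "Company B"),
    MicroElement("doc2", "Company B", "sues", "Company C"),
])
net.define_concept("dispute parties", ["Company B", "Company C"])
print(net.neighbours("Company B"))
print(net.concept_documents("dispute parties"))
```

Keeping the document identifier on every edge is one simple way to preserve the continuum between a user-defined concept and the extracted information that supports it.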