Section: New Results
Keywords : Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Knowledge Server, Semantic Web, Semantic Web Server, XML, RDF, OWL, Conceptual Graph, Ontology, Information Retrieval.
Information Retrieval in a Corporate Semantic Web
We study the problems involved in disseminating knowledge through a knowledge server over an intranet or the Internet: we consider the Web, and in particular the semantic Web, as a privileged means of supporting the management of knowledge distributed within a firm or between firms. A knowledge server supports the search for information in a heterogeneous corporate memory, this search being guided by knowledge models or ontologies. It also allows the proactive dissemination of information by intelligent agents. We look further into the case of a memory materialized as a corporate semantic Web, i.e. as resources (such as documents) semantically annotated by RDF statements relating to an ontology.
Corese Semantic Search Engine and its Semantic Web Server
Keywords : Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Knowledge Server, Semantic Web, XML, RDF, Conceptual Graph, Ontology, Information Retrieval.
Corese Query Language & SPARQL
Participant : Olivier Corby.
This year was dedicated to upgrading the Corese Query Language interpreter and to integrating the W3C SPARQL query language.
We redesigned the Corese interpreter to make it simpler, more uniform and more general. We extended the Corese projection algorithm to n-ary relations in order to process the source statement and property variables. The source statement makes it possible to query the source (the document) that RDF triples come from. The source is denoted by a variable that can be part of the query, such as the variable ?src in:
?person c:hasCreated ?src
source ?src (?document c:date ?date)
filter (?date <= "2005-01-01"^^xsd:date)
Corese now processes all SPARQL (http://www.w3.org/TR/rdf-sparql-query) filter expressions, such as Boolean expressions, function calls, negation as failure, etc. We upgraded the graph projection algorithm with optional graph patterns. An optional graph pattern contributes its bindings to a result when it matches, and does not make the query fail when no matching target pattern is found. We designed an extension of the Conceptual Graph projection algorithm that allows optional query relations in a query graph.
We also implemented the SPARQL Query Results XML Format (http://www.w3.org/TR/rdf-sparql-XMLres), which delivers the variable bindings in an XML format.
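The behaviour of an optional graph pattern can be illustrated with a minimal sketch. This is not the Corese projection algorithm: it is a naive triple-pattern matcher written for illustration, and the function names (`match_triple`, `match_all`, `query`) and the sample data are assumptions. Required patterns must all match; the optional patterns extend a result when they match and leave it untouched otherwise.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable: bind it, or check it is consistent with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def match_all(patterns: List[Triple], graph: List[Triple], binding: Binding) -> List[Binding]:
    """Return every extension of `binding` that satisfies all patterns."""
    results = [binding]
    for pattern in patterns:
        extended = []
        for partial in results:
            for triple in graph:
                candidate = match_triple(pattern, triple, partial)
                if candidate is not None:
                    extended.append(candidate)
        results = extended
    return results


def query(required: List[Triple], optional: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Match the required part, then extend each result with the optional part if possible."""
    answers = []
    for binding in match_all(required, graph, {}):
        extended = match_all(optional, graph, binding)
        # Keep the unextended binding when the optional part does not match.
        answers.extend(extended if extended else [binding])
    return answers


# Hypothetical sample data: doc2 has no date.
GRAPH = [
    ("alice", "c:hasCreated", "doc1"),
    ("bob", "c:hasCreated", "doc2"),
    ("doc1", "c:date", "2004-06-01"),
]
ANSWERS = query([("?p", "c:hasCreated", "?d")], [("?d", "c:date", "?date")], GRAPH)
```

In this example the result for bob is kept even though doc2 has no date; only the ?date variable stays unbound.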
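As an illustration of what such a serialization involves, the sketch below writes variable bindings with Python's standard `xml.etree.ElementTree`. It follows the element names and namespace of the final W3C Recommendation, which may differ from the 2005 draft that Corese implemented; the function name `bindings_to_xml` and the input shape are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Recommendation (assumed here; the 2005 draft may have differed).
SPARQL_NS = "http://www.w3.org/2005/sparql-results#"


def bindings_to_xml(variables, solutions):
    """Serialize solutions, given as {variable: (kind, value)} dicts, to XML.

    `kind` is one of "uri", "literal" or "bnode".
    """
    ET.register_namespace("", SPARQL_NS)

    def tag(name):
        return "{%s}%s" % (SPARQL_NS, name)

    root = ET.Element(tag("sparql"))
    head = ET.SubElement(root, tag("head"))
    for name in variables:
        ET.SubElement(head, tag("variable"), {"name": name})
    results = ET.SubElement(root, tag("results"))
    for solution in solutions:
        result = ET.SubElement(results, tag("result"))
        for name, (kind, value) in solution.items():
            binding = ET.SubElement(result, tag("binding"), {"name": name})
            ET.SubElement(binding, tag(kind)).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical bindings for two variables.
XML_TEXT = bindings_to_xml(
    ["person", "date"],
    [{"person": ("uri", "http://example.org/alice"), "date": ("literal", "2004-06-01")}],
)
```

An unbound variable is simply omitted from the corresponding result element, which is how optional patterns that did not match show up in the output.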
Participant : Olivier Corby.
We adapted the Corese GUI Factory to make it compatible with SPARQL. We can now build HTML forms that customize predefined SPARQL queries: the values selected in the form fill in the variable parts of the query. We designed a syntactic convention compatible with SPARQL syntax, which uses a specific namespace to declare the variable parts that must be retrieved from the GUI. The target query can now be saved and reloaded. The GUI factory has been validated in several applications and projects (EADS, KmP, QBLS).
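The idea of filling variable parts of a predefined query from form values can be sketched as follows. The actual convention used by the Corese GUI Factory is not given here, so the `gui:` prefix, the function `customize` and the sample query are purely hypothetical: a term written `gui:name` stands for the value of the form field `name` and is replaced by a quoted literal.

```python
import re

# Hypothetical convention: a term "gui:<field>" marks a part to fill from the form.
PLACEHOLDER = re.compile(r"\bgui:([A-Za-z_][A-Za-z0-9_]*)")


def escape_literal(value):
    """Escape backslashes and double quotes for a double-quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def customize(query_template, form_values):
    """Replace every gui:<field> term with the matching form value as a literal."""

    def substitute(match):
        field = match.group(1)
        if field not in form_values:
            raise KeyError("no form value for field %r" % field)
        return '"%s"' % escape_literal(form_values[field])

    return PLACEHOLDER.sub(substitute, query_template)


TEMPLATE = "SELECT ?doc WHERE { ?doc c:author gui:author . ?doc c:topic gui:topic }"
QUERY = customize(TEMPLATE, {"author": "Corby", "topic": 'the "semantic" Web'})
```

Because the placeholders are ordinary prefixed names, the template itself remains syntactically valid SPARQL, which is the property the convention described above is meant to preserve.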
The Acacia team received a grant from Inria to hire an engineer to participate in the development of Corese (starting in October 2005). The engineer will start by completing the implementation of the SPARQL query language in Corese.
A new release of Corese and a new download site have been designed (http://www.inria.fr/acacia/soft/corese).
Keywords : ontologies, semantic distance, approximate search, Multi agent system, Corporate memory, semantic web, web mining, ontology, semantic annotations, technological watch, technological monitoring.
Pursuing the industrialization of our research results on the Semantic Web Server, we developed Sewese, the second version of the Corese Semantic Web Server (with an optimized and modularized architecture). We are testing it in the context of a contract with Philips aimed at building a pilot product for the future start-up eCore.
Semantic Distances and Clustering
Keywords : ontologies, semantic distance, approximate search.
Participant : Fabien Gandon.
This work concerns conceptual distances, semantic similarities and the definition of metrics over ontological spaces. In the literature, work on the formal side of the semantic Web is largely influenced by the fact that logic-based languages are the most frequently used implementation formalisms. However, entailment is not the only product one should expect from a knowledge-based system, and the conceptual structures of the semantic Web can support a broad variety of inferences that go far beyond logical deduction, even in its simplest forms (RDF/S). In particular, the graph structure of semantic Web formalisms provides a space where one can define metrics, distances and similarities, for instance to extend classic logical entailment in the context of information retrieval. In the domain of Conceptual Graphs, one use for such a distance is to propose a non-binary projection, i.e. a similarity $S: C^2 \to [0, 1]$ where 1 is the perfect match and 0 the absolute mismatch. We proved the characteristics of the algorithm used in CORESE and in particular that, in the general case, it corresponds to a semi-distance, i.e. the triangle inequality does not hold for an arbitrary third type t. However, by construction, it does hold for any third type t chosen among the supertypes. This weak notion of the principle of parsimony is enough in our case, as we are only interested in paths going through the supertypes.
A second experiment with semantic distances was conducted while building the KmP public semantic Web server. One inference implemented in this server provides a cartography of competences. To do so, it exploits the graph model of the semantic Web, using ontology-based metrics to provide an ultrametric on which the clustering algorithm grouping the competences is built. These results were published at ISWC. In parallel, we are conducting an early experiment to evaluate and compare these simulated metrics with the ones humans naturally use in handling information. The preliminary results have been presented and suggest that current algorithms, which rely solely on the hierarchy of types to calculate and combine similarities, might require a more complex conceptual structure to come closer to human similarities.
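The semi-distance property can be illustrated on a toy hierarchy. The sketch below is not the CORESE implementation: it assumes that a subsumption link towards a parent at depth d has length 2^-d, that the distance between two types is the shortest path through a common supertype, and that the similarity is 1 / (1 + distance). The hierarchy `PARENTS` and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical hierarchy with multiple inheritance: C is a subtype of both A and B.
PARENTS = {"Root": (), "A": ("Root",), "B": ("Root",), "C": ("A", "B")}


@lru_cache(maxsize=None)
def depth(t):
    """Length of the shortest chain of supertypes from t to the root."""
    parents = PARENTS[t]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def up_costs(t):
    """Cheapest upward path cost from t to each of its supertypes (and itself)."""
    costs = {t: 0.0}
    stack = [t]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            # Assumed link length: halves each time the hierarchy gets one level deeper.
            cost = costs[current] + 2.0 ** -depth(parent)
            if cost < costs.get(parent, float("inf")):
                costs[parent] = cost
                stack.append(parent)
    return costs


def distance(a, b):
    """Shortest path between a and b that goes through a common supertype."""
    costs_a, costs_b = up_costs(a), up_costs(b)
    return min(costs_a[t] + costs_b[t] for t in costs_a.keys() & costs_b.keys())


def similarity(a, b):
    """Map the distance into (0, 1], with 1 for a perfect match (assumed mapping)."""
    return 1.0 / (1.0 + distance(a, b))


# The triangle inequality fails through the common subtype C ...
VIOLATED = distance("A", "B") > distance("A", "C") + distance("C", "B")
# ... but holds through the common supertype Root.
HOLDS = distance("A", "B") <= distance("A", "Root") + distance("Root", "B")
```

Here A and B are at distance 2 through Root, while each is at distance 0.5 from their common subtype C, so going through C would be shorter than the distance itself. Restricting the third type to supertypes removes this case.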
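One standard way to obtain an ultrametric from arbitrary pairwise distances is the subdominant (single-linkage) ultrametric, in which the distance between two items is the smallest possible "largest step" on a path between them. The sketch below uses that construction as a stand-in: the source does not state which construction the KmP server uses, and the competence names and distances are invented for the example.

```python
from itertools import product


def subdominant_ultrametric(items, dist):
    """Minimax path distance: u(i, j) = min over paths of the largest step."""
    u = {(a, b): (0.0 if a == b else dist[frozenset((a, b))])
         for a in items for b in items}
    # Floyd-Warshall variant; product() varies k slowest, as the algorithm requires.
    for k, i, j in product(items, repeat=3):
        u[i, j] = min(u[i, j], max(u[i, k], u[k, j]))
    return u


def clusters(items, u, threshold):
    """Group items whose ultrametric distance is at most `threshold`.

    With an ultrametric, "u <= threshold" is an equivalence relation, so
    comparing with one representative per group is sufficient.
    """
    groups = []
    for item in items:
        for group in groups:
            if u[item, group[0]] <= threshold:
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


# Hypothetical competences and ontology-based distances.
ITEMS = ["java", "python", "welding", "soldering"]
DIST = {
    frozenset(("java", "python")): 0.2,
    frozenset(("welding", "soldering")): 0.3,
    frozenset(("java", "welding")): 0.9,
    frozenset(("java", "soldering")): 0.9,
    frozenset(("python", "welding")): 0.8,
    frozenset(("python", "soldering")): 0.9,
}
U = subdominant_ultrametric(ITEMS, DIST)
GROUPS = clusters(ITEMS, U, threshold=0.5)
```

Varying the threshold yields the nested levels of a hierarchical clustering, which is what makes an ultrametric convenient for drawing a map of competences at several levels of detail.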
Visualization Surrogates for Conceptual Structures
Participant : Fabien Gandon.
There is a huge gap between the conceptual structures underlying the semantic Web and the final rendering of a user interface enabling an end-user to peruse or act on part of it. We experimented with automating the generation of representations for such conceptual structures. We reused the notion of surrogate from information retrieval and suggested a relation between these surrogates and the notion of identity conditions used in ontology engineering. From this observation, we proposed and discussed a mechanism to derive maximal surrogate candidates from structures found in ontologies and rules.
Web Mining for Technological and Scientific Watch
Keywords : Multi agent system, Corporate memory, semantic web, web mining, ontology, semantic annotations, technological watch, technological monitoring.
This work was performed in the context of the PhD of Tuan-Dung Cao.
Relevant and up-to-date information about technology has become a real need for every corporation in a rapidly evolving business environment. Technological Watch, or Technology Monitoring (TM), covers the activities that identify and assess technological advances critical to the company's competitive position, and that detect changes and discontinuities in existing technologies. The information explosion on the World Wide Web makes the Web itself a gold mine for technology monitoring. Within the framework of the knowledge management of an organization or a community, Web information extraction can be particularly useful when it is applied by a multi-agent system to discover relevant information on the Web for the purposes of technological or strategic watch.
The objective of the thesis is to use agent technology to develop a multi-agent system whose agents, guided by ontologies, collect, capture, filter, classify and structure Web content coming from several information sources, in a scenario of assistance to technology watch at the CSTB (Center of Science and Technology for Building).
Last year, our analysis of the monitoring task for the considered field (construction and building) enabled us to choose a relevant monitoring scenario and to build an ontology to guide the search for and extraction of information. Besides reusing the O'CoMMA ontology previously built in the CoMMA project, part of which relates to the field of building and construction, we also had to perform knowledge modeling in order to transform the vocabulary of the thesaurus currently used by CSTB into an ontology. Moreover, we integrated into the ontology concepts and relations dedicated to the TM task, concerning TM actors, monitoring phases, information sources and some document types.
We then proposed an ontology-based approach for building an agent-based information system supporting technology monitoring. This system facilitates the watcher's document searching and annotating tasks. Its agents use the ontology to enrich the watcher's query, formulate a system query, send it to Google to search the Web, and finally generate annotations from the search results. The watcher can then easily access the annotated information through the Corese semantic search engine.
To do so, we developed and implemented three algorithms that use the ontology to search the Web with Google and then automatically generate RDF annotations from the Google results. The first two algorithms use branches of concepts in the ontology to search the Web, while the third relies on a balanced selection of descendant concepts of the user's concepts in the original query. This work will be published at RFIA'2006.
We are now designing and implementing a sub-society of "annotator" agents encapsulating this algorithm, working in cooperation with other agents dedicated to other tasks in the TM system.
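The enrich-then-annotate pipeline can be sketched as follows. This is an illustration only and does not reproduce any of the algorithms of the thesis: the concept hierarchy, the labels, the breadth-first selection with a cap on the number of terms, and the `tm:concerns` property are all assumptions, and no request is actually sent to a search engine.

```python
from collections import deque

# Hypothetical fragment of a building-domain ontology: concept -> direct sub-concepts.
SUBCONCEPTS = {
    "BuildingMaterial": ["Concrete", "Timber"],
    "Concrete": ["ReinforcedConcrete"],
    "Timber": [],
    "ReinforcedConcrete": [],
}
LABELS = {
    "BuildingMaterial": "building material",
    "Concrete": "concrete",
    "Timber": "timber",
    "ReinforcedConcrete": "reinforced concrete",
}


def enrich(concept, max_terms):
    """Select the concept and its descendants, breadth first, up to max_terms."""
    selected, queue = [], deque([concept])
    while queue and len(selected) < max_terms:
        current = queue.popleft()
        selected.append(current)
        queue.extend(SUBCONCEPTS.get(current, []))
    return selected


def to_search_query(concepts):
    """Build a keyword query string from the labels of the selected concepts."""
    return " OR ".join('"%s"' % LABELS[c] for c in concepts)


def annotate(url, concept):
    """Describe a search hit as a triple (hypothetical tm:concerns property)."""
    return (url, "tm:concerns", concept)


CONCEPTS = enrich("BuildingMaterial", max_terms=3)
SEARCH_QUERY = to_search_query(CONCEPTS)
ANNOTATION = annotate("http://example.org/page1", "Concrete")
```

The triples produced this way would then be serialized as RDF so that a semantic search engine can query them against the same ontology.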