Project : acacia
Section: New Results
Keywords : Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Knowledge Server, Semantic Web, XML, RDF, OWL, Conceptual Graph, Ontology, Information Retrieval.
Information Retrieval in a Corporate Semantic Web
We study the problems involved in the dissemination of knowledge through a knowledge server via an intranet or the Internet: we consider the Web, and in particular the semantic Web, as a privileged means of supporting the management of knowledge distributed within a firm or between firms. A knowledge server allows information to be searched for in a heterogeneous corporate memory, this search being intelligently guided by knowledge models or ontologies. It also allows the proactive dissemination of information by intelligent agents. We focus on the case of a memory materialized as a corporate semantic Web, i.e. as a set of resources (such as documents) semantically annotated by RDF statements relating to an ontology.
Corese Semantic Search Engine
Keywords : Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Knowledge Server, Semantic Web, XML, RDF, Conceptual Graph, Ontology, Information Retrieval.
Corese software development operation
The Corese ODL software development operation finished in June 2004. We designed and developed a semantic web server architecture to embed Corese. The server is based on Tomcat, servlets and JSP. We designed several functions that enable a smooth integration of Corese RDF processing with standard web technologies such as XSLT, JSP and Java TagLibs.
Graphic User Interface
We designed an XML GUI meta language that enables the description of ontology-based graphical user interfaces. Graphic widgets are built by queries to the semantic server: a query retrieves ontology and/or metadata elements, from which a graphic widget such as a selector is built. The meta description of the GUI is translated by XSLT into HTML/JSP in order to be rendered by a browser.
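As a rough illustration of this idea (not the actual meta language, whose syntax is not shown here), the following sketch builds an HTML selector widget from concept labels that would be returned by an ontology query; the query result is mocked with a plain list.

```python
# Illustrative sketch: building an HTML selector widget from ontology
# concept labels. In the real system the widget is described in an XML
# meta language and translated to HTML/JSP by XSLT; names here are
# illustrative assumptions.

def build_selector(name, concept_labels):
    """Render an HTML <select> whose options are ontology concept labels."""
    options = "\n".join(
        f'  <option value="{label}">{label}</option>' for label in concept_labels
    )
    return f'<select name="{name}">\n{options}\n</select>'

# Mocked result of a query retrieving subclasses of "Document"
labels = ["Report", "Article", "Memo"]
html = build_selector("documentType", labels)
print(html)
```

In the same spirit, other widgets (trees, tables, menus) can be generated from other query results, keeping the interface synchronized with the ontology.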
RDF Query Language
We designed and developed a new query language based on RDF triples. The language has a select-where format and is compatible with the W3C SPARQL proposal.
Our language includes boolean expressions with and/or connectors, tests for the non-existence of arcs, and optional arcs. Variables can match properties as well as resources. The query language also handles a subset of XML Schema datatypes. Queries are translated into query graphs and processed by conceptual graph projection.
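The core of such triple-pattern evaluation can be sketched as follows. This is a toy conjunctive matcher, not the Corese implementation (which works by conceptual graph projection); as in the language described above, variables start with "?" and may stand for properties as well as resources.

```python
# Toy sketch of matching "select ... where" triple patterns against an
# RDF graph of (subject, property, object) tuples. Illustrative only.

def match(patterns, graph, binding=None):
    """Yield all variable bindings satisfying every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    head, rest = patterns[0], patterns[1:]
    for triple in graph:
        b = dict(binding)
        ok = True
        for term, value in zip(head, triple):
            if term.startswith("?"):              # variable: bind or check
                if b.get(term, value) != value:
                    ok = False
                    break
                b[term] = value
            elif term != value:                   # constant must match
                ok = False
                break
        if ok:
            yield from match(rest, graph, b)

graph = [
    ("doc1", "author", "alice"),
    ("doc1", "topic", "RDF"),
    ("doc2", "author", "bob"),
]
# roughly: "select ?d ?t where ?d author alice and ?d topic ?t"
results = list(match([("?d", "author", "alice"), ("?d", "topic", "?t")], graph))
print(results)  # [{'?d': 'doc1', '?t': 'RDF'}]
```

Optional arcs and non-existence tests would be additional cases in the same recursion: an optional pattern yields the current binding even when no triple matches, and a negated pattern yields it only when none does.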
RDF Rule Language
Corese RDF Rule Language takes into account the new query language syntax. Rules are now written as RDF query triples. Furthermore, the rule language has been extended in order to enable the creation of blank nodes in the RDF graph.
We continued extending Corese to handle OWL Lite restrictions on properties (owl:someValuesFrom and owl:allValuesFrom). These restrictions are translated into Corese RDF rules.
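To give an idea of this translation, the following hedged sketch shows how an owl:someValuesFrom restriction can be compiled into a forward-chaining rule that creates a blank node, the rule-language extension mentioned above. Class and property names are illustrative, and the rule engine is reduced to a single Python function.

```python
# Illustrative sketch: an owl:someValuesFrom restriction as a forward-
# chaining rule over (subject, property, object) triples. If x is typed
# by the restriction class and has no value for the property, a fresh
# blank node of the required class is added as its value.
import itertools

_blank = itertools.count()

def some_values_from_rule(graph, restriction_class, prop, value_class):
    """Apply the rule once; return the triples it added."""
    new = []
    for s, p, o in list(graph):
        if p == "rdf:type" and o == restriction_class:
            has_value = any(t[0] == s and t[1] == prop for t in graph)
            if not has_value:
                b = f"_:b{next(_blank)}"          # blank node creation
                new += [(s, prop, b), (b, "rdf:type", value_class)]
    graph.extend(new)
    return new

graph = [("report1", "rdf:type", "AuthoredDocument")]
# hypothetical: AuthoredDocument = restriction someValuesFrom Person on hasAuthor
added = some_values_from_rule(graph, "AuthoredDocument", "hasAuthor", "Person")
print(added)
```

An owl:allValuesFrom restriction translates to a simpler rule with no blank node: whenever x is typed by the restriction and x has a value y for the property, y is typed by the value class.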
We participated in the installation of Corese by the Galaad team at INRIA Sophia Antipolis and helped them use it.
Semantic distances and clustering
Keywords : ontologies, semantic distance, approximate search.
Participant : Fabien Gandon.
Most of the conceptual structures used in knowledge-based systems essentially rely on a logical formalization of knowledge. However, focusing on logical implications leads knowledge-based systems to ignore some characteristics of the conceptual structures people use. One of the things that graph-based formalisms underline is an isomorphism between graph distances or geometric distances in the representation and natural conceptual distances between the notions they represent and articulate. In other words, two notions geometrically close in the graphical representation are supposed to be intuitively close in the mind of the modelers. This closeness can be exploited, for instance, to improve information retrieval by relaxing constraints to the closest notions. To this end, we are studying algorithms that simulate conceptual distances using the ontological tree, and we are applying them in particular to approximate search and result clustering.
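A minimal version of such a distance (not the exact measure under study) approximates the conceptual distance between two concepts by the length of the path joining them through their deepest common ancestor in the ontology tree; concepts close in the subsumption hierarchy get a small distance, which can then drive query relaxation or result clustering. The concept names below are illustrative.

```python
# Illustrative sketch: a conceptual distance on an ontology tree, taken
# as the number of subsumption edges on the path through the deepest
# common ancestor of the two concepts.

def ancestors(concept, parent):
    """The concept and all its ancestors, from the concept up to the root."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain

def distance(c1, c2, parent):
    a1, a2 = ancestors(c1, parent), ancestors(c2, parent)
    common = next(a for a in a1 if a in a2)     # deepest common ancestor
    return a1.index(common) + a2.index(common)  # edges up from each side

# Tiny ontology: Document > Report > TechnicalReport, Document > Article
parent = {"Report": "Document", "Article": "Document",
          "TechnicalReport": "Report"}
print(distance("TechnicalReport", "Article", parent))  # 3
print(distance("Report", "Article", parent))           # 2
```

Refinements such as weighting edges by their depth (so that two deep siblings count as closer than two shallow ones) fit the same scheme by replacing the edge count with a depth-dependent sum.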
Visualization surrogates for conceptual structures
Participant : Fabien Gandon.
Here we address a problem faced in many projects: the generation of semiotic representations for conceptual structures such as annotations and query results on the semantic Web. Drawing on the parallel between the patterns of such surrogates and the notion of identity conditions, we proposed and explained a mechanism exploiting semantic Web frameworks to automate the generation of templates for these surrogates. We showed how these templates improve representation, for instance when viewing the results of a query. The approach focuses on generating templates that provide the properties to include in a surrogate, regardless of the way it is rendered (text, graphics, speech, etc.).
Our goal was to detect as many potentially interesting properties as possible; fine-tuning can then take place. Our approach and implementation relied on rules because the Corese platform of the ACACIA team is based on conceptual graphs and graph rules. In other platforms offering other formalization means or insights into the ontology engineering process, sources other than rules could be exploited to derive surrogate properties from identity conditions. Our point here is that the semantic Web will have to be dynamic and will use, in addition to the conceptual structures to be communicated to the users: the users' profiles, the context and history of interactions, semiotic modeling primitives added to our meta-model, signs linked to the primitives of our ontologies, and logics of semiotics and surrogate generation.
Software Agents for Web Mining: Application to Technological and Scientific Watch
Keywords : Multi agent system, Corporate memory, semantic web, web mining, ontology, semantic annotations, technological watch, technological monitoring.
This work was performed in the context of the thesis of Tuan-Dung Cao.
Technological watch, or technology monitoring, is now recognized as a crucial activity for achieving and maintaining competitive positions in a rapidly evolving business environment. It serves the purpose of identifying and assessing technological advances critical to the company's competitive position, and of detecting changes and discontinuities in existing technologies. The rise of the Internet has made a large amount of information available online that is potentially useful for the technological and scientific watch of an enterprise. Within the framework of knowledge management in an organization or a community, Web mining can be particularly useful when applied by a multi-agent system to discover relevant information on the Web for the purposes of technological or strategic watch.
The objective of the thesis is to exploit agent technology to develop a multi-agent system whose agents, guided by ontologies, collect, capture, filter, classify and structure Web content coming from several information sources, in a scenario supporting technology watch at the CSTB (French Scientific and Technical Center for Building).
First of all, we analysed the monitoring task carried out at CSTB for the field considered (construction and building), in order to choose a relevant monitoring scenario and to build an ontology that will guide the search for and extraction of information. On the one hand, this ontology inherits vocabulary from the O'CoMMA ontology, developed for the CoMMA European IST project (2000-2001). On the other hand, we added concepts and relations concerning not only the field of construction but also the actors, tasks, and information sources involved in the technological monitoring process.
After identifying the important roles of the ontology in each phase of the technological monitoring process, we proposed an ontology-based approach for building an information system supporting technology monitoring, implemented by agents. One of the most important tasks in this system is to find useful resources on the Web and then annotate them using the ontology, so that users can retrieve them easily through the semantic search engine Corese.
To this end, we proposed an algorithm that uses the ontology to search the Web with Google and then automatically generates RDF annotations from the Google results. The algorithm has been implemented and is currently being tested. As further work, we will continue to test our algorithm and extend it to improve the results. In parallel, we will design and implement a subsociety of annotator agents encapsulating this algorithm, working in cooperation with other agents dedicated to other tasks in the technological monitoring system.
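The two steps of such an algorithm can be sketched as follows: expand an ontology concept with its subconcept labels to build a search-engine query, then turn a (mocked) result into RDF-style annotation triples referring back to the ontology. The real agents call Google and emit RDF/XML for Corese; the concept names, property names and URL below are illustrative assumptions.

```python
# Hypothetical sketch of ontology-guided search and annotation.

def expand_query(concept, subclasses):
    """Build a keyword query from a concept and its subconcepts."""
    terms = [concept] + subclasses.get(concept, [])
    return " OR ".join(f'"{t}"' for t in terms)

def annotate(url, concept):
    """Annotation triples typing a Web resource with an ontology concept."""
    return [(url, "rdf:type", "Document"), (url, "hasTopic", concept)]

# Toy fragment of a building-domain ontology (illustrative labels)
subclasses = {"Insulation": ["ThermalInsulation", "AcousticInsulation"]}
query = expand_query("Insulation", subclasses)
print(query)  # "Insulation" OR "ThermalInsulation" OR "AcousticInsulation"

# Mocked search result for the query above
triples = annotate("http://example.org/page1", "Insulation")
print(triples)
```

The resulting triples, once serialized as RDF annotations relating to the ontology, are exactly what Corese needs to answer semantic queries over the collected pages.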