Project : acacia
Section: New Results
Keywords: Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Programming Environment, Knowledge Server, World Wide Web, Semantic Web, XML, RDF, Conceptual Graph, Ontology, Information Retrieval.
Information Retrieval in a Corporate Semantic Web
We study the problems involved in the dissemination of knowledge through a knowledge server via an Intranet or the Internet: we consider the Web, and in particular the semantic Web, as a privileged means for supporting the management of knowledge distributed within a firm or between firms. A knowledge server allows searching for information in a heterogeneous corporate memory, this search being guided intelligently by knowledge models or ontologies. It also allows the proactive dissemination of information by intelligent agents. We focus on the case of a memory materialized as a corporate semantic Web, i.e. as a set of resources (such as documents) semantically annotated by RDF statements relating to an ontology.
Corese Semantic Search Engine
Keywords: Knowledge Acquisition, Knowledge Engineering, Knowledge Management, Corporate Memory, Programming Environment, Knowledge Server, Semantic Web, XML, RDF, CommonKADS, Conceptual Graph, Ontology, Information Retrieval.
Participants: Olivier Savoie, Olivier Corby [correspondent], Francis Avnaim.
The Corese ODL (Software Development Operation) aims at increasing the impact and the diffusion of Corese. To this end, the ACACIA team wants to improve the quality of the Corese architecture (modularity, documentation, tests, evolution, ...), of the application programming interface (API) and of the overall usability of the software.
The ODL began in June 2002 and is planned to last two years.
-
First of all, we worked on developments concerning the engineering of the architecture, modularity and usability, which led to the current version.
-
Until summer 2002, RDF literals did not have datatypes. A specification proposal was included in the W3C Last Call for RDF. Following this specification, we implemented a Corese datatype package, guided by precise requirements:
-
We integrated datatypes into the Corese data model: the datatype hierarchy was merged into the existing concept type hierarchy, datatypes were integrated into graph nodes (concepts), and we created classes of objects that inherit from generic datatype interfaces.
-
We manage datatypes whose value spaces have a non-null intersection (e.g. number, integer, float, ...).
-
Corese can easily be configured through a declarative mapping between a datatype name and a Java datatype class.
-
We manage datatypes generically: based on the previous mapping, we use Java introspection to create a datatype object and then dispatch the operation to be processed on it (regular expressions, string operations, ...). Datatype operations are thus polymorphic.
-
We added several optimizations to speed up datatype operations.
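As an illustration, the declarative mapping and introspection mechanism described above could be sketched as follows (class and method names are hypothetical, not the actual Corese API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a declarative datatype mapping resolved by
// introspection; names are illustrative, not the actual Corese classes.
interface Datatype extends Comparable<Datatype> {
    Object value();
}

class IntDatatype implements Datatype {
    private final long v;
    public IntDatatype(String lexical) { this.v = Long.parseLong(lexical); }
    public Object value() { return v; }
    public int compareTo(Datatype o) {
        // Cross-datatype comparison works because numeric value spaces intersect.
        return Double.compare(v, ((Number) o.value()).doubleValue());
    }
}

class FloatDatatype implements Datatype {
    private final double v;
    public FloatDatatype(String lexical) { this.v = Double.parseDouble(lexical); }
    public Object value() { return v; }
    public int compareTo(Datatype o) {
        return Double.compare(v, ((Number) o.value()).doubleValue());
    }
}

public class DatatypeRegistry {
    // Declarative mapping: datatype name -> implementing Java class.
    private static final Map<String, Class<? extends Datatype>> MAP = new HashMap<>();
    static {
        MAP.put("xsd:integer", IntDatatype.class);
        MAP.put("xsd:float", FloatDatatype.class);
    }

    // Introspection: instantiate the mapped class from the lexical form.
    public static Datatype create(String name, String lexical) throws Exception {
        return MAP.get(name).getConstructor(String.class).newInstance(lexical);
    }

    public static void main(String[] args) throws Exception {
        Datatype a = create("xsd:integer", "42");
        Datatype b = create("xsd:float", "41.5");
        // Polymorphic comparison across datatypes sharing a value space.
        System.out.println(a.compareTo(b) > 0); // prints "true"
    }
}
```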
-
In order to be compliant with most software architectures, Corese should be deployable as a Java Web component. We developed a separate web component using Java Servlet and JSP technologies, taking inspiration from several frameworks that propose a Model-View-Controller architecture. This facilitates the addition of new web forms for Corese users. The component is independent of the Corese engine so as to preserve modularity.
-
Concerning diffusion, we improved the quality of the packaging:
-
Installation is easier (decompression of a single archive). The package contains both the standalone version and the web version.
-
A well-documented installation file is delivered.
-
We added global parameters to Corese. These parameters follow the Java CLASSPATH mechanism: Corese users can now specify the locations of their own ontologies and annotations to be loaded by Corese. Following the Java model, we implemented a DataLoader class (the equivalent of the Java ClassLoader) that is in charge of static resource loading.
-
We tested all of this by supporting the integration of Corese into the KMP project platform.
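The CLASSPATH-style parameter mechanism described above could be sketched as follows (the class name and separator handling are illustrative assumptions, not the actual Corese code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the actual Corese DataLoader) of a CLASSPATH-style
// parameter listing the ontology and annotation files to load at startup.
public class DataLoader {
    // Split a CLASSPATH-like parameter into individual resource locations.
    public static List<String> parse(String path, String separator) {
        List<String> locations = new ArrayList<>();
        for (String p : path.split(java.util.regex.Pattern.quote(separator))) {
            if (!p.isEmpty()) locations.add(p);
        }
        return locations;
    }

    public static void main(String[] args) {
        String param = "ontology.rdfs:annotations/doc1.rdf:annotations/doc2.rdf";
        // Each location would then be loaded as a static resource.
        System.out.println(parse(param, ":"));
        // prints "[ontology.rdfs, annotations/doc1.rdf, annotations/doc2.rdf]"
    }
}
```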
-
We have been working on keeping Corese in line with the evolution of the RDF semantics, in particular in order to take into account literals with language tags and XML Schema datatypes.
We have developed:
-
optimizations coming from constraint programming to speed up projection: enumerations, arc consistency (thanks to Gilles Trombettoni from Coprin);
-
optimization of the query processor (Francis Avnaim);
-
on-the-fly constraint processing (Francis Avnaim).
The query processor has been extended: projection now supports paths of length greater than one, bounded by an integer:
x R (3) y ::= x R y OR x R t R y OR x R t1 R t2 R y.
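The bounded-path semantics above can be checked with a simple breadth-first search; the following is a generic sketch of the idea, not the Corese projection algorithm:

```java
import java.util.*;

// Illustrative check of the bounded-path semantics: x R (n) y holds
// iff y is reachable from x through relation R in 1..n steps.
public class BoundedPath {
    public static boolean matches(Map<String, List<String>> r,
                                  String x, String y, int n) {
        Set<String> frontier = new HashSet<>(Collections.singleton(x));
        for (int step = 0; step < n; step++) {
            Set<String> next = new HashSet<>();
            for (String node : frontier)
                next.addAll(r.getOrDefault(node, Collections.emptyList()));
            if (next.contains(y)) return true;  // reached y within the bound
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> r = new HashMap<>();
        r.put("a", Arrays.asList("b"));
        r.put("b", Arrays.asList("c"));
        r.put("c", Arrays.asList("d"));
        System.out.println(matches(r, "a", "c", 3)); // a R b R c: prints "true"
        System.out.println(matches(r, "a", "d", 2)); // needs 3 steps: prints "false"
    }
}
```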
After a query, it is possible to group results, as with the SQL ``group by'', and to count results:
-
group by concepts: group results that share the same binding of concepts
-
group by connected concepts: group results if there is a non-null intersection in their bindings. This makes it possible to compute equivalence classes by combining a projection involving an equivalence relation with a connected ``group by'' on the arguments of the equivalence relation.
-
count the number of occurrences of distinct concepts after grouping, for example: group by competence and then count firms.
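Read generically, grouping then counting could look like the following sketch, where query results are rows of variable bindings (the class and variable names are illustrative, not the Corese API):

```java
import java.util.*;

// Illustrative reading of post-query grouping: query results are rows of
// bindings; group by "competence", then count distinct "firm" values per group.
public class GroupBy {
    public static Map<String, Long> countFirmsByCompetence(List<Map<String, String>> rows) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (Map<String, String> row : rows)
            groups.computeIfAbsent(row.get("competence"), k -> new HashSet<>())
                  .add(row.get("firm"));
        Map<String, Long> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet())
            counts.put(e.getKey(), (long) e.getValue().size());
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
            Map.of("competence", "java", "firm", "f1"),
            Map.of("competence", "java", "firm", "f2"),
            Map.of("competence", "rdf", "firm", "f1"));
        // "java" is offered by 2 firms, "rdf" by 1.
        System.out.println(countFirmsByCompetence(rows));
    }
}
```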
We have introduced some statements from OWL:
-
owl:TransitiveProperty, owl:SymmetricProperty
-
owl:inverseOf, owl:intersectionOf.
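For instance, the semantics of owl:TransitiveProperty amounts to saturating the property's extension; a minimal fixpoint sketch of that rule (not the Corese implementation) is:

```java
import java.util.*;

// Minimal sketch of the owl:TransitiveProperty rule: apply
// (x,y) and (y,z) => (x,z) to a set of (subject, object) pairs until fixpoint.
public class Transitive {
    public static Set<List<String>> closure(Set<List<String>> pairs) {
        Set<List<String>> result = new HashSet<>(pairs);
        boolean changed = true;
        while (changed) {
            Set<List<String>> derived = new HashSet<>();
            for (List<String> p : result)
                for (List<String> q : result)
                    if (p.get(1).equals(q.get(0)))
                        derived.add(Arrays.asList(p.get(0), q.get(1)));
            changed = result.addAll(derived);  // stop when nothing new is derived
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> pairs = new HashSet<>(Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("b", "c")));
        System.out.println(closure(pairs).contains(Arrays.asList("a", "c"))); // prints "true"
    }
}
```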
A first distribution version of Corese is available for download on the Corese web page: http://www-sop.inria.fr/acacia/soft/corese.
A first prototype of a semantic web server built on top of Corese is available.
Adaptation of Corese for KMP
For the KMP project, we have designed a generic model for computing equivalence classes among resources described in RDF. The KMP project needs to compute sets of equivalent competences, which are composite objects. We define an equivalence relation, called similar. The extension of this relation is computed by inference rules that encode the conditions under which competences are equivalent, e.g. competences are equivalent if their components (action, environment, deliverable) are members of the same ontology subtree.
Then, we compute the equivalent competences by projection. Finally, we compute the equivalence classes of equivalent competences with a new operator, the connected join, which computes the connected components of the equivalence relation.
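The connected-component computation underlying this operator can be sketched with a standard union-find structure (class and method names are illustrative, not the actual Corese operator's API):

```java
import java.util.*;

// Sketch of the "connected join" idea: the similar pairs found by projection
// are edges; equivalence classes are the connected components of that graph.
public class ConnectedJoin {
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        if (!parent.get(x).equals(x))
            parent.put(x, find(parent.get(x)));  // path compression
        return parent.get(x);
    }

    public void union(String a, String b) { parent.put(find(a), find(b)); }

    public Collection<Set<String>> classes() {
        Map<String, Set<String>> byRoot = new HashMap<>();
        for (String x : new ArrayList<>(parent.keySet()))
            byRoot.computeIfAbsent(find(x), k -> new TreeSet<>()).add(x);
        return byRoot.values();
    }

    public static void main(String[] args) {
        ConnectedJoin cj = new ConnectedJoin();
        cj.union("c1", "c2");  // similar(c1, c2)
        cj.union("c2", "c3");  // similar(c2, c3)
        cj.union("c4", "c5");  // similar(c4, c5)
        // Two equivalence classes: {c1, c2, c3} and {c4, c5}.
        System.out.println(cj.classes());
    }
}
```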
Ontology-Guided Information Retrieval
Keywords: Conceptual Graph, XML, RDF, Semantic Web, Information Retrieval.
Participants: Carolina Medina-Ramírez, Rose Dieng-Kuntz.
This work was carried out in the framework of Carolina Medina-Ramírez's PhD [28] [29] [20].
The goal of this thesis was to provide not only a translation process between languages with different semantic levels, but also an environment for managing, capitalizing on and distributing knowledge within an information retrieval framework.
The main contributions of this thesis concern three aspects: semantic document retrieval, documentary memory and conceptual graphs. In particular, for semantic document retrieval, we have proposed:
-
A method to translate an ontology, annotations and queries represented in a pivot language into conceptual graphs, through an intermediate translation into RDF(S). This method is formalized by a regular translation grammar: Escrire -> RDF(S) -> CG.
-
A base of inference rules for exploiting the tacit knowledge underlying the Medline scientific abstracts that compose the test corpus.
For representing, handling, diffusing and querying a documentary memory, we have proposed:
-
A knowledge server, called EsCorServer, that retrieves documents from a documentary memory of gene interactions through a sequence of operations such as query normalization, information filtering, application of inference rules and creation of virtual documents. We used CORESE for the information retrieval.
-
A method to create virtual documents in order to complete the results obtained from a query. This method exploits the query given by the user of EsCorServer and the annotations (possibly in various formats) available in the documentary memory.
For conceptual graphs, we have proposed:
-
Algorithms to handle disjunction and negation in conceptual graph queries.
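A minimal illustration of the intended semantics (not the thesis algorithms): a disjunctive query succeeds if any of its branches matches, and a negated pattern succeeds when no matching fact is found in the base.

```java
import java.util.*;

// Hypothetical illustration of disjunction and negation over a fact base;
// facts are triples encoded as plain strings for simplicity.
public class CGQuery {
    static boolean holds(Set<String> facts, String triple) {
        return facts.contains(triple);
    }

    // Evaluate (A OR B) AND NOT C against the fact base.
    static boolean query(Set<String> facts, String a, String b, String negated) {
        boolean disjunction = holds(facts, a) || holds(facts, b);
        boolean negation = !holds(facts, negated);  // negation as absence of a match
        return disjunction && negation;
    }

    public static void main(String[] args) {
        Set<String> facts = new HashSet<>(Arrays.asList(
            "geneA activates geneB", "geneB inhibits geneC"));
        System.out.println(query(facts,
            "geneA activates geneB", "geneA inhibits geneB",
            "geneA activates geneC")); // prints "true"
    }
}
```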
Software Agents for Web Mining: Application to Technological and Scientific Watch
Participants: Tuan-Dung Cao, Rose Dieng-Kuntz.
Keywords: Multi-Agent System, Corporate Memory, Semantic Web, Web Mining, Ontology, Semantic Annotations, Technological Watch.
This work was performed in the context of the thesis of Tuan-Dung Cao.
The huge amount of information now available online and accessible through the Web can be used for the technological and scientific watch of an enterprise. For knowledge management purposes in an organization or a community (especially for technological or strategic watch), Web mining techniques can be particularly useful for discovering relevant information on the Web.
The objective of the thesis is to exploit agent technology to develop a multi-agent system, whose agents are guided by ontologies, to collect, capture, filter, classify and structure Web content coming from several information sources, in a scenario of assistance to technology watch at the CSTB (Center of Science and Technology for Building).
Initially, to delimit and define the problem, we studied the state of the art on multi-agent systems, web mining and the semantic web (XML, RDF(S), ontologies). This study enabled us to analyze the monitoring task currently carried out by the documentalists of the CSTB and to understand the current monitoring system and process: phases, actors, types of information sources, etc. We identified where ontologies and agents could intervene and proposed a description of this task, relying on Lesca's monitoring model. This will help us choose a relevant monitoring scenario and build the ontologies that will guide information search and extraction.
Then, we will propose a multi-agent architecture that distributes the Web mining work among several cooperating software agents, including "wrappers" for the information sources, in order to semi-automatically produce semantic annotation bases: we will propose an extension of our previous work [25] [26]. These annotations could then be exploited by agents for semantic search as in [23].
Semantic Web Technologies for a Health Care Network
Participants: David Minier, Frédéric Corby, Rose Dieng-Kuntz, Olivier Corby, Phuc-Hiep Luong, Laurent Alamarguy.
This work was performed in the framework of the Ligne de Vie project (detailed in section 7.2) and in the framework of the internships of Frédéric Corby, Phuc-Hiep Luong and David Minier. The objective of the ACI Ligne de Vie project is to develop a knowledge management system for a health care network, so as to ensure continuity of care and to support the collaborative work of the actors of the network.
Our contribution consisted of:
-
Translation of a medical database (Nautilus) into a structured ontology represented in RDF(S), making it possible to browse this ontology through Corese and thus check its consistency (Frédéric Corby, Olivier Corby, David Minier) [31] [34]. This approach is interesting for a company that has a database available and wishes to extract from it the elements needed to build a structured ontology represented in a semantic Web standard language such as RDFS.
-
A method for enriching this medical ontology, relying on candidate terms extracted by a linguistic tool applied to a corpus of texts on healthcare networks (David Minier, Laurent Alamarguy, Rose Dieng-Kuntz).
-
A method for creating (possibly multi-viewpoint) annotations (David Minier, Rose Dieng-Kuntz).
We studied the various possibilities for conceptualizing the notion of point of view: for the ontology developer, for an XML document creator and for the user. We studied how to build a base of annotations on the documents associated with such an XML document: these annotations will enable, for example, the specification of the type of a document, as well as medical users' comments, with various levels of confidentiality and possibly according to various points of focus and viewing angles, etc. We showed how to represent these annotations in RDF and how to use them with CORESE to retrieve information relevant for the different categories of users. We applied this work in the framework of healthcare networks.
-
Specification and implementation of a collaborative tool, the Virtual Staff (Rose Dieng-Kuntz, David Minier, Phuc-Hiep Luong, Olivier Corby).
We have proposed a first specification of a collaborative tool, called "Virtual Staff", that visualizes the reasoning of the actors of a health care network during complex diagnostic and therapeutic decisions. This tool will rely on conceptual graphs with certainty degrees. We studied how to express certainty degrees in fuzzy conceptual graphs and in fuzzy RDF(S). We also studied how such graphs can be represented in the SOAP model (Subjective, Objective, Assessment, Plan) used by the medical community and in the QOC model (Question, Option, Criteria) used in the CSCW community. We are currently implementing the Virtual Staff in Java.
Fuzzy Conceptual Graphs and Fuzzy RDF(S)
Participants: Phuc-Hiep Luong, Rose Dieng-Kuntz, Olivier Corby.
This work was carried out in the context of Phuc-Hiep Luong's DEPA final internship [33].
The current World Wide Web is showing its limitations with the explosion of information on the Internet. Many knowledge representation formalisms have been applied to exploit the contents of Web resources and reason on them, but Conceptual Graphs (CGs) and the RDF(S) language have shown limitations in expressing imprecise and uncertain information. We therefore studied several extensions of these formalisms aimed at providing more flexible expressivity: certainty degrees are represented with fuzzy sets and reasoning relies on fuzzy logic. With the extended Fuzzy Conceptual Graphs and Fuzzy RDF(S), obtained by combining fuzzy concepts and fuzzy sets, Web documents can be interpreted in a way closer to human expressions and arguments. Relying on this idea, we studied Fuzzy Conceptual Graphs and proposed an extension of RDF(S) with certainty degrees. This study was carried out in the framework of the ACI Ligne de Vie project (see section 7.2), which aims at developing an online system for managing patients' healthcare documents.
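Assuming the usual min/max combination rules of fuzzy logic (the actual Fuzzy CG / Fuzzy RDF(S) proposal may combine degrees differently), certainty degrees propagate as in the following sketch:

```java
// Sketch of certainty-degree combination under the standard min/max rules of
// fuzzy logic; illustrative only, not the formalism proposed in the thesis.
public class FuzzyDegree {
    // Conjunction of two statements keeps the weakest certainty.
    static double and(double a, double b) { return Math.min(a, b); }

    // Disjunction keeps the strongest certainty.
    static double or(double a, double b) { return Math.max(a, b); }

    public static void main(String[] args) {
        double diagnosis = 0.8;  // "patient has condition X" with certainty 0.8
        double symptom = 0.6;    // supporting observation with certainty 0.6
        System.out.println(and(diagnosis, symptom)); // prints "0.6"
        System.out.println(or(diagnosis, symptom));  // prints "0.8"
    }
}
```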