Project: acacia
Section: New Results
Keywords: Corporate Memory, Cognitive Sciences, Knowledge Acquisition, Knowledge Capitalization, Knowledge Management, Knowledge Engineering, Ontology, Assistance to the User, Cognitive Psychology, Communication, Co-operation.
Support to Modelling and Building of a Corporate Memory
The objective of this action is to propose methodological and software support for the construction of a corporate memory, through a user-centered approach. We study in particular the construction of a corporate semantic Web and the construction of ontologies from human and textual sources of expertise.
Methodology for Construction of a Corporate Semantic Web
Participant : Rose Dieng-Kuntz.
By taking into account the analogy between the resources of a corporate memory and the resources of the Web, a corporate memory can be materialized as a corporate semantic web, consisting of:
resources (i.e. documents in XML, HTML or non Web-oriented formats, people, services, software, materials),
ontologies (describing the conceptual vocabulary shared by the different communities of the organization),
semantic annotations on these resources (i.e. on document contents, on persons' skills, on the characteristics of the services/software/materials), these annotations using the conceptual vocabulary defined in the ontologies.
An organization can be an actual enterprise, a community, or a virtual enterprise consisting of several collaborating organizations. Our methodology for building a corporate semantic Web is summarized in .
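The three layers above (resources, ontologies, semantic annotations) can be illustrated by a minimal sketch; all names and data below are hypothetical, and a real corporate semantic web would use RDF(S) and a dedicated engine rather than plain Python structures:

```python
# Hypothetical miniature corporate semantic web.

# Ontology: a shared conceptual vocabulary (concept -> parent concept).
ontology = {
    "TechnicalReport": "Document",
    "Document": "Resource",
    "Engineer": "Person",
    "Person": "Resource",
}

# Resources of the memory (documents, people, ...), typed by the ontology.
resources = {"doc42": "TechnicalReport", "alice": "Engineer"}

# Semantic annotations on these resources, as subject-predicate-object triples
# using the conceptual vocabulary defined in the ontology.
annotations = [
    ("doc42", "hasAuthor", "alice"),
    ("doc42", "concerns", "KnowledgeManagement"),
]

def instances_of(concept):
    """Return resources whose type is `concept` or one of its sub-concepts."""
    def is_a(c, target):
        while c is not None:
            if c == target:
                return True
            c = ontology.get(c)  # climb the concept taxonomy
        return False
    return [r for r, c in resources.items() if is_a(c, concept)]

print(instances_of("Document"))  # -> ['doc42'] (a TechnicalReport is a Document)
```

The point of the sketch is that annotations and resource types only acquire meaning relative to the shared ontology, which is why ontology construction is the central methodological step.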
Adapting Models from Human and Social Sciences to the Design of Organizational Memory Systems
Participant : Alain Giboin.
This action aims at adapting models from human and social sciences, esp. psychological models (cf.  ), to the design of organizational memory systems, these models serving as frames to understand actual practices of organization members, to elicit system specifications, to elaborate system architectures, or to evaluate the systems and their use. Model adaptation rests on various analyses of actual practices observed within the organizations involved in the team research contracts.
We continued our study of the dialogical approaches to organizational memory system design, by focusing on the processes of producing and understanding/using the documents which partly compose the organizational memory, e.g., work instructions and procedures. We considered such documents as instruments for a delayed or asynchronous dialogue between present, past, and future members of an organization, a design team, and so on; for example, a document describing a work procedure can be seen as a prop for an asynchronous dialogue between a retired employee and a new employee hired in the same department. This view of organizational or professional documents makes explicit the coordination processes between writers and readers/users of documents, as well as the practices (techniques, devices, etc.) used to facilitate these coordination processes. Making these processes explicit informs system design.
Adapting the Scenario Method to the Design and Evaluation of Organizational Memory Systems
Participant : Alain Giboin.
This action aims at adapting the scenario method used by the HCI (Human-Computer Interaction) and CSCW (Computer-Supported Cooperative Work) communities to the design of organizational memory systems. The purpose of this adaptation is mainly to allow us to design user interfaces adapted to the users and to the usage context of our systems, and to make these interfaces more flexible.
We continued our study of the scenario method by reviewing variants of this method and related methods (vignettes, storyboards, personas, etc.), and by identifying the various forms in which the partners of a design project (user, requirements analyst, designer, developer, tester, etc.) apprehend scenarios, and how partners can move from one form to another without losing sight of the usage aspects which motivate the design.
Construction of a Multi-Viewpoint Semantic Web
Keywords: Semantic Web, Ontology, Ontology Matching, Viewpoints.
The work is carried out within the context of Thanh-Le Bach's PhD.
The objective of this thesis is to enable the construction and exploitation of a semantic web in a heterogeneous organization, comprising various sources of knowledge and various categories of users.
In the framework of knowledge management in a heterogeneous organization, the materialization of the organizational memory in a corporate semantic web may require integrating the various ontologies of the different groups (or communities) of this organization: the various communities generally prefer to use their own ontologies instead of a common general one.
To be able to build a corporate semantic web in a heterogeneous, multi-community organization, it is essential to have methods for manipulating the different ontologies of the various groups of the organization: for comparing, aligning, integrating or mapping these different ontologies.
We first studied the state of the art on semantic web languages such as RDF(S), DAML+OIL and OWL, and on ontology matching algorithms. We then proposed an algorithm, named ASCO, for matching two ontologies. Building on previous work, the algorithm finds mappings in a two-phase process: a linguistic phase and a structural phase. In the first phase, the similarity value between two elements (such as concepts or relations) from the two ontologies is calculated from the different information available about them: their names (which identify the concept or relation), their labels (which provide a human-readable version of the name), and their descriptions. The linguistic similarity is computed in several ways, for instance with string-distance metrics or TF/IDF. To improve the accuracy of the calculation, we integrated WordNet, a lexical reference system, in order to exploit synonym and hypernym relations between terms. The second phase, the structural phase, exploits the taxonomic information in the structures of the ontologies; it uses heuristics and domain knowledge to calculate structural similarity values between elements of the two ontologies. The similarity values from the two phases are combined into final similarity values between elements, from which mappings are deduced.
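The two-phase combination can be sketched as follows. The miniature ontologies, weights and threshold below are illustrative assumptions, and the real ASCO algorithm additionally uses TF/IDF, WordNet and richer structural heuristics:

```python
from difflib import SequenceMatcher

# Two hypothetical miniature ontologies: concept name -> (label, parent).
onto_a = {"Person": ("person", None),
          "Employee": ("employee", "Person"),
          "Report": ("written report", None)}
onto_b = {"Human": ("person", None),
          "Worker": ("employee", "Human"),
          "Doc": ("report document", None)}

def string_sim(s, t):
    """One possible string-distance metric (ASCO also uses TF/IDF)."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()

def linguistic_sim(a, b):
    """Phase 1: best similarity over names and labels."""
    return max(string_sim(a, b), string_sim(onto_a[a][0], onto_b[b][0]))

def structural_sim(a, b):
    """Phase 2: taxonomic heuristic -- concepts whose parents are
    linguistically similar are themselves considered more similar."""
    pa, pb = onto_a[a][1], onto_b[b][1]
    return linguistic_sim(pa, pb) if pa and pb else 0.0

def match(threshold=0.6, w_ling=0.7, w_struct=0.3):
    """Combine the two phases and deduce mappings above a threshold."""
    return [(a, b) for a in onto_a for b in onto_b
            if w_ling * linguistic_sim(a, b)
               + w_struct * structural_sim(a, b) >= threshold]

print(match())  # -> [('Person', 'Human'), ('Employee', 'Worker')]
```

Note that "Employee"/"Worker" is reinforced by the structural phase because their parents "Person"/"Human" are already linguistically similar, which is the intuition behind combining the two phases.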
The algorithm was tested and evaluated with two real-world ontologies: O'COMMA, which has 472 concepts and 77 relations, and O'Aprobatiom, which has 460 concepts and 92 relations. O'COMMA is a corporate memory-dedicated ontology, which was developed for the CoMMA IST project (2000-2001). O'Aprobatiom is an ontology dedicated to project memory in the building domain, developed through a cooperation between our team ACACIA and CSTB.
As further work, we will continue to test our algorithm on other real-world ontologies, especially in the medical domain, and will extend the algorithm to improve its results.
The ontology mapping algorithm is applicable in many semantic web domains where ontologies are crucial and ever more numerous, and where new ontology-based knowledge management systems will be created.
Ontologies and Semantic Relation Acquisition from Biomedical Corpora
This work is performed in the framework of Laurent Alamarguy's PhD thesis. It aims at elaborating methodological support and tools for the automation of corpus-based ontology construction or enrichment, in order to develop a community memory in the biomedical area.
Nowadays, in the biomedical research area, discovering as automatically as possible correlations between diseases and genes embodies a kind of quest for the Holy Grail. Indeed, we plan to use the Corese inference engine to infer accurate genetic implications in central nervous system diseases from data available through the web. This semantic engine, developed in a Semantic Web perspective, relies on ontologies and annotations of Web resources. To enhance the automation of this knowledge acquisition, Natural Language Processing may play a major role, by providing linguistic methods to analyze textual data.
First of all, we are focusing on the acquisition of the semantic relations underlying gene implications in central nervous system diseases. The first stage has thus been devoted to corpus analysis and the extraction of domain expressions. We have worked on two corpora, consisting of about 5100 and about 250 Medline abstracts respectively; they deal with gene correlations in central nervous system (CNS) diseases and with actors in health care networks. Moreover, we studied various extraction methods using different tools (Nomino, Fastr). We also compared linguistic and statistical methods devoted to relation acquisition. While surveying the state of the art on biomedical ontologies and on knowledge acquisition methods, we sketched a corpus-based ontology learning method, and we are also considering the reuse of mature tools in biomedical knowledge management such as MetaMap.
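As an illustration of a linguistic (pattern-based) approach to relation acquisition, the following sketch extracts gene-disease relations with a simple lexico-syntactic pattern. The sentences and the pattern are hypothetical; tools such as Fastr rely on far richer term and variant analysis:

```python
import re

# Hypothetical sentences in the style of Medline abstracts.
abstracts = [
    "Mutations in SNCA are associated with Parkinson disease.",
    "APOE is implicated in Alzheimer disease.",
    "The weather was fine in Sophia Antipolis.",
]

# A toy lexico-syntactic pattern: GENE <relation marker> DISEASE.
PATTERN = re.compile(
    r"(?P<gene>[A-Z][A-Z0-9]{2,})\s+"                 # uppercase gene symbol
    r"(?:is\s+)?(?:are\s+)?"                          # optional copula
    r"(?P<rel>associated with|implicated in)\s+"      # relation marker
    r"(?P<disease>[A-Z][a-z]+ disease)"               # disease name
)

def extract_relations(texts):
    """Return (gene, relation, disease) triples found by the pattern."""
    relations = []
    for text in texts:
        for m in PATTERN.finditer(text):
            relations.append((m.group("gene"), m.group("rel"),
                              m.group("disease")))
    return relations

print(extract_relations(abstracts))
# -> [('SNCA', 'associated with', 'Parkinson disease'),
#     ('APOE', 'implicated in', 'Alzheimer disease')]
```

A statistical method would instead score candidate gene-disease pairs by co-occurrence counts over the whole corpus, which is precisely the kind of comparison mentioned above.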
Corporate Memory and Semantic Web for Transcriptome Analysis
This work is carried out in the context of Khaled Khelif's thesis.
The study of gene expression has been greatly facilitated by DNA micro-array technology. DNA micro-arrays measure the expression of thousands of genes simultaneously, which helps biologists to define gene functions and their effects on organisms.
The goal of this work is to assist biologists working on DNA micro-array experiments in the validation and the interpretation of their results.
Our aim is to propose a method for the capitalization and valorization of knowledge resulting from the biologists' experiments (semantic annotations, ontology) and a software architecture to preserve and reuse the results of these experiments (structured documents, information retrieval). We rely on semantic web and knowledge engineering techniques.
Initially, in order to delimit and define the problem, we surveyed the state of the art on DNA micro-array experiments. Then, we focused on approaches to knowledge acquisition from texts and compared some statistical and linguistic natural language processing (NLP) tools, in order to propose a method for enriching existing biomedical ontologies, such as Gene Ontology or UMLS, with new concepts and relationships. The resulting ontology will be used for the automatic annotation of documents (papers, experiment reports) and as an input for CORESE to facilitate information retrieval.
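The automatic annotation step can be sketched as simple label matching against ontology concepts. The ontology excerpt and document below are hypothetical, and a real system would exploit the NLP tools mentioned above for lemmatization and term-variant handling:

```python
import re

# Hypothetical excerpt of a biomedical ontology: concept -> surface labels.
ontology = {
    "GeneExpression": ["gene expression", "expression of genes"],
    "Microarray": ["DNA micro-array", "microarray"],
    "CNS_Disease": ["central nervous system disease"],
}

def annotate(text):
    """Return semantic annotations (concept, matched label, offset) found
    in a document by matching ontology labels, case-insensitively."""
    found = []
    for concept, labels in ontology.items():
        for label in labels:
            for m in re.finditer(re.escape(label), text, flags=re.IGNORECASE):
                found.append((concept, m.group(0), m.start()))
    return sorted(found, key=lambda a: a[2])  # order by position in the text

doc = ("DNA micro-array experiments measure gene expression "
       "in thousands of genes.")
print(annotate(doc))  # finds a Microarray mention, then a GeneExpression one
```

Such annotations, expressed with the ontology's vocabulary, are exactly what a semantic engine like CORESE can then query for information retrieval.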
To test different NLP tools (Nomino, Likes, Syntex), we relied on a collection of texts, chosen and manually annotated by members of the IPMC team working on micro-array experiments.