Team Orpailleur


Section: New Results

KDDK and Text Mining

Participants : Rokia Bendaoud, Amedeo Napoli, Emmanuel Nauer, Yannick Toussaint, Jean Villerd.

The objective of text mining is to extract useful and reusable knowledge units from large collections of texts. One objective of the team is to make the extracted knowledge units available so that texts can be manipulated by machines.

Knowledge discovery from heterogeneous textual resources

Ontologies help software and human agents to communicate by providing shared and common domain knowledge, and by supporting various tasks, e.g. problem-solving and information retrieval. In practice, building an ontology depends on a number of “ontological resources” of different types: thesauri, dictionaries, texts, databases, and ontologies themselves. We are currently working on the design of a methodology for ontology engineering from heterogeneous ontological resources. A methodology and a system, called “Pactole”, have been designed and applied in various contexts, namely in astronomy and in biology [13]. The “Pactole” methodology extends previous research work aimed at building ontologies from ontological resources using Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA).

The “Pactole” methodology is based on the identification of objects in texts, and on the extraction of object properties and of relations between objects. Objects are identified using a list of names (for example the celestial object “HR2725” or the bacterium “Escherichia coli”) or a set of patterns (“NGC xxxx” where “xxxx” is a number). Properties and relations between objects are extracted from the texts using syntactic parsers (e.g. the Stanford parser) and information extraction tools (e.g. GATE). Properties are expressed in texts with adjectives or verbs, while relations are usually expressed through lexical patterns.
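The object identification step can be illustrated with a minimal sketch. It is not the “Pactole” implementation: the function name `identify_objects`, the one-entry name list and the exact regular expression are assumptions made for illustration, and the pattern simply reads “NGC xxxx” as “NGC” followed by a number.

```python
import re

# Hypothetical name list; "HR2725" is the example given in the text.
KNOWN_NAMES = ["HR2725"]

# Assumed reading of the "NGC xxxx" pattern: "NGC", optional space, digits.
NGC_PATTERN = re.compile(r"\bNGC\s?\d+\b")


def identify_objects(text):
    """Return the object mentions found in text, in order of appearance."""
    found = []
    # Lookup by name list.
    for name in KNOWN_NAMES:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group()))
    # Lookup by pattern.
    for match in NGC_PATTERN.finditer(text):
        found.append((match.start(), match.group()))
    return [mention for _, mention in sorted(found)]


sentence = "The star HR2725 lies far from NGC 1234."
print(identify_objects(sentence))
```

A real pipeline would apply such a lookup before parsing, so that the syntactic parser and the information extraction tool can attach properties and relations to the identified objects.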

Then, binary tables “Objects × Attributes” are built and the associated concept lattices can be computed. In addition, a transformation function may convert the lattice into a concept hierarchy expressed in a simple description logic formalism (FLE). The RCA process has been used to take into account relations between objects and to create relations between concepts of the ontology. Moreover, an interactive process based on FCA and RCA, which includes the analyst in the KDD loop when building an ontology, has been studied.
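As an illustration of how a binary table yields a concept lattice, the following sketch enumerates the formal concepts of a toy context. The context, the object and attribute names, and the helper functions are all hypothetical. The enumeration is naive (exponential in the number of objects) and is only meant to show the definition of a concept as a pair (extent, intent), not the algorithm used by the team.

```python
from itertools import combinations

# Hypothetical toy context: each object is mapped to its set of attributes.
context = {
    "obj1": {"a", "b"},
    "obj2": {"a"},
    "obj3": {"b", "c"},
}


def common_attributes(objects, context):
    """Attributes shared by all the given objects (all attributes if none)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects that own every attribute in the given set."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the lattice. A conversion to a concept hierarchy would then map each concept to a class and each extent inclusion to a subsumption link.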

Meanwhile, besides ontology engineering, a survey on the use of association rules for text mining, mainly for classifying association rules extracted from texts, has been published, concluding this line of research in the team [62].

KDDK in Pharmacovigilance

Participants : Yannick Toussaint, Jean Villerd.

Pharmacovigilance (PV) is concerned with the study and the prevention of adverse drug reactions (ADRs), based on data collected by specialized centers and stored in case report databases (CRDBs). The CRDBs are then mined to find unexpected associations between drugs and ADRs that can be interpreted as signals. A safety signal appears when the consumption of a single drug is the cause of an (unexpected) ADR. A syndrome appears when the consumption of a single drug is the cause of several (unexpected) ADRs. A drug interaction appears when the consumption of several drugs is the cause of an (unexpected) ADR. A protocol appears when the consumption of several drugs is the cause of several (unexpected) ADRs.
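The four kinds of association defined above differ only in the number of drugs and of ADRs involved. The following minimal sketch makes this explicit. The function name `association_type` and the drug and reaction labels are hypothetical.

```python
def association_type(drugs, reactions):
    """Name the kind of association from the number of drugs and of ADRs."""
    if not drugs or not reactions:
        raise ValueError("an association needs at least one drug and one ADR")
    single_drug = len(drugs) == 1
    single_reaction = len(reactions) == 1
    if single_drug and single_reaction:
        return "safety signal"
    if single_drug:
        return "syndrome"
    if single_reaction:
        return "drug interaction"
    return "protocol"


# Hypothetical labels, for illustration only.
print(association_type({"D1"}, {"A1"}))
print(association_type({"D1", "D2"}, {"A1", "A2"}))
```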

The ANR project Vigitermes was in its second year in 2009. The primary goal of this project is to design a knowledge-based system for the management and the documentation of case reports, as well as for the detection of unexpected pharmacological associations.

We first developed an approach based on association rules [44]. However, trying to formulate expert needs more precisely led us to propose a new method for identifying candidate pharmacological associations to be investigated in clinical trials. A clinical trial allows the observation of the activity of a drug on a given population. The identification method relies on Formal Concept Analysis. The lattice resulting from FCA is used as a “search space” in which patterns are searched among the itemsets associated with the concepts. The subsumption relation between concepts in the lattice is used to relate signals, interactions, and protocols (as introduced above). In addition, this identification method uses several statistical components for numerically filtering significant associations. The method has been implemented within a prototype system and validated through an experiment on a database from the “Georges Pompidou Hospital”.
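To give an idea of how a lattice can serve as a search space, the sketch below computes the concept generated by a set of drugs in a toy case report table and checks subsumption between two concepts. This is a sketch under stated assumptions and not the prototype: the reports, the `drug:` and `adr:` item labels and the function names are hypothetical, and the statistical filtering step is left out because the text does not say which measures are used.

```python
# Hypothetical case reports: each report lists the drugs taken and ADRs observed.
reports = {
    "r1": {"drug:D1", "adr:A1"},
    "r2": {"drug:D1", "drug:D2", "adr:A1"},
    "r3": {"drug:D1", "drug:D2", "adr:A1", "adr:A2"},
    "r4": {"drug:D2"},
}


def extent(items, reports):
    """Reports that contain every item of the given itemset."""
    return {r for r, content in reports.items() if items <= content}


def concept_of(items, reports):
    """(extent, intent) of the concept generated by the given itemset."""
    ext = extent(items, reports)
    if not ext:
        # No supporting report: the intent is the set of all items.
        return ext, set().union(*reports.values())
    intent = set.intersection(*(reports[r] for r in ext))
    return ext, intent


def subsumes(general, specific):
    """A concept subsumes another when its extent contains the other's extent."""
    return specific[0] <= general[0]


single = concept_of({"drug:D1"}, reports)
pair = concept_of({"drug:D1", "drug:D2"}, reports)
print(sorted(single[1]), len(single[0]))
print(sorted(pair[1]), len(pair[0]))
print(subsumes(single, pair))
```

In this toy table, the concept generated by D1 alone (a candidate signal with A1) subsumes the concept generated by D1 and D2 together (a candidate interaction with A1), which is the kind of link the subsumption relation provides between signals and interactions.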

