Elker (2017–2020)

Participants : Miguel Couceiro, Esther Catherine Galbrun, Amedeo Napoli, Chedy Raïssi.

The objective of the new ELKER ANR research project is to study, formalize and implement the search for link keys in RDF data. Link keys generalize database keys in two independent directions: they deal with RDF data, and they apply across two datasets. The goal of ELKER is to study the automatic discovery of link keys and reasoning with link keys, in particular from the point of view of Formal Concept Analysis (FCA). One main idea is to rely on the expertise of Orpailleur in FCA and to address the problem with FCA and pattern structure algorithms, especially those related to the discovery of functional dependencies. This project involves the EPI Orpailleur at Inria Nancy Grand Est, the EPI MOEX at Inria Rhône Alpes, and LIASD at Université Paris 8.
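As a rough illustration of what a link key does (a simplified sketch, not the algorithm studied in ELKER), a candidate link key can be seen as a set of property pairs: two resources from different datasets are linked when they share at least one value for every pair. The dataset layout, the resource and property names, and the function `generated_links` below are all hypothetical and chosen for the example only.

```python
from itertools import product


def generated_links(left, right, property_pairs):
    """Return the pairs (a, b) linked by a candidate link key.

    `left` and `right` map a resource identifier to a dict
    {property: set of values}. A pair (a, b) is linked when, for every
    (p, q) in `property_pairs`, a's values for p and b's values for q
    share at least one element. This is an assumed, simplified reading
    of link keys, used for illustration.
    """
    links = set()
    for (a, a_props), (b, b_props) in product(left.items(), right.items()):
        if all(a_props.get(p, set()) & b_props.get(q, set())
               for p, q in property_pairs):
            links.add((a, b))
    return links


# Hypothetical toy datasets using different vocabularies.
left = {
    "d1:book1": {"title": {"Dune"}, "author": {"Herbert"}},
    "d1:book2": {"title": {"Emma"}, "author": {"Austen"}},
}
right = {
    "d2:w1": {"name": {"Dune"}, "creator": {"Herbert"}},
    "d2:w2": {"name": {"Emma"}, "creator": {"Smith"}},
}

candidate = [("title", "name"), ("author", "creator")]
links = generated_links(left, right, candidate)
```

With this toy data, the two-pair candidate links only the resources that agree on both title and author, whereas the single pair `("title", "name")` would also link the two "Emma" resources despite their different authors. Discovering which candidates generate correct links is the kind of question the project addresses.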

ISTEX (2014–2017)

Participant : Yannick Toussaint.

ISTEX is a so-called “Initiative d'excellence” managed by CNRS and DIST (“Direction de l'Information Scientifique et Technique”). ISTEX aims at providing the research and teaching community with online access to scientific publications in all domains. To this end, ISTEX requires a massive acquisition of documents such as journals, proceedings, corpora, and databases. The Orpailleur team was especially involved in the development of facilities for querying full-text documentation, analyzing content and extracting information. The project was carried out in collaboration with the ATILF laboratory and the INIST Institute (both located in Nancy).

PractiKPharma (2016–2020)

Participants : Adrien Coulet, Joël Legrand, Pierre Monnin, Amedeo Napoli, Malika Smaïl-Tabbone, Yannick Toussaint.

PractiKPharma, for “Practice-based evidences for actioning Knowledge in Pharmacogenomics”, is an ANR research project about the validation of domain knowledge in pharmacogenomics. Pharmacogenomics studies how genomic variations in patients affect drug responses. Most of the available knowledge in pharmacogenomics (state of the art) lies in the biomedical literature, with various levels of validation. An originality of PractiKPharma is to use Electronic Health Records (EHRs) to constitute cohorts of patients. These cohorts are then mined to extract potential pharmacogenomic patterns, which are validated w.r.t. literature knowledge so that they can become actionable knowledge units. More precisely, we should first extract pharmacogenomic patterns from the literature, and then confirm or moderate the interpretation and validation of these units by mining EHRs. Comparing knowledge patterns extracted from the literature with facts extracted from EHRs is a complex task, which depends on the language (the literature is in English whereas EHRs are in French) and on the knowledge level, as EHRs represent observations at the patient level whereas the literature is related to sets of patients. The PractiKPharma project involves three other laboratories, namely LIRMM in Montpellier, SSPIM in St-Etienne and CRC in Paris.

CNRS PEPS and Mastodons projects

Mastodons Projects: from HyQual to HyQualiBio (2016–2018)

Participants : Miguel Couceiro, Esther Catherine Galbrun, Tatiana Makhalova, Amedeo Napoli, Chedy Raïssi, Justine Reynaud.

The HyQual project was proposed in 2016 in response to the Mastodons CNRS Call about data quality in data mining. This project is interested in the mining of nutritional data for discovering predictive biomarkers of diabetes and metabolic syndrome in elderly populations. The data mining methods considered are hybrid: they combine symbolic and numerical methods for mining complex and noisy metabolic data [80]. Regarding the mining process, we are interested in the quality of the data at hand and of the discovered patterns. In particular, we check the incompleteness of the data, the quality of the extracted rules and the possible existence of redescriptions.

Initially, the project involved researchers from the EPI Orpailleur, together with researchers from LIRIS Lyon, ICube Strasbourg, and INRA Clermont-Ferrand. This year, the project was merged with another Mastodons project, namely QualiBioConsensus, about the “ranking of biological data using consensus ranking techniques”. The joint Mastodons project is now called “HyQualiBio”. The topics of interest for the participants are the mining of complex biological data, rankings and ties in rankings, and the search for dependencies in the web of data.

PEPS Decade

Participants : Miguel Couceiro, Esther Catherine Galbrun, Nyoman Juniarta, Amedeo Napoli, Justine Reynaud, Chedy Raïssi.

Decade stands for “Découverte et exploitation des connaissances pour l'aide à la décision en chimie thérapeutique” (knowledge discovery and exploitation for decision support in medicinal chemistry). The objective of the CNRS PEPS Decade project is to study the foundations of a knowledge system for analyzing the so-called PAINS (“Pan Assay Interference Compounds”) in chemistry. The system should rely on knowledge discovered in the data as well as on domain knowledge and expertise. The members of the project are interested in data mining techniques guided by constraints and preferences, “instant data mining”, subgroup discovery and exceptional model mining. All these topics were already of interest in the PEPS Prefute (2015–2016), which was about interaction and iteration in the knowledge discovery process.

The members of the Decade project are from Greyc Caen, LIFO Orléans, LIRIS Lyon, Université de Tours-Blois, EPI Lacodam in Rennes and EPI Orpailleur (in association with chemists based in Caen and Orléans).