Section: Overall Objectives
Introduction
Knowledge discovery in databases (hereafter KDD) consists of processing a large volume of data in order to extract knowledge units that are significant and reusable. If knowledge units are likened to gold nuggets, and databases to the lands or rivers to be explored, then the KDD process can be likened to searching for gold. This explains the name of the research team: in French, an “orpailleur” is a person who searches for gold in rivers or mountains. Moreover, the KDD process is iterative, interactive, and generally controlled by an expert of the data domain, called the analyst. The analyst selects and interprets a subset of the extracted units in order to obtain knowledge units that have a certain plausibility. Like a person searching for gold who has some knowledge of the task and of the location, the analyst may use their own knowledge, as well as knowledge of the data domain, to improve the KDD process.
One way for the KDD process to take advantage of domain knowledge is to connect it to an ontology of the data domain, which is a step towards knowledge discovery guided by domain knowledge, or KDDK. In the KDDK process, the extracted knowledge units still have a life after the interpretation step: they must be represented in an adequate knowledge representation formalism so that they can be integrated within an ontology and reused for problem solving. In this way, the results of the knowledge discovery process may be reused for extending and updating existing ontologies. The KDDK process shows that knowledge representation and knowledge discovery are two complementary tasks: there is no effective knowledge discovery without domain knowledge!