Section: New Results
Keywords: KDD, preprocessing, data transformation, metadata, knowledge management, viewpoint, ontology, annotation, reusability, distances, dissimilarities.
Introduction
This year, as in previous years, we obtained original results in our four research topics: data transformation and knowledge management, data mining, Web usage and Internet mining, and document mining and information retrieval.
New research this year addresses support for ontology construction and evolution (cf. sections 6.2.4, 6.5.4 and 6.4.6) and information visualization in data mining (cf. section 6.3.4). Some earlier work described in our 2005 annual report was also published this year, on ``Dissimilarities for Web usage Mining'' ([55], [54]) and on XML document mining (cf. section 6.5.1). In addition, a hybrid clustering approach to approximate fastest paths on urban networks was published this year [18].
First, on data transformation and knowledge representation (cf. section 6.2), we pursued our research on feature selection (cf. section 6.2.1) and on the critical edition of Sanskrit texts (cf. section 6.2.5). We also studied the use of metadata (cf. the KM point of view), in particular in two ongoing PhD theses related to the semantic Web and KDD, conducted by H. Behja and A. Baldé. Ontologies and metadata have been used 1) to annotate global KDD processes in terms of viewpoints, in order to support the management and reuse of past KDD experiences (cf. section 6.2.2), and 2) to support the interpretation of extracted clusters, with the definition this year of an ontology and an interpretation model (cf. section 6.2.3).
Secondly, on data mining methods (cf. section 6.3), we published new results on a new partitioning dynamic clustering method (cf. section 6.3.1), on self-organizing maps (cf. section 6.3.2), on functional data analysis (cf. section 6.3.3) and on agglomerative 2-3 hierarchical clustering in the context of Chelcea's PhD thesis (cf. section 6.3.6). This year we actively pursued the research topic started in 2005 on mining data streams, in the context of Marascu's PhD thesis (cf. section 6.3.5), and started a new research topic on visualization (cf. section 6.3.4).
Thirdly, on information systems data mining, and more precisely on usage mining, we pursued our research on mining Web user visits by applying dynamic clustering (cf. section 6.4.1) and crossed clustering (cf. section 6.4.2) in an original way. We also proposed five original methods this year:
- the GWUM method for extracting generalized usage patterns (cf. section 6.4.3),
- a method for mining interesting periods from Web access logs (cf. section 6.4.4),
- an approach based on a genetic-inspired algorithm for improving resource searching in a dynamic and distributed database such as a P2P system (cf. section 6.4.5),
- a method based on usage mining for supporting the evolution of a Web site ontology (cf. section 6.4.6),
- and finally a method based on ergonomics and WUM for analysing a Web site (cf. section 6.4.7).
Finally, we pursued our research on XML or HTML document mining and its applications, such as the exploitation of a large collection of XML documents (cf. section 6.5.1), ontology construction (cf. section 6.5.4), scientific and technical watch (cf. section 6.5.3) and the improvement of information retrieval based on contextual aspects or ranking criteria (cf. section 6.5.6). More precisely, our research addressed the clustering or classification of XML documents based on their structure and content (cf. section 6.5.1), entity extraction from XML documents (cf. section 6.5.2), document mining for scientific and technical watch (cf. section 6.5.3), the clustering of HTML pages (cf. section 6.5.4) and contextual information retrieval (cf. section 6.5.5).