Section: Software
AxIS Web Log Preprocessing and Methods for Sequential Pattern Extraction
Participants: Brigitte Trousse [co-correspondent], Yves Lechevallier [co-correspondent], Anli Abdouroihamane, Celine Fiot, Cristina Isai.
AxISLogMiner, which originated in D. Tanasa's thesis, is a software application that implements:
- our preprocessing methodology [16] for Web Usage Mining (WUM);
- methods for sequential pattern extraction with low support (Cluster & Divide and Divide & Discover [14]). See Chapter 3 of Tanasa's thesis [87] for more details.
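To make the notion of "support" concrete, here is a minimal sketch of how the support of a sequential pattern can be computed over a set of user sessions. It is not the Cluster & Divide or Divide & Discover method itself; the function names and the simplification of each session to an ordered list of page names are assumptions made for illustration.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence in order (not necessarily contiguously)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so the order of items is enforced.
    return all(item in remaining for item in pattern)


def support(sessions, pattern):
    """Fraction of sessions that contain the pattern as an ordered subsequence."""
    if not sessions:
        return 0.0
    return sum(contains(session, pattern) for session in sessions) / len(sessions)
```

A pattern is called frequent when its support reaches a chosen minimum threshold; patterns with low support are those followed by only a small fraction of sessions, which is why they are missed when the threshold is set high.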
In 2008, in the context of the Eiffel project, we isolated and redesigned the core of the AxISLogMiner preprocessing tool. The result, which we called AWLH, is a set of tools for pre-processing Web log files. AWLH can extract and structure log files from one or several Web servers, in different input formats. As usual, the Web log files are cleaned before being passed to the data mining tool, because they contain many noisy entries. For example, robots add considerable noise to the analysis of user behaviour, so it is important to identify robot requests. The data are stored in a database whose model has been improved.
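The cleaning step described above can be sketched as follows. This is a minimal illustration, not AWLH's actual implementation: the regular expression covers CLF and combined (ECLF) lines only, and the robot heuristics (a request for /robots.txt, or a crawler-like user-agent substring from the hypothetical `ROBOT_HINTS` list) are assumptions.

```python
import re

# host ident user [time] "request" status size, optionally "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical heuristic: substrings often found in crawler user-agent strings.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_robot(entry, robot_hosts):
    """Flag hosts that fetched /robots.txt and agents that look like crawlers."""
    agent = (entry["agent"] or "").lower()
    return entry["host"] in robot_hosts or any(hint in agent for hint in ROBOT_HINTS)


def clean(lines):
    """Drop malformed lines and all requests attributed to robots."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    robot_hosts = {
        e["host"] for e in entries if e["request"].split()[1:2] == ["/robots.txt"]
    }
    return [e for e in entries if not is_robot(e, robot_hosts)]
```

Collecting the robot hosts in a first pass means that every request from a host that fetched /robots.txt is discarded, including requests logged before that fetch.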
The current version of our Web log processing tool offers:
- processing of several log files from several servers (in different formats);
- support for several input formats (CLF, ECLF, IIS, custom, ...);
- incremental pre-processing;
- a Java API to ease the integration of AWLH into external applications.
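One way to obtain incremental pre-processing is to remember how far each log file has already been read and to process only the lines appended since then. The sketch below illustrates that idea under stated assumptions: the source does not describe how AWLH tracks its progress, and the function name `read_new_lines` and the JSON state file are hypothetical.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines appended to log_path since the previous call."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as handle:
            state = json.load(handle)

    offset = state.get(log_path, 0)
    # If the file is shorter than the stored offset (e.g. after log rotation),
    # start again from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()

    # Keep only complete lines; a partial trailing line is left for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    state[log_path] = offset + end
    with open(state_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    return lines
```

Storing a byte offset per file keeps each run proportional to the amount of new data, so several servers' logs can be tracked in the same state file.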
In 2009, we developed a tool, based on an open source project called "OpenSymphony ClickStream", for recording the click actions made by a user in real time. During the capture process, we create a table that the AWLH tool then uses to populate the tables required for the preprocessing and processing phases of the Web Usage Mining process.
In addition, an extended version of AWLH has been developed for capturing and structuring data from document annotations inside discussion forums.