Section: New Results
Web Usage Mining
Construction and analysis of evolving data summaries
Web access patterns tend to be very dynamic, owing not only to changes in Web site content and structure but also to changes in users' interests. Consequently, the models associated with these patterns must be continuously updated to reflect actual user access patterns. One solution to this problem was proposed and described in [Oops!] [Oops!] [Oops!] [Oops!] [Oops!]. The goal of our approach is to update the models using summaries obtained by means of an evolutionary approach based on clustering strategies. The approach proposed in these works consists in dividing the analyzed time period into more significant sub-periods (in our case, the months of the year) in order to discover the evolution of old patterns or the emergence of new ones. A clustering method is then applied to the data of each sub-period, as well as to the complete period, and the results of the clusterings are compared. We proposed four types of clustering strategies: Global clustering (performed on all existing data), Local independent clustering (performed on each time sub-period separately), Local previous clustering (performed by assigning the data of each time sub-period to the prototypes from the previous clustering) and Local dependent clustering (performed by initializing the clustering algorithm with the prototypes of the clusters from the previous time sub-period and running it until convergence). The statistical foundations of this approach were presented in [Oops!] [Oops!]. Moreover, a survey of techniques that take the temporal dimension into account in such analyses was proposed in [Oops!].
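The four strategies can be illustrated with a minimal sketch. It assumes a plain k-means with Euclidean distance as the clustering algorithm and a shared set of initial prototypes, and it reads "the previous clustering" in the Local previous strategy as the independent clustering of the preceding sub-period; these choices, like all function names below, are illustrative assumptions rather than the exact procedure of the cited works.

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign(points, prototypes):
    """Return, for each point, the index of its nearest prototype."""
    return [min(range(len(prototypes)), key=lambda k: dist(p, prototypes[k]))
            for p in points]

def update(points, labels, prototypes):
    """Recompute each prototype as the mean of its members (unchanged if empty)."""
    new = []
    for k, proto in enumerate(prototypes):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(proto)
    return new

def kmeans(points, init, max_iter=100):
    """Alternate assignment and update steps until the prototypes stop moving."""
    prototypes = list(init)
    for _ in range(max_iter):
        new = update(points, assign(points, prototypes), prototypes)
        if new == prototypes:
            break
        prototypes = new
    return prototypes, assign(points, prototypes)

def global_clustering(periods, init):
    """Cluster all data at once, ignoring the sub-period boundaries."""
    return kmeans([p for period in periods for p in period], init)

def local_independent(periods, init):
    """Cluster each sub-period separately, always from the same initial prototypes."""
    return [kmeans(period, init) for period in periods]

def local_previous(periods, init):
    """Only assign each sub-period's data to the preceding sub-period's prototypes."""
    independent = local_independent(periods, init)
    results = [independent[0]]  # the first sub-period has no predecessor
    for t in range(1, len(periods)):
        previous = independent[t - 1][0]
        results.append((previous, assign(periods[t], previous)))
    return results

def local_dependent(periods, init):
    """Initialize each sub-period with the preceding result, then run to convergence."""
    results, prototypes = [], init
    for period in periods:
        prototypes, labels = kmeans(period, prototypes)
        results.append((prototypes, labels))
    return results
```

Each function returns pairs of (prototypes, labels), so the partitions produced by the different strategies for the same sub-period can be compared directly.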
Mining Interesting Periods from Web Access Logs
In this work, done in collaboration with M. Teisseire (LIRMM) and P. Poncelet (Ecole des Mines d'Alès), we focused on a particular problem that Web Usage Mining techniques have to consider: the arbitrary division of the data that is common practice today. This problem was introduced in  . This division comes either from an arbitrary decision to produce one log per x days (e.g. one log per month), or from a wish to find particular behaviours (e.g. the behaviour of the Web site users from November 15 to December 23, during Christmas purchases).
The outline of our method [Oops!] is the following: enumerate the sets of periods in the log to be analyzed, then identify which ones contain frequent sequential patterns. Our method processes the log file by considering millions of periods (each period corresponds to a sub-log). The principle is to extract frequent sequential patterns from each period. Our proposal is a heuristic-based miner, and our goal is to provide a result with the following characteristics:
For each period p in the history of the log, let realResult be the set of frequent behavioural patterns embedded in the navigation sequences of the users belonging to p. realResult is the result to obtain (i.e. the result that would be exhibited by a sequential pattern mining algorithm exploring the whole set of solutions by working on the clients of Cp). We want to find most of the sequences occurring in realResult while preventing the proposed result from becoming larger than it should be (otherwise the set of all client navigations would be considered a good solution, which is obviously wrong).
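These two requirements can be made measurable in several ways. The sketch below is one hypothetical choice, not the measure used in the cited work: it scores a proposed result by its coverage of realResult and by the fraction of proposed sequences that actually belong to realResult. The function names and the toy navigation data are assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def support(pattern, sequences):
    """Fraction of navigation sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

def evaluate(proposed, real_result):
    """Return (coverage, precision) of a proposed pattern set against realResult.

    coverage  : share of realResult that was found (should be high)
    precision : share of the proposal that is in realResult (low if bloated)
    """
    proposed, real_result = set(proposed), set(real_result)
    found = proposed & real_result
    coverage = len(found) / len(real_result) if real_result else 1.0
    precision = len(found) / len(proposed) if proposed else 1.0
    return coverage, precision

# Toy period with three client navigations (hypothetical page identifiers).
navigations = [("a", "b", "c"), ("a", "c"), ("b", "c", "d")]
# Toy realResult: two patterns that each occur in two of the three navigations.
real_result = [("a", "c"), ("b", "c")]

# Proposing every navigation as a "pattern" covers little and is mostly noise.
print(evaluate(navigations, real_result))
# Proposing exactly realResult scores perfectly on both measures.
print(evaluate(real_result, real_result))
```

With such a pair of scores, returning all client navigations is penalized by its low precision, which matches the remark that this trivial answer must not count as a good solution.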
In the new version of this work [Oops!], we have improved the genetic operators involved in our method. These operators range from the mere extension of a sequence with a frequent item to the more complex crossing of sequences. We now compare the efficiency of these operators in a new set of experiments. We have also provided a better comparison with existing methods for mining sequential patterns. This comparison is based on the support of the extracted patterns, as well as on the ability of existing methods to extract some of those patterns.
In our experiments, we have extracted interesting behaviours. Those behaviours show that an analysis based on multiple divisions of the log (as described in this paper) makes it possible to obtain behavioural patterns embedded in short or long periods.
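To give a concrete idea of the two ends of this range, here is a minimal sketch of an extension operator and of a crossing operator on sequences represented as tuples of items. The source does not specify how the operators are defined, so the one-point crossing used here, the function names and the sample data are all assumptions made for illustration.

```python
import random

def extend_with_frequent_item(sequence, frequent_items, rng):
    """Append one randomly chosen frequent item to a candidate sequence."""
    return sequence + (rng.choice(frequent_items),)

def cross(left, right, rng):
    """Hypothetical one-point crossing: a prefix of left followed by a suffix of right.

    Both inputs must be non-empty; the prefix and the suffix are never empty.
    """
    cut_left = rng.randint(1, len(left))        # keep at least the first item of left
    cut_right = rng.randint(0, len(right) - 1)  # keep at least the last item of right
    return left[:cut_left] + right[cut_right:]

rng = random.Random(0)                           # seeded for reproducibility
frequent_items = ["home", "search", "cart"]      # hypothetical frequent pages
candidate = ("home", "search")

longer = extend_with_frequent_item(candidate, frequent_items, rng)
child = cross(("home", "search", "cart"), ("login", "cart", "pay"), rng)
```

In a heuristic miner of this kind, the candidates produced by such operators would then be checked against the navigation sequences of the period, and only those that turn out to be frequent would be kept.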
Web site analysis based on an Ergonomic and Web usage Mining Approach
In 2006, AxIS began to set up a new method for web site evaluation, combining a usage mining approach with human factors expertise (cf. our 2006 annual report). The first study, carried out during the MobiVIP Project [Oops!], showed that combining Ergonomic and Web usage Mining Approaches was very fruitful, and we want to go further in this direction. A rapid analysis of the state of the art has shown that the two INRIA research teams that focus on user interface design and evaluation (In-situ and Merlin), as well as other French academic laboratories (LIC/IIHM, LIG/Multicom, IRIT/I3C) and laboratories specialised in usage analysis (Laboratoires des Usages: Marsouin, LUCE, Lutin, Lucsi, LDU of Sophia Antipolis), do not presently use data mining technologies when evaluating web sites. Due to the pace at which usability evaluation methods have to be run after design changes, the international effort is mainly oriented toward automating the evaluation processes. Previous attempts have tried to compute web metrics (e.g. Rating Game, WebTango, etc.), to connect log file analysis and task interaction models (for instance, QUIP, KALDI) or to implement human factors expertise in knowledge bases (for instance, Sherlock, Ergoval, Synop, Ergo-conceptor). As full automation is still often disappointing, we believe much more in a cognitive coupling in which web site evaluation relies both on human ability and on powerful technologies. The effort will be pursued with the FOCUS platform (see 8.1.2).