Section: New Results
Software for Mining Sequential Patterns in Data Streams
Participants : Alice Marascu, Florent Masseglia, Yves Lechevallier.
As a result of Marascu's thesis [19], several software tools have been developed for knowledge discovery (sequential patterns and clusters) and security in data streams.
- Three clustering algorithms (Java) for mining sequential patterns in data streams were developed by A. Marascu during her thesis [19]. They take batches of data in the "Client-Date-Item" format and output clusters of sequences together with their centroids, each centroid being an approximate sequential pattern computed with an alignment technique.
  - SMDS compares every sequence with every other one, with a complexity of O(n²).
  - SCDS improves on SMDS by reducing the complexity from O(n²) to O(n·m), where n is the number of navigations (sequences) and m is the number of clusters.
  - ICDS is a modification of SCDS that keeps the clusters' centroids from one batch to the next.
- GEAR is a Java implementation of the history management strategy proposed in Marascu's thesis [19]. It takes a set of time series and builds a memory representation of these series based on a new principle: salient events are considered important, in contrast to decaying models, which favor recent events.
- WOD is a Java implementation of our outlier detection method proposed in [32]. Given a set of clusters and their sizes (a list in the "clusterId-size" format), WOD finds a natural separation between small and big clusters by means of a wavelet decomposition of the list.
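The difference between SMDS and SCDS/ICDS can be illustrated with a minimal Java sketch, under assumptions of our own that are not taken from the actual software: a navigation is a list of items (dates and itemsets are ignored), two sequences are compared through their longest common subsequence (LCS), the similarity threshold is a hypothetical parameter, and each cluster is represented by its first member instead of an alignment-based approximate sequential pattern. Each sequence is compared with the m current representatives only, which gives the O(n·m) comparisons of SCDS; keeping the `clusters` list alive between calls to the hypothetical `processBatch` method mimics the ICDS idea of carrying centroids from one batch to the next.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of SCDS/ICDS-style clustering of navigation sequences.
 * Assumptions: a sequence is a list of items, similarity is LCS length divided by
 * the longer length, the threshold is illustrative, and a cluster is represented by
 * its first member (not by an alignment-based approximate sequential pattern).
 */
class IncrementalSequenceClustering {

    /** A cluster: a representative sequence plus the sequences assigned to it. */
    static final class Cluster {
        final List<String> representative;
        final List<List<String>> members = new ArrayList<>();

        Cluster(List<String> representative) {
            this.representative = representative;
        }
    }

    private final double threshold;                           // hypothetical minimum similarity
    private final List<Cluster> clusters = new ArrayList<>(); // kept across batches (ICDS idea)

    IncrementalSequenceClustering(double threshold) {
        this.threshold = threshold;
    }

    /** Length of the longest common subsequence of a and b (dynamic programming). */
    static int lcs(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.size()][b.size()];
    }

    /** Similarity in [0, 1]: LCS length divided by the length of the longer sequence. */
    static double similarity(List<String> a, List<String> b) {
        int longer = Math.max(a.size(), b.size());
        return longer == 0 ? 1.0 : (double) lcs(a, b) / longer;
    }

    /**
     * Assigns each sequence of the batch to its most similar cluster, or opens a new
     * cluster when no representative is similar enough. Each sequence is compared with
     * the current representatives only: O(n*m) comparisons instead of O(n^2).
     */
    void processBatch(List<List<String>> batch) {
        for (List<String> seq : batch) {
            Cluster best = null;
            double bestSim = -1.0;
            for (Cluster c : clusters) {
                double sim = similarity(seq, c.representative);
                if (sim > bestSim) {
                    bestSim = sim;
                    best = c;
                }
            }
            if (best == null || bestSim < threshold) {
                best = new Cluster(seq);
                clusters.add(best);
            }
            best.members.add(seq);
        }
    }

    List<Cluster> clusters() {
        return clusters;
    }
}
```

For instance, with a threshold of 0.5, a first batch containing a-b-c, a-b-d and x-y yields two clusters, and a second batch containing x-y-z joins the existing x-y cluster instead of opening a new one. Note that O(n·m) counts sequence comparisons; each LCS comparison itself costs time proportional to the product of the two sequence lengths.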
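The centroids mentioned above are approximate sequential patterns obtained by alignment. The following sketch shows one simple way to compute such a pattern, assuming sequences of single items rather than itemsets: sequences are aligned progressively into a profile of (item, count) columns with a weighted LCS, and items supported by at least a hypothetical fraction `minFraction` of the sequences are kept. It illustrates the alignment idea only and is not the alignment procedure of [19].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of an alignment-based approximate sequential pattern.
 * Assumptions: single items (no itemsets), weighted-LCS progressive alignment,
 * and an illustrative minFraction support filter.
 */
class AlignmentPatternSketch {

    /** One aligned column: an item and the number of sequences aligned on it. */
    static final class Column {
        final String item;
        final int count;

        Column(String item, int count) {
            this.item = item;
            this.count = count;
        }
    }

    /** Weight of matching an item against a column: the column count if the items agree. */
    static int weight(Column col, String item) {
        return col.item.equals(item) ? col.count : 0;
    }

    /** Aligns seq against the profile, maximizing the total weight of matched columns. */
    static List<Column> align(List<Column> profile, List<String> seq) {
        int p = profile.size(), s = seq.size();
        int[][] dp = new int[p + 1][s + 1];
        for (int i = 1; i <= p; i++) {
            for (int j = 1; j <= s; j++) {
                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                int w = weight(profile.get(i - 1), seq.get(j - 1));
                if (w > 0) {
                    dp[i][j] = Math.max(dp[i][j], dp[i - 1][j - 1] + w);
                }
            }
        }
        // Trace back from (p, s), emitting merged columns in reverse order.
        List<Column> merged = new ArrayList<>();
        int i = p, j = s;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                Column col = profile.get(i - 1);
                int w = weight(col, seq.get(j - 1));
                if (w > 0 && dp[i][j] == dp[i - 1][j - 1] + w) {
                    merged.add(new Column(col.item, col.count + 1)); // matched: bump the count
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && (j == 0 || dp[i][j] == dp[i - 1][j])) {
                merged.add(profile.get(--i));            // column not matched by seq
            } else {
                merged.add(new Column(seq.get(--j), 1)); // item not in profile: new column
            }
        }
        Collections.reverse(merged);
        return merged;
    }

    /** Aligns all sequences, then keeps items supported by at least minFraction of them. */
    static List<String> approximatePattern(List<List<String>> sequences, double minFraction) {
        List<Column> profile = new ArrayList<>();
        for (List<String> seq : sequences) {
            profile = align(profile, seq);
        }
        List<String> pattern = new ArrayList<>();
        for (Column col : profile) {
            if (col.count >= minFraction * sequences.size()) {
                pattern.add(col.item);
            }
        }
        return pattern;
    }
}
```

With `minFraction = 0.5`, the sequences a-b-c, a-c and a-b-d are aligned into the columns a(3), b(2), d(1), c(2), which gives the approximate pattern a-b-c.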
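To make the contrast between salient and recent events concrete, the sketch below compares two fixed-budget memories of a time series. Both the budget and the saliency score (the absolute change from the previous value) are hypothetical choices made for illustration; GEAR's actual criterion is defined in [19] and is not reproduced here. The recency baseline is a sliding window, the extreme case of a decaying model.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical contrast between a recency-based memory and a saliency-based memory.
 * Assumptions: a fixed budget of stored points, and saliency = |x(t) - x(t-1)|.
 * This is not GEAR's actual criterion.
 */
class SalientMemorySketch {

    /** A stored observation: its time index, value and saliency score. */
    static final class Point {
        final int time;
        final double value;
        final double saliency;

        Point(int time, double value, double saliency) {
            this.time = time;
            this.value = value;
            this.saliency = saliency;
        }
    }

    static double saliency(double[] series, int t) {
        return t == 0 ? 0.0 : Math.abs(series[t] - series[t - 1]);
    }

    /** Keeps at most budget points, evicting the least salient one (the oldest on ties). */
    static List<Point> salientMemory(double[] series, int budget) {
        List<Point> memory = new ArrayList<>();
        for (int t = 0; t < series.length; t++) {
            memory.add(new Point(t, series[t], saliency(series, t)));
            if (memory.size() > budget) {
                int victim = 0;
                for (int k = 1; k < memory.size(); k++) {
                    // strict '<' keeps the earliest point among equally salient ones as victim
                    if (memory.get(k).saliency < memory.get(victim).saliency) {
                        victim = k;
                    }
                }
                memory.remove(victim); // remove(int): removal by index
            }
        }
        return memory; // still in time order: points are only appended or removed
    }

    /** Sliding-window baseline: keeps only the budget most recent points. */
    static List<Point> recentMemory(double[] series, int budget) {
        List<Point> memory = new ArrayList<>();
        for (int t = Math.max(0, series.length - budget); t < series.length; t++) {
            memory.add(new Point(t, series[t], saliency(series, t)));
        }
        return memory;
    }
}
```

On the series 1, 1, 9, 9, 1, 1, 1 with a budget of 3 points, the saliency-based memory keeps time steps 2, 4 and 6 (the jump up, the jump down and the most recent point), while the sliding window keeps steps 4, 5 and 6 and forgets the jump up.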
Regarding intrusion detection, let us also cite COD, an implementation of our intrusion detection method proposed in [34]. It takes data files in the "Client-Date-Item" format, obtained by preprocessing the usage data of two web sites, and returns the outliers common to both sites, identified by means of WOD.
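Both WOD and COD rely on a wavelet decomposition of a list of cluster sizes. The sketch below shows what such a decomposition looks like, assuming the Haar wavelet and assuming (our choice) that the sizes are sorted in ascending order and padded to a power-of-two length by repeating the largest size. The rule that WOD applies to the coefficients to place the small/big cut is described in [32] and is not reproduced here.

```java
import java.util.Arrays;

/**
 * Haar wavelet decomposition of a sorted list of cluster sizes.
 * Assumptions: Haar wavelet, ascending sort, padding with the largest size.
 * WOD's own cut rule from [32] is not implemented here.
 */
class HaarClusterSizes {

    /** Sorts the sizes ascending and pads them to a power-of-two length with the largest size. */
    static double[] sortedAndPadded(int[] sizes) {
        if (sizes.length == 0) {
            return new double[0];
        }
        int[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int n = 1;
        while (n < sorted.length) {
            n *= 2;
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = sorted[Math.min(i, sorted.length - 1)];
        }
        return x;
    }

    /**
     * Full (unnormalized) Haar decomposition of a power-of-two-length signal: returns the
     * overall average followed by the detail coefficients, from the coarsest level to the finest.
     */
    static double[] decompose(double[] signal) {
        double[] out = signal.clone();
        for (int len = out.length; len > 1; len /= 2) {
            double[] tmp = new double[len];
            for (int k = 0; k < len / 2; k++) {
                tmp[k] = (out[2 * k] + out[2 * k + 1]) / 2.0;           // pairwise average
                tmp[len / 2 + k] = (out[2 * k] - out[2 * k + 1]) / 2.0; // pairwise detail
            }
            System.arraycopy(tmp, 0, out, 0, len);
        }
        return out;
    }
}
```

For the hypothetical sizes 3, 120, 5, 2 and 110, the padded list is 2, 3, 5, 110, 120, 120, 120, 120 and the coefficients are 75, -45, -27.5, 0, -0.5, -52.5, 0, 0. The largest finest-level detail, -52.5, comes from the pair (5, 110), where small clusters give way to big ones.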