Section: New Results
Keywords : adaptive, intrusion detection, affinity propagation, data streams.
Online and Adaptive Intrusion Detection in Unlabelled Audit Data Streams
Participants : Wei Wang, Florent Masseglia.
Current anomaly-based intrusion detection systems (IDSs) face several difficulties in practical use. First, a large amount of precisely labelled data is very difficult to obtain in real network environments, yet many existing anomaly detection approaches need precisely labelled data to train the detection model. Second, intrusion detection data typically arrives as a stream, and the detection model should be updated frequently as new data comes in; however, many existing anomaly detection methods rely on off-line learning. Third, many current anomaly detection approaches assume that the data distribution is stationary and, accordingly, that the model can remain static. In practice, however, the data in current network environments evolves continuously.
Our adaptive anomaly intrusion detection method addresses these issues with an online, unsupervised clustering algorithm for data streams, under the assumption that, in practical detection environments, normal data is abundant while abnormal data is rare. The method detects attacks adaptively in the following three steps:
Building the initial model with Affinity Propagation (AP) and its extension to streaming environments.
Identifying outliers and updating the model in the streaming environment.
Rebuilding the model and identifying attacks: an outlier is identified as an attack if it is still detected as an outlier after the model has been rebuilt.
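Step 1 relies on Affinity Propagation. For reference, the standard batch AP of Frey and Dueck exchanges two kinds of messages between data points, responsibilities r(i,k) and availabilities a(i,k), computed from pairwise similarities s(i,k); the preference p = s(k,k) controls how many exemplars emerge. Negative squared Euclidean distance is assumed as the similarity below, and the streaming extension used in the method may weight or otherwise modify these updates.

```latex
\begin{align*}
s(i,k) &= -\lVert x_i - x_k \rVert^2 \quad (i \neq k), \qquad s(k,k) = p \\
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\} \quad (i \neq k) \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\} \\
k \text{ is an exemplar} &\iff r(k,k) + a(k,k) > 0
\end{align*}
```

In practice both updates are damped (each new message is averaged with its previous value) to avoid oscillations.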
Online and adaptive intrusion detection is difficult because no a priori knowledge (e.g., the data distribution or labels) is available to the learning method. Our method detects intrusions with AP in an online and adaptive fashion by dynamically clustering audit data streams. It is evaluated on a very large set of real HTTP logs collected from the Apache server of INRIA Sophia Antipolis, as well as on a subset of the KDD 1999 benchmark data. Experimental results show that the method is promising in terms of both effectiveness and efficiency.
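The three steps listed above can be illustrated with a minimal Python sketch, under several loudly stated assumptions. Audit records are assumed to be already converted to numeric feature vectors; plain batch AP stands in for the authors' streaming AP extension; and the outlier radius `eps`, the reservoir size, the minimum cluster weight `min_weight` and the choice `preference = -eps**2` are hypothetical parameters. Rebuilding here reruns AP on the current exemplars (weighted by the number of points they absorbed) plus the reservoir of outliers, and a reservoir point that is still outside every retained cluster is reported as an attack. All function names are illustrative, not the authors' code.

```python
# Hypothetical sketch: AP-based online anomaly detection on numeric feature
# vectors. Batch AP stands in for the authors' streaming AP extension; all
# names and parameters (eps, reservoir_size, min_weight) are illustrative.
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = List[Tuple[Point, float]]  # (exemplar, weight = points absorbed)


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points: Sequence[Point], preference: float,
                         damping: float = 0.7, iterations: int = 300) -> List[int]:
    """Standard batch AP; returns the indices of the exemplars."""
    n = len(points)
    if n < 2:
        return list(range(n))
    # s(i,k) = -||x_i - x_k||^2 off the diagonal, s(k,k) = preference.
    S = [[preference if i == k else -sq_dist(points[i], points[k])
          for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):  # responsibilities, damped
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):  # availabilities, damped
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                new = pos if i == k else min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


def rebuild(points: Sequence[Point], weights: Sequence[float],
            preference: float, min_weight: float) -> Model:
    """Run AP, then keep only clusters whose total weight reaches min_weight."""
    exemplars = affinity_propagation(points, preference) or list(range(len(points)))
    mass: Dict[int, float] = {e: 0.0 for e in exemplars}
    for p, w in zip(points, weights):
        mass[min(exemplars, key=lambda e: sq_dist(p, points[e]))] += w
    return [(points[e], m) for e, m in mass.items() if m >= min_weight]


def is_outlier(x: Point, model: Model, eps: float) -> bool:
    return all(sq_dist(x, ex) > eps ** 2 for ex, _ in model)


def detect_stream(initial: Sequence[Point], stream: Sequence[Point], eps: float,
                  reservoir_size: int, min_weight: float) -> Tuple[List[Point], Model]:
    pref = -(eps ** 2)  # assumption: a point farther than eps prefers to be its own exemplar
    # Step 1: initial model from an unlabelled batch.
    model = rebuild(initial, [1.0] * len(initial), pref, min_weight)
    reservoir: List[Point] = []
    attacks: List[Point] = []
    for x in stream:
        if not is_outlier(x, model, eps):
            # Step 2a: normal point; only the nearest cluster's weight is updated.
            j = min(range(len(model)), key=lambda idx: sq_dist(x, model[idx][0]))
            model[j] = (model[j][0], model[j][1] + 1.0)
            continue
        reservoir.append(x)  # Step 2b: outlier kept for the next rebuild
        if len(reservoir) >= reservoir_size:
            # Step 3: rebuild from weighted exemplars + reservoir; flag survivors.
            pts = [ex for ex, _ in model] + reservoir
            wts = [w for _, w in model] + [1.0] * len(reservoir)
            model = rebuild(pts, wts, pref, min_weight)
            attacks += [p for p in reservoir if is_outlier(p, model, eps)]
            reservoir = []
    return attacks, model


SAMPLE_INITIAL = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5), (-0.4, -0.1), (0.1, 0.3),
                  (10.0, 10.0), (10.5, 10.2), (9.7, 10.4), (10.2, 9.5), (9.6, 9.9), (10.1, 10.3)]
SAMPLE_STREAM = [(0.3, 0.1), (10.4, 9.8), (5.0, -5.0), (5.1, -5.0), (30.0, 30.0),
                 (5.0, -5.2), (5.3, -4.9)]

if __name__ == "__main__":
    found, final_model = detect_stream(SAMPLE_INITIAL, SAMPLE_STREAM, eps=3.0,
                                       reservoir_size=5, min_weight=3)
    print("attacks:", found)
    print("exemplars:", [ex for ex, _ in final_model])
```

In this toy stream, the burst of points around (5, -5) is initially outlying but forms a cluster heavy enough to be absorbed as new normal behaviour when the model is rebuilt, whereas the isolated point (30, 30) remains an outlier and is reported. Tying the AP preference to `eps` is only one convenient choice; the median similarity is the more common default for AP.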
A poster on this work has been accepted at EGC09.