Section: New Results
Algorithms: On-line Algorithms for Traffic Measurements
Participants : Yousra Chabchoub, Christine Fricker.
Analysis of a counting algorithm
Joint work with H. Mohamed, University of Nanterre. This work concerns the design and performance evaluation of a probabilistic algorithm that identifies long flows (elephants) in Internet traffic.
The algorithm, originally proposed by Azzana [26], is based on Bloom filters: the ID of each packet is sent via independent hash functions to filters (hash tables). Since every packet of a flow is hashed to the same counters, long flows (more than C = 20 packets) can be identified from the counter values. Because short flows accumulate in the filters, the filters must be refreshed from time to time to cope with repeated collisions in the hash tables. Specifically, the non-null counters are decremented by one every time the filling rate of the filter reaches some threshold r.
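The mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ElephantFilter`, the parameters `k` (number of filters) and `m` (counters per filter), the use of SHA-256 as a stand-in for independent hash functions, the definition of the filling rate as the fraction of non-null counters, and the rule "a flow is flagged once all its counters have reached C" are all assumptions made for illustration.

```python
import hashlib


class ElephantFilter:
    """Illustrative counting filter with refresh (assumed design, see lead-in)."""

    def __init__(self, k=3, m=1024, C=20, r=0.5):
        self.k, self.m, self.C, self.r = k, m, C, r
        self.counters = [[0] * m for _ in range(k)]  # one table per hash function
        self.nonzero = 0          # number of non-null counters over all tables
        self.elephants = set()    # flow IDs flagged as long flows

    def _index(self, stage, flow_id):
        # Stand-in for the stage-th independent hash function.
        digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def _refresh(self):
        # Decrement every non-null counter by one.
        for table in self.counters:
            for i, c in enumerate(table):
                if c > 0:
                    table[i] = c - 1
                    if c == 1:
                        self.nonzero -= 1

    def process(self, flow_id):
        """Handle one packet; return True if its flow is flagged as an elephant."""
        idx = [self._index(s, flow_id) for s in range(self.k)]
        for s, i in enumerate(idx):
            if self.counters[s][i] == 0:
                self.nonzero += 1
            self.counters[s][i] += 1
        # Assumed detection rule: all counters of the flow have reached C.
        if min(self.counters[s][i] for s, i in enumerate(idx)) >= self.C:
            self.elephants.add(flow_id)
        # Refresh when the filling rate reaches the threshold r.
        if self.nonzero / (self.k * self.m) >= self.r:
            self._refresh()
        return flow_id in self.elephants
```

For example, with `ElephantFilter(k=2, m=64, C=5, r=0.9)`, a flow is flagged on its fifth packet as long as no refresh occurs in between. A short flow whose counters all collide with those of longer flows would be flagged as well, which is the false-positive mechanism that the refresh is meant to limit.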
This algorithm has been extensively tested on both commercial Orange traces and academic Abilene traces. It produces both false positives (mice detected as elephants) and false negatives (missed elephants). To estimate the number of false positives, a simple urn and ball model has been proposed, first for mice of size one. It has been used to evaluate the impact of refreshment on the proportion of false positives, i.e., short flows detected as long flows with more than C packets. Limit theorems for the empirical distribution of the filter counters (mean-field limit) have been obtained when the filter size is large. The limiting stationary distribution is deterministic and has a natural interpretation in terms of queues. The proof is based on the convergence to a dynamical system and uses a queueing interpretation of the fixed point of the system. The convergence of the invariant measure is completely proved for C = 2, where a Lyapunov function is exhibited.
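The urn and ball model for mice of size one can be explored numerically. The sketch below is an illustrative simulation under stated assumptions, not the model analyzed in the cited work: each mouse is one ball thrown into a uniformly chosen urn (counter) of a single filter of size `m`, and a refresh removes one ball from every non-empty urn as soon as the fraction of non-empty urns reaches `r`. The function name `simulate_urns` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_urns(m=1000, r=0.5, n_balls=50_000, seed=0):
    """Return the empirical distribution of counter values and the refresh count."""
    rng = random.Random(seed)
    urns = [0] * m
    nonempty = 0
    refreshes = 0
    for _ in range(n_balls):
        i = rng.randrange(m)          # one mouse of size one = one ball
        if urns[i] == 0:
            nonempty += 1
        urns[i] += 1
        if nonempty >= r * m:         # filling rate reached the threshold
            urns = [c - 1 if c > 0 else 0 for c in urns]
            nonempty = sum(1 for c in urns if c > 0)
            refreshes += 1
    hist = Counter(urns)
    # Fraction of urns holding each value at the end of the run.
    dist = {value: hist[value] / m for value in sorted(hist)}
    return dist, refreshes
```

Calling `simulate_urns()` for increasing `m` gives a rough empirical view of the counter distribution whose large-`m` limit is the object of the mean-field results; the fraction of counters at or above C is the quantity relevant to false positives.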
In the model considered above, all counters associated with the ID of a given packet are incremented. In Chabchoub et al. [16] we investigate an improvement of this policy, which consists in incrementing only the lowest-valued counters. This variant should give a more accurate estimate of the number of long flows, since using the shortest queue drastically reduces the tail of the queue size distribution. This problem has been extensively analyzed recently under the general heading of the power of (two) choice(s). In our setting, a new version of the algorithm has been studied: only one counter is incremented among the d counters provided by d hash functions. This algorithm has been tested on traffic traces and shown to perform well, especially when d is small (2 or 4). The previous results partly extend to this framework: the limiting time between two refreshes is analytically tractable. The convergence to a dynamical system is obtained, though it is more complicated to describe analytically. The existence of a fixed point is proved, but not its uniqueness. The global stability of the dynamical system, which could lead to the same limit result, remains an open question and is under investigation.
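The effect of incrementing only the lowest of d candidate counters can be seen in a small balls-into-bins experiment. This is a hedged sketch and not the algorithm of [16]: uniform random indices stand in for the d hash functions, every packet is treated as a distinct flow of size one, no refresh is performed, and the function name `max_counter` is hypothetical.

```python
import random


def max_counter(m, n_packets, d, seed=0):
    """Largest counter value after n_packets, incrementing the lowest of d candidates."""
    rng = random.Random(seed)
    counters = [0] * m
    for _ in range(n_packets):
        # d candidate counters, as d hash functions would provide.
        candidates = [rng.randrange(m) for _ in range(d)]
        # Increment only the lowest-valued candidate (first one on ties).
        target = min(candidates, key=lambda i: counters[i])
        counters[target] += 1
    return max(counters)
```

Comparing `max_counter(10_000, 10_000, d=1)` with `max_counter(10_000, 10_000, d=2)` typically shows a noticeably smaller maximum for d = 2, which is the "power of two choices" effect on the tail of the counter distribution.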