Section: New Results
Network measurement, modeling and understanding
Participants: Chadi Barakat, Walid Dabbous, Roberto Cascella, Alfredo Grieco, Mohamad Jaber, Amir Krifa, Imed Lassoued, Stevens Leblond, Arnaud Legout.
The main objective of our work in this domain is better monitoring of the Internet and better control of its resources. On the monitoring side, we work on new measurement techniques that scale with the fast increase in Internet traffic and with the growth of the network itself. We propose solutions for fast and accurate identification of Internet traffic based on packet size statistics. Within the ECODE FP7 project, we work on a network-wide monitoring architecture that, given a measurement task to perform, tunes the monitors inside the network so as to maximize the accuracy of the measurement results. Within the ANR CMON project, we work on monitoring the quality of Internet access with end-to-end probes, and on the detection and troubleshooting of network problems through collaboration among end users. On the network control side, we focus on new solutions that improve the quality of service perceived by users through better management of network resources and more efficient tuning of applications that takes into account the constraints posed by the network. In this direction we propose distributed topology-aware algorithms for scheduling communications among the members of a wireless community interested in sharing data files with each other. This is the main functionality provided by our open-source software BitHoc [49].
Next is a sketch of our main contributions in this area.
-
Internet traffic classification by means of packet level statistics
One of the most important challenges for network administrators is the identification of the applications behind Internet traffic. This identification serves many purposes, such as network security, traffic engineering and monitoring. The classical methods based on standard port numbers or deep packet inspection are unfortunately becoming less and less efficient because of encryption and the use of non-standard ports. In this activity, we propose an online iterative probabilistic method that identifies applications quickly and accurately using only the size of packets. Our method associates a configurable confidence level with the port number carried in the transport header and can consider a variable number of packets at the beginning of a flow. Verification on real traces shows that, even with no confidence in the port number, a very high accuracy can be obtained for well-known applications after only a few packets have been examined. Further details on the method and the experimental results can be found in [28]. The work continues with the validation of the method on more traces and its extension to more advanced scenarios.
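To give the flavor of such an approach, the following Python sketch performs an iterative Bayesian update over hypothetical per-application packet-size models, mixing a uniform prior with a port-based hint weighted by a configurable confidence level. The application names, Gaussian size models and parameters are illustrative assumptions, not the models of [28].

```python
import math

# Hypothetical per-application models: a default port and a Gaussian model
# of the sizes of the first packets of a flow (illustrative numbers only).
APP_MODELS = {
    "http":       {"port": 80,   "mean": 600.0,  "std": 400.0},
    "bittorrent": {"port": 6881, "mean": 1200.0, "std": 300.0},
    "dns":        {"port": 53,   "mean": 80.0,   "std": 30.0},
}

def size_likelihood(size, model):
    """Gaussian likelihood of one packet size under an application model."""
    var = model["std"] ** 2
    return math.exp(-(size - model["mean"]) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(packet_sizes, dst_port, port_confidence=0.5):
    """Posterior P(application | first packet sizes, destination port).

    port_confidence in [0, 1]: 0 ignores the port, 1 trusts it fully.
    """
    apps = list(APP_MODELS)
    uniform = 1.0 / len(apps)
    # Prior: mixture of a uniform distribution and the port-based hint.
    prior = {a: (1 - port_confidence) * uniform +
                port_confidence * (1.0 if APP_MODELS[a]["port"] == dst_port else 0.0)
             for a in apps}
    z = sum(prior.values()) or 1.0
    post = {a: p / z for a, p in prior.items()}
    for size in packet_sizes:                      # iterative update, packet by packet
        post = {a: post[a] * size_likelihood(size, APP_MODELS[a]) for a in apps}
        norm = sum(post.values()) or 1.0
        post = {a: p / norm for a, p in post.items()}
    return post

# Example: the first four packet sizes of a flow seen on a non-standard port.
print(classify([1400, 1380, 1200, 900], dst_port=51413, port_confidence=0.0))
```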
-
Adaptive network-wide traffic monitoring
The remarkable growth of the Internet infrastructure and the increasing heterogeneity of applications and of users' behavior make the management and monitoring of ISP networks more complex and raise the cost of any new deployment. The main consequence of this trend is an inherent mismatch between existing monitoring solutions and the increasing needs of management applications. In this context, we work on the design of an adaptive centralized architecture that provides visibility over the entire network through a network-wide cognitive monitoring system. Given a measurement task, the proposed system drives its own configuration in order to address the trade-off between monitoring constraints (processing and memory cost, volume of collected data) and measurement task requirements (accuracy, flexibility, scalability). We motivate our architecture with an accounting application: estimating the number of packets per flow, where the flow can be defined in different ways to satisfy different objectives. The performance of our system is being validated in typical scenarios over an experimental platform we are developing for the purpose of this study. This platform presents a new approach to the emulation of Internet traffic and to its monitoring across the different routers. It puts at the disposal of users a real traffic emulation service coupled with a set of libraries and tools capable of Cisco NetFlow data export and collection, all intended to support advanced applications for network-wide traffic monitoring and optimization.
The activities in this direction are funded by the ECODE FP7 STREP project (Sep. 2008 - Sep. 2011).
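As an illustration of the accounting task and of the kind of feedback loop such a system relies on, the sketch below estimates packets per flow from NetFlow-like sampled records and crudely retunes per-monitor sampling rates towards a processing budget. The record format, the sampling rates and the retuning rule are assumptions made for the example; they are not the ECODE design.

```python
from collections import defaultdict

# Current per-monitor packet sampling rates (the knob the central system tunes).
RATE = {"r1": 0.01, "r2": 0.05}

def packets_per_flow(records):
    """records: iterable of (monitor_id, flow_key, sampled_packet_count).

    Dividing each count by the monitor's sampling rate gives an unbiased
    estimate of the true flow size; estimates from the monitors that saw
    the same flow are averaged.
    """
    sums, seen = defaultdict(float), defaultdict(set)
    for monitor, flow, sampled in records:
        sums[flow] += sampled / RATE[monitor]
        seen[flow].add(monitor)
    return {flow: sums[flow] / len(seen[flow]) for flow in sums}

def retune(rate, measured_load, budget):
    """Crude feedback rule: scale each monitor's sampling rate so that its
    measured processing load converges towards its allocated budget."""
    return {m: min(1.0, rate[m] * budget[m] / max(measured_load[m], 1e-9))
            for m in rate}

records = [("r1", ("10.0.0.1", "10.0.0.2", 80), 3),
           ("r2", ("10.0.0.1", "10.0.0.2", 80), 12)]
print(packets_per_flow(records))                          # flow-size estimates
print(retune(RATE, {"r1": 2e4, "r2": 8e4}, {"r1": 1e4, "r2": 1e5}))
```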
-
Spectral analysis of packet sampled traffic
Packet sampling techniques introduce measurement errors that must be carefully handled in order to correctly characterize the network behavior. Several works in the literature have studied the statistical properties of packet sampling and the way it should be inverted to recover the original network measurements. Here we take the new direction of studying the spectral properties of packet sampling. We propose a novel technique to model the impact of packet sampling, based on a theoretical analysis of network traffic in the frequency domain. Moreover, we develop a real-time algorithm to detect the portion of the traffic spectrum that can be restored once packet sampling has been applied. The analysis and experimental results validating the approach are published in [26].
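The following sketch illustrates the underlying idea on synthetic data: it compares the power spectrum of a packet-count time series before and after Bernoulli packet sampling and flags the frequency bins that remain above the noise floor added by sampling, i.e. the portion of the spectrum one can hope to restore. The synthetic traffic model and the threshold rule are assumptions for illustration, not the algorithm of [26].

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 4096, 0.1                                   # number of time bins, sampling rate

# Synthetic traffic: Poisson noise plus a periodic component.
rate = 50 + 20 * np.sin(2 * np.pi * np.arange(T) / 64)
counts = rng.poisson(rate)

# Bernoulli sampling: each packet kept independently with probability p,
# then rescaled by 1/p so the mean is preserved.
sampled = rng.binomial(counts, p) / p

def power_spectrum(x):
    x = x - x.mean()
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

orig, samp = power_spectrum(counts), power_spectrum(sampled)

# Sampling adds an (approximately) white noise floor; estimate it from the
# high-frequency tail and keep only the bins that clearly exceed it.
noise_floor = np.median(samp[len(samp) // 2:])
recoverable = samp > 4 * noise_floor
freqs = np.fft.rfftfreq(T)
print("recoverable frequency bins:", freqs[recoverable][:10])
```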
-
Monitoring the quality of the Internet access by end-to-end probes
The detection of anomalous links and traffic is important to manage the state of the network. Existing techniques focus on detecting the anomalies, but little attention has been devoted to quantifying the extent to which a network anomaly affects the experience of the end user on its access link. We refer to this aspect as the local seriousness of the anomaly. To quantify the local seriousness of an anomaly we consider the percentage of affected destinations, which we call the "impact factor". To measure it, a host should monitor all possible routes to detect any variation in performance, but this is not practical in reality. In this activity, funded by the ANR CMON project, we work on estimating the impact factor and the local seriousness of network anomalies through a limited set of measurements towards random nodes that we call landmarks.
We initially study the user access network to understand the typical features of its connectivity tree. Then, we define an unbiased estimator for the local seriousness of the anomaly and a framework that achieves three main results: (i) the computation of the minimum number of paths to monitor so that the estimator achieves a given significance level, (ii) the localization of the anomaly in terms of hop distance from the local user, and (iii) the optimal selection of landmarks. We are using real data to evaluate in practice the local seriousness of anomalies and to determine the number of landmarks that suffice when they are selected at random, without any knowledge of the Internet topology. The localization mechanism leverages the study of the connectivity tree and the relationship between the impact factor and the minimum hop distance of an anomaly. Our first results show that the impact factor is indeed a meaningful metric to evaluate the quality of Internet access.
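As a simple illustration of the estimation problem, the sketch below uses the sample proportion of affected landmarks as an estimator of the impact factor and derives, from a conservative Hoeffding bound, the number of landmarks needed for a given accuracy and confidence. The assumptions (landmarks drawn uniformly at random, independent probes) and the bound are made for the example; the CMON estimator and landmark selection are more elaborate.

```python
import math

def estimate_impact_factor(probe_results):
    """probe_results: list of booleans, True if the path to that landmark is
    affected by the anomaly. The sample proportion is an unbiased estimator
    of the fraction of affected destinations."""
    return sum(probe_results) / len(probe_results)

def min_landmarks(epsilon, delta):
    """Number of landmarks needed so that the estimate is within +/- epsilon
    of the true impact factor with probability at least 1 - delta
    (conservative Hoeffding bound, independent of the topology)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

# Example: +/- 0.1 accuracy with 95% confidence.
k = min_landmarks(epsilon=0.1, delta=0.05)
print(k, "landmarks")                              # 185 with these numbers
results = [True] * 30 + [False] * (k - 30)         # hypothetical probe outcomes
print("estimated impact factor:", round(estimate_impact_factor(results), 2))
```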
-
Understanding peer-to-peer dynamics
This activity focuses on the understanding and improvement of peer-to-peer content delivery. Indeed, we believe that the value of peer-to-peer comes from its ability to distribute content to a large number of peers without any specific infrastructure, and within a delay that is logarithmic in the number of peers.
We have also worked, in the context of the Ph.D. thesis of Stevens Le Blond, on how to make BitTorrent ISP friendly [54]. One major issue with BitTorrent is that it does not take into account the underlying network topology. As a consequence, some specific links are overloaded, and ISPs have to block BitTorrent traffic in order to decrease the load on those links. One solution to this problem is to keep the BitTorrent traffic local to each ISP, leveraging the ISP's network topology. This notion of locality has raised a huge interest recently. However, all proposed solutions consider only moderate locality. In [54] we answer two important questions.
First, how much traffic can be kept local without adversely impacting peers? Whatever the locality solution, it will impact the structure of the overlay interconnecting peers. We go much further than previous work in understanding the impact of locality on the structure of that overlay. In addition, reducing the amount of traffic that is kept local in order to prevent partitions (and thus a loss of performance for peers) is the solution adopted by P4P and Ono. We introduce a simple, backward-compatible mechanism to prevent partitions and show that the traffic on specific links can be dramatically reduced by keeping more traffic local, without adversely impacting peers. Such mechanisms are important because they make it possible to reap the full benefits of the information provided by the locality solution.
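The idea of combining strong locality with partition avoidance can be sketched as follows: prefer neighbors in the local AS, but always reserve a minimum number of inter-AS connections so the overlay stays connected. The parameter names and the selection rule below are illustrative; this is not the exact mechanism of [54].

```python
import random

MIN_EXTERNAL = 3      # inter-AS links each peer keeps, whatever the locality level

def select_neighbors(my_as, candidates, max_neighbors=50):
    """candidates: list of (peer_id, as_number) pairs known for the torrent."""
    local = [p for p, asn in candidates if asn == my_as]
    external = [p for p, asn in candidates if asn != my_as]
    random.shuffle(local)
    random.shuffle(external)
    # Reserve slots for external peers first (partition avoidance), then fill
    # the remaining slots with local peers, then with external peers again.
    chosen = external[:MIN_EXTERNAL]
    chosen += local[:max_neighbors - len(chosen)]
    if len(chosen) < max_neighbors:
        chosen += external[MIN_EXTERNAL:max_neighbors - len(chosen) + MIN_EXTERNAL]
    return chosen

# Example with made-up peers spread over three ASes.
peers = [(f"peer{i}", random.choice([3215, 5511, 2200])) for i in range(200)]
print(len(select_neighbors(3215, peers)))
```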
Second, what would be the benefit of a locality policy at the Internet scale? Using a real-world crawl of a large fraction of all BitTorrent peers, we show that deploying a locality policy today would reduce the traffic on inter-AS links by 40%. To the best of our knowledge, the work closest to ours is that of P4P. However, they only consider one torrent and one AS. Whereas the P4P field tests successfully support the relevance of the P4P architecture, they cannot be used to support the relevance of a locality policy at the scale of the Internet. Yet that last point is fundamental to the justification of locality. A classical argument is that, at the scale of the Internet, the benefit of a locality policy will be negligible because there is on average only one or a few peers per AS. We show that, even though there are indeed few peers per AS on average, the reduction of traffic on inter-AS links that can be achieved with a locality policy is still high at the Internet scale. We therefore believe that this is the first large-scale measurement study that strongly supports the implementation of a locality policy in the Internet.
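To give a sense of how such a crawl can be turned into an estimate, the sketch below compares, per torrent, the fraction of inter-AS downloads under random peering with that under an ideal locality policy where a peer downloads locally whenever another peer of the same torrent sits in its AS. The model is deliberately crude and the AS numbers are made up; the 40% figure reported in [54] comes from a much finer analysis of the crawled data.

```python
from collections import Counter

def inter_as_fraction(torrents, locality):
    """torrents: list of torrents, each given as the list of the AS numbers
    of its peers. Returns the fraction of downloads crossing AS boundaries."""
    crossing = total = 0
    for ases in torrents:
        count, N = Counter(ases), len(ases)
        if N < 2:
            continue
        for asn in ases:
            total += 1
            if locality:
                # Ideal locality: local download whenever another peer shares the AS.
                crossing += 1 if count[asn] == 1 else 0
            else:
                # Random peering: probability of picking a peer in another AS.
                crossing += (N - count[asn]) / (N - 1)
    return crossing / total

crawl = [[3215, 3215, 5511, 2200, 3215], [5511, 5511], [2200, 3215, 5511]]
rnd, loc = inter_as_fraction(crawl, False), inter_as_fraction(crawl, True)
print(f"inter-AS traffic reduction: {1 - loc / rnd:.0%}")
```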
In another work [57], we evaluate the experimental bias that may occur when running BitTorrent experiments on a testbed. We show that there is no bias due to the shorter delays compared to the real Internet. This means that very realistic BitTorrent experiments can be run on a testbed or on a cluster of machines.
Finally, we have explored privacy issues in BitTorrent. In a first work, we show for the first time that it is possible to monitor in real time all peers that are using BitTorrent worldwide. Using this information, we show that the initial sources of contents can be identified, and we quantify their impact on the performance of BitTorrent.
In a second work, we show that Tor, the anonymizing network, fails to protect the privacy of BitTorrent users. Indeed, we show that it is possible to retrieve the IP address of BitTorrent users running on top of Tor and, even worse, that all applications run by those users are potentially compromised.