Section: New Results

Distributed Computing Platforms: Measurements and Models

Participants: Yves Denneulin, Derrick Kondo, Jean-François Méhaut, Olivier Richard, Jean-Marc Vincent.

Network Models for Simulation and Emulation

Studies of distributed systems generally rely on simulation, which yields reproducible results and makes it possible to explore wide ranges of platform and application scenarios. In this context, network simulation is certainly the most critical part. Many packet-level network simulators are available and offer high accuracy, but they lead to prohibitively long simulation times. Many simulation frameworks have therefore been developed that simulate networks at a higher level, which makes simulation fast at the cost of accuracy. One such framework, SimGrid, uses a flow-level approach that approximates the behavior of TCP networks, including TCP's bandwidth sharing properties. A preliminary study previously evaluated this accuracy loss by comparing SimGrid to popular packet-level simulators, and identified the regimes in which SimGrid's accuracy is comparable to theirs. In [41], we revisit this study, reproduce its experiments, and provide a deeper analysis that enables us to greatly improve SimGrid's range of validity.
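
To make the idea of simulating bandwidth sharing above the packet level concrete, the sketch below computes a max-min fair allocation by progressive filling. It is only an illustration under stated assumptions: max-min fairness is a classical textbook sharing model, not necessarily the model implemented in SimGrid or analysed in [41], and the function name, link names, and capacities are hypothetical.

```python
def max_min_share(capacities, routes):
    """Max-min fair rates by progressive filling (illustrative sketch).

    capacities: dict mapping a link name to its capacity.
    routes: dict mapping a flow name to the list of links it crosses.
    Assumes every flow crosses at least one link.
    """
    remaining = dict(capacities)
    active = {flow: set(links) for flow, links in routes.items()}
    rates = {}
    while active:
        # Equal share each link could still give to its unfrozen flows.
        share = {}
        for link, cap in remaining.items():
            users = sum(1 for links in active.values() if link in links)
            if users:
                share[link] = cap / users
        # The link with the smallest share saturates first.
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        # Freeze the flows crossing the bottleneck at that rate.
        frozen = [f for f, links in active.items() if bottleneck in links]
        for flow in frozen:
            rates[flow] = rate
            for link in active[flow]:
                remaining[link] -= rate
            del active[flow]
    return rates


# Hypothetical platform: two links, three flows.
rates = max_min_share(
    {"A": 10, "B": 4},
    {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]},
)
```

In this example link B is the first bottleneck, so f2 and f3 each receive 2 units and f1 takes the 8 units left on link A. Computing rates per flow rather than per packet is what makes this kind of simulation fast.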

Between discrete event simulation and evaluation within real networks, network emulation is a useful tool to study and evaluate the behaviour of applications. Using a real network as a basis to simulate another network's characteristics, it enables researchers to perform experiments in a wide range of conditions. After giving an overview of the various available network emulators, we compare and contrast in [35] three freely available and widely used network link emulators: Dummynet, NISTNet, and the Linux Traffic Control subsystem. We start by comparing their features, then focus on the accuracy of their latency and bandwidth emulation, and discuss the way they are affected by the time source of the system. We expose several problems that cannot be ignored when using such tools. We also outline differences in their user interfaces, such as the interception point, and discuss possible solutions. This work aims at providing a complete overview of the different solutions for network emulation.
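
One way to picture the influence of the system time source on latency emulation is the simplified model below. It assumes an emulator that can only release packets on periodic timer ticks, so the configured delay is rounded up to the next tick. This is an assumption made for illustration, not a description of how Dummynet, NISTNet, or the Linux Traffic Control subsystem are implemented, and the function names and numeric values are hypothetical.

```python
def release_time_us(arrival_us, delay_us, tick_us):
    """First timer tick at or after the packet's due time (microseconds)."""
    due = arrival_us + delay_us
    # Integer ceiling division avoids floating-point rounding.
    ticks = -(-due // tick_us)
    return ticks * tick_us


def observed_delay_us(arrival_us, delay_us, tick_us):
    """Delay actually experienced by the packet under this model."""
    return release_time_us(arrival_us, delay_us, tick_us) - arrival_us


# A 5 ms delay requested for a packet arriving 1 ms after a tick.
coarse = observed_delay_us(1_000, 5_000, 10_000)  # 100 Hz timer
fine = observed_delay_us(1_000, 5_000, 1_000)     # 1000 Hz timer
```

Under these assumptions the 100 Hz timer stretches the requested 5 ms delay to 9 ms, while the 1000 Hz timer delivers exactly 5 ms. The error is bounded by one tick period in this model.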

Resources Availability for Large Systems

In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example stationary versus non-stationary behavior) and fits different models (for example Exponential, Weibull, or Pareto probability distributions). In [30], we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modelled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real large-scale Internet-distributed system, namely SETI@home. We find that about 34% of hosts exhibit availability that is a truly random process, and that these hosts can often be modelled accurately with a few distinct distributions from different families. We believe that this characterization is fundamental to the design of stochastic scheduling algorithms for large-scale systems where host availability is uncertain.
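
As a rough illustration of what fitting availability intervals to different distribution families involves, the sketch below fits an Exponential and a Pareto distribution by maximum likelihood and keeps the family with the higher log-likelihood. It is not the method of [30]: the comparison criterion, the function names, and the synthetic samples are assumptions, and Weibull is left out because its maximum-likelihood fit has no closed form.

```python
import math
import random


def fit_exponential(xs):
    """Closed-form MLE for an Exponential; returns (rate, log-likelihood)."""
    n, total = len(xs), sum(xs)
    rate = n / total
    return rate, n * math.log(rate) - rate * total


def fit_pareto(xs):
    """Closed-form MLE for a Pareto; returns (x_min, alpha, log-likelihood).

    Assumes positive values that are not all identical.
    """
    n, x_min = len(xs), min(xs)
    alpha = n / sum(math.log(x / x_min) for x in xs)
    loglik = (n * math.log(alpha) + n * alpha * math.log(x_min)
              - (alpha + 1) * sum(math.log(x) for x in xs))
    return x_min, alpha, loglik


def best_family(xs):
    """Name of the family with the higher log-likelihood (crude criterion)."""
    exp_ll = fit_exponential(xs)[1]
    par_ll = fit_pareto(xs)[2]
    return "exponential" if exp_ll >= par_ll else "pareto"


# Synthetic interval lengths standing in for real availability traces.
random.seed(0)
exp_like = [random.expovariate(0.5) for _ in range(2000)]
heavy_tailed = [random.paretovariate(1.5) for _ in range(2000)]
```

Calling `best_family` on each host's intervals gives a first, coarse grouping of hosts by family. A real study would also need goodness-of-fit tests and a check that each trace is stationary before fitting.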

Economic Models for Cloud Computing

Cloud computing has taken commercial computing by storm. However, adoption of cloud computing platforms and services by the scientific community is in its infancy, as the performance and monetary cost-benefits for scientific applications are not yet clear. This is especially true for desktop grid (also known as volunteer computing) applications. In [34], we compare and contrast the performance and monetary cost-benefits of clouds for desktop grid applications of varying computational size and storage requirements. We address the following questions: (i) What are the performance trade-offs in using one platform over the other? (ii) What are the specific resource requirements and monetary costs of creating and deploying applications on each platform? (iii) In light of those monetary and performance cost-benefits, how do these platforms compare? (iv) Can cloud computing platforms be used in combination with desktop grids to improve cost-effectiveness even further? We examine those questions using performance measurements and monetary expenses of real desktop grids and the Amazon Elastic Compute Cloud.

