Section: Other Grants and Activities
DSLLab, 2005-2009, ANR Jeunes Chercheurs
DSLlab is a research project that aims to build and use an experimental platform for distributed systems running over DSL Internet connections. The objective is twofold:
provide accurate and customized measures of availability, activity and performance in order to characterize and tune the models of ADSL resources;
provide a validation and experimentation tool for new protocols, services, simulators and emulators for these systems.
DSLlab consists of a set of low-power, low-noise computers spread over the ADSL network. These computers are used simultaneously as active probes to capture behavior traces and as operational nodes to launch experiments. We expect this experiment to yield a better knowledge of the behavior of ADSL and the design of accurate models for the emulation and simulation of these systems, which now represent a significant capability in terms of storage and computing power.
NUMASIS, 2005-2009, ANR Calcul Intensif et Grilles de Calcul
Future generations of multiprocessor machines will rely on a NUMA architecture featuring multiple memory levels as well as nested computing units (multi-core chips, multi-threaded processors, multi-module NUMA, etc.). To get the most out of the hardware, parallel applications need powerful software that carefully distributes processes and data so as to limit non-local memory accesses. The ANR NUMASIS (NUMASIS: Adapting and Optimizing Applicative Performance on NUMA Architectures: Design and Implementation with Applications in Seismology) project aims at evaluating the functionalities provided by current operating systems and middleware in order to point out their limitations. It also aims at designing new methods and mechanisms for efficient process scheduling and careful data distribution on such platforms. These mechanisms will be implemented within operating systems and middleware. The target application domain is seismology, which is very representative of the needs of compute-intensive scientific applications.
ALPAGE, 2005-2009, ANR Masses de Données
The new algorithmic challenges associated with large-scale platforms have been approached from two different directions. On the one hand, the parallel algorithms community has largely concentrated on the problems associated with heterogeneity and large amounts of data. Its algorithms rely on a single centralized node that is responsible for computing the optimal solution; this approach induces significant computing times on the organizing node and requires centralizing all the information about the platform. These solutions therefore clearly suffer from scalability and fault-tolerance problems.
On the other hand, the distributed systems community has focused on scalability and fault-tolerance issues. The success of file sharing applications demonstrates the capacity of the resulting algorithms to manage huge volumes of data and users on large unstable platforms. Algorithms developed within this context are completely distributed and based on peer-to-peer communications. They are well adapted to very irregular applications, for which the communication pattern is unpredictable. But in the case of more regular applications, they lead to a significant waste of resources.
The goal of the ALPAGE project is to establish a link between these directions by gathering researchers (Mescal, LIP, LORIA, LaBRI, LIX, LRI) from the distributed systems and parallel algorithms communities. More precisely, the objective is to develop efficient and robust algorithms for some elementary applications, such as broadcast and multicast, distribution of tasks that may or may not share files, and resource discovery. These fundamental applications correspond well to the spectrum of applications that can be considered on large-scale distributed platforms.
SMS, 2005-2009, ANR
The ACI SMS, “Simulation et Monotonie Stochastique en évaluation de performances”, is composed of two teams: the Performance Evaluation team from the PRiSM Laboratory (ACI leader) and the MESCAL project-team. The main objective is to study monotonicity properties of computer system models in order to speed up simulations and estimate performance indexes more accurately.
The composition formalisms that we have helped to develop in recent years make it possible to build large Markov chains associated with complex systems in order to analyze their performance. However, it is often impossible to compute the stationary or transient distributions. Analytical methods and simulations fail for different reasons.
However, raw performance figures are not really useful: we need proof that the system performs better than a given objective. It is therefore natural to compare random variables and sample paths. Two important concepts appear: stochastic ordering and stochastic monotonicity. We chose to develop these two concepts and apply them to perfect simulation, distributed simulation and product-form queueing networks. These concepts appear frequently in various performance evaluation techniques. Using the monotonicity property, one can reduce the computation time of perfect simulation with coupling from the past. Coupling from the past makes it possible to sample the steady-state distribution in finite time, so we do not encounter the stopping problem that affects ordinary simulations. Furthermore, some results show that the monotonicity property is often present in queueing networks even if they do not have product form; we simply have to renormalize them to let the property appear. Using both properties, it is also possible to derive more efficient distributed simulations. We will develop two ideas: sample-path transformations that avoid rollback in optimistic simulations (in which case we compute a bound) and regenerative simulations.
Finally, these concepts can be used for product-form queueing networks to explain why some transformations applied to customer synchronization yield a product-form solution, and how a solution of the traffic equations can be computed when they are unstable.
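To make the coupling-from-the-past argument above concrete, here is a minimal Python sketch of monotone perfect simulation. It assumes a toy model that is not taken from the project: a discrete-time single-server queue with finite capacity, in which each step is an arrival with probability p_arrival and a departure otherwise. The names update and monotone_cftp and all parameter values are illustrative assumptions. Because the update rule preserves the order of states, it is enough to follow the two extreme trajectories (empty queue and full queue); when they meet at time 0, the common value is a sample from the stationary distribution.

```python
import random


def update(state: int, u: float, capacity: int, p_arrival: float) -> int:
    """One step of the toy queue, driven by a uniform draw u in [0, 1).

    For a fixed u the map is non-decreasing in `state`, which is the
    monotonicity property that the sandwiching argument relies on.
    """
    if u < p_arrival:
        return min(state + 1, capacity)  # arrival (lost if the queue is full)
    return max(state - 1, 0)             # departure (no effect if empty)


def monotone_cftp(capacity: int, p_arrival: float, rng: random.Random) -> int:
    """Return one stationary sample using monotone coupling from the past."""
    # draws[k] drives the transition from time -(k+1) to time -k.
    # Draws are never regenerated: going further into the past only
    # appends new ones, which is essential for correctness.
    draws = []
    horizon = 1
    while True:
        while len(draws) < horizon:
            draws.append(rng.random())
        low, high = 0, capacity          # smallest and largest states
        for k in range(horizon - 1, -1, -1):  # from time -horizon up to 0
            low = update(low, draws[k], capacity, p_arrival)
            high = update(high, draws[k], capacity, p_arrival)
        if low == high:                  # all trajectories have coalesced
            return low
        horizon *= 2                     # start twice as far in the past
```

For instance, monotone_cftp(capacity=5, p_arrival=0.3, rng=random.Random(1)) returns an integer between 0 and 5. Only two trajectories are simulated regardless of the size of the state space, which is where the reduction in computation time comes from.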
Check-bound, 2007-2009, ANR SETIN
Partners: University of Paris I.
The increasing use of computerized systems in all aspects of our lives makes it increasingly important that they function correctly. The presence of such systems in safety-critical applications, coupled with their increasing complexity, makes it indispensable to verify that they behave as required. Model checking, an automated formal verification technique, is therefore of particular interest. Since verification techniques have become more efficient and more prevalent, it is natural to extend the range of models and specification formalisms to which model checking can be applied. Indeed, the behavior of many real-life processes is inherently stochastic, so the formalism has been extended to probabilistic model checking. Accordingly, different formalisms in which the underlying system is modeled by Markovian models have been proposed.
Stochastic model checking can be performed by numerical or statistical methods. In the model checking formalism, models are checked to see whether the considered measures are guaranteed or not. We apply the stochastic comparison technique to numerical stochastic model checking. The main advantage of this approach is the possibility of deriving transient and steady-state bounding distributions while avoiding the state-space explosion problem. For statistical model checking, we study the application of perfect simulation by coupling from the past. This method has been shown to be efficient for sampling the exact steady-state distribution when the underlying system is monotone. We are considering extending this approach to transient analysis and to model checking by means of bounding models and stochastic monotonicity. We also study the case of an infinite state space, which is one of the most difficult problems for the model checking formalism. In some cases, it would be possible to consider bounding models defined on a finite state space.
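The stochastic comparison approach mentioned above can be illustrated with a small Python sketch. It uses the strong stochastic order on a totally ordered finite state space: p is smaller than q when every tail sum of p is at most the matching tail sum of q. A classical sufficient condition for a chain Q to bound a chain P at every step is that each row of P is smaller than the matching row of Q, that Q is monotone, and that the initial distributions are ordered. The matrices P_EXACT and Q_BOUND and all function names are hypothetical and chosen only for illustration.

```python
def tail_sums(p):
    """Return the list [sum(p[k:]) for k in range(len(p))]."""
    tails, acc = [], 0.0
    for x in reversed(p):
        acc += x
        tails.append(acc)
    return tails[::-1]


def st_leq(p, q, eps=1e-12):
    """True if p is smaller than q in the strong stochastic order."""
    return all(a <= b + eps for a, b in zip(tail_sums(p), tail_sums(q)))


def is_st_monotone(matrix):
    """True if the rows of the matrix increase in the strong order."""
    return all(st_leq(matrix[i], matrix[i + 1])
               for i in range(len(matrix) - 1))


def matrix_st_leq(p_matrix, q_matrix):
    """True if every row of p_matrix is st-smaller than that of q_matrix."""
    return all(st_leq(r, s) for r, s in zip(p_matrix, q_matrix))


def step(dist, matrix):
    """One transition: row vector `dist` times `matrix`."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def transient_bounds_hold(p_matrix, q_matrix, start, steps):
    """Check numerically that the Q-chain bounds the P-chain at each step."""
    lower, upper = list(start), list(start)
    for _ in range(steps):
        lower, upper = step(lower, p_matrix), step(upper, q_matrix)
        if not st_leq(lower, upper):
            return False
    return True


# Hypothetical three-state example: Q_BOUND is monotone and dominates P_EXACT.
P_EXACT = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3], [0.1, 0.4, 0.5]]
Q_BOUND = [[0.4, 0.4, 0.2], [0.2, 0.4, 0.4], [0.1, 0.3, 0.6]]
```

In practice the bounding chain is chosen to be much easier to analyze than the original one (for example, defined on a smaller state space); the sketch only checks the ordering conditions and does not construct such a bound.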
MEG, 2007-2010, ANR blanc
The “ACI blanche” MEG is composed of two teams: physicists working on electromagnetism at LAAS (Toulouse) and the MESCAL project-team. The main objective is to study scaling properties in electromagnetism simulation applications and grids. The first results are promising. They demonstrate that the tools developed by Mescal for large data storage and for middleware deployment on clusters and grids are appropriate for this kind of application.
DOCCA, 2007-2011, ANR Jeunes Chercheurs
The race towards the design and development of scalable distributed systems offers new opportunities to applications, in particular as far as scientific computing, databases, and file sharing are concerned. Recently many advances have been made in the area of large-scale file-sharing systems, building upon the peer-to-peer paradigm, which responds almost seamlessly to dynamicity and resilience issues. However, achieving fair resource sharing among a large number of users in a distributed way is clearly still an open and active research field. On all of these issues there is a clear gap between:
widely deployed systems such as peer-to-peer file-sharing systems (KaZaA, Gnutella, EDonkey), which are generally not very efficient and do not propose generic solutions that can be extended to other kinds of usage;
academic work with generally smart solutions (probabilistic routing in random graphs, sets of node-disjoint trees, Lagrangian optimization) that sometimes lack a real application.
Up until now, the main achievements based on the peer-to-peer paradigm mainly concern file-sharing issues. We believe that a large class of scientific computations could also take advantage of this kind of organization. Our goal is thus to design a peer-to-peer computing infrastructure with a particular emphasis on fairness issues. In particular, the objectives of the ANR DOCCA (Design and Optimization of Collaborative Computing Architectures) project are the following:
to combine theoretical tools and metrics from the parallel computing community and from the network community, and to explore algorithmic and analytical solutions to the specific resource management problems of such systems.
to design a P2P architecture based on the algorithms designed in the second step, and to create a novel P2P collaborative computing system.
We expect the following results from this project:
to provide synthetic user models to the scientific community that can be used as input for the modeling, simulation and experimentation of P2P collaborative computing systems.
to provide optimal strategies and resource management algorithms in P2P collaborative computing.
to design a decentralized protocol that implements the optimal strategies for the target user models.
to implement a prototype and validate the approach on an experimental platform.
POPEYE, 2008-2009, ARC
Partners: INRIA Maestro, INRIA TOSCA, INRA, UPMC, LIA, Polytech Nice Sophia-Antipolis.
The MESCAL project-team participates in the Popeye INRIA ARC, led by Eitan Altman of the INRIA Maestro project-team. The project focuses on the behavior of large complex systems that involve interactions among one or more populations. By population we mean a large set of individuals, which may be modeled as individual agents but which we will often model as a continuum of non-atomic agents. The project brings together researchers from different disciplines: computer science and network engineering, applied mathematics, economics and biology. This interdisciplinary collaborative research aims at developing new theoretical tools and applying them to dynamic and spatial aspects of populations that arise in various disciplines, with a particular focus on biology and networking.
OMP2, 2008-2010, NANO 2012
Rapid advances in multi-core technologies have been incorporated in general-purpose processors from Intel, IBM, Sun, and AMD, and in special-purpose graphics processors from NVIDIA and ATI. This technology will soon be introduced in the next generation of processors for embedded systems. The increase in the number of cores per processor will introduce critical challenges for access to data stored in memory. Memory accesses are often synchronized using locks on shared variables. As the number of threads increases, the cost of synchronization also increases because of the growing contention on these shared variables. Transactional memory is an approach currently under active investigation. The goal of this project is to improve the programmability and performance of parallel systems using transactional memory in the context of embedded systems.
Aladdin-G5K, 2008-2011, ADT
Partners: INRIA FUTURS, INRIA Sophia, IRISA, LORIA, IRIT, LABRI, LIP, LIFL.
After the success of the Grid'5000 project of the ACI Grid initiative led by the French ministry of research, INRIA is launching the ALADDIN project to further develop the Grid'5000 infrastructure and foster scientific research using the infrastructure.
ALADDIN will build on Grid'5000's experience to provide an infrastructure enabling computer scientists to conduct experiments on large-scale computing and produce scientific results that can be reproduced by others. ALADDIN focuses on the following challenges:
1. Transparent, safe and efficient large-scale system utilization and programming
2. Providing service agreements to users in large-scale parallel and distributed systems
3. Providing confidence to the user about the infrastructure
4. Efficient exploitation of highly heterogeneous and hierarchical large-scale systems
5. Efficient and scalable composition and orchestration of services
6. Modeling of large-scale systems and validation of their simulators
7. Scalable applications for large-scale systems
8. Dynamic interconnection of autonomous and heterogeneous resources
9. Efficient management of very large volumes of information (search, mining, classification, secure storage and access, etc.) for a wide spectrum of application areas (web applications, image processing, health, environment, etc.)
Mescal members are particularly involved in topics 1, 3, 4, and 6.
ALEAE, 2009-2010, ARC
Partners: INRIA ALGORILLE, INRIA GRAAL, INRIA MESCAL, TU Delft.
The MESCAL project-team participates in the ALEAE project of the INRIA ARC program. This project is led by Emmanuel Jeannot of the INRIA ALGORILLE project-team, who recently moved to the RUNTIME project-team.
The project's goal is to provide models and algorithmic solutions in the field of resource management that cope with uncertainties in large-scale distributed systems. This work is based on the Grid Workloads Archive designed at TU Delft, Netherlands. As a result of this collaboration, we have created the Failure Trace Archive, a repository of availability traces of distributed systems together with analytical tools. Moreover, we are conducting trace-driven experiments to test our solutions, validate the proposed models, and evaluate the algorithms. These experiments are being conducted using simulators and large-scale environments such as Grid'5000 in order to improve both models and algorithms.
PROHMPT, 2009-2011, ANR COSI
Partners: BULL SAS, CAPS entreprise, CEA CESTA, CEA INAC, INRIA RUNTIME, UVSQ PRiSM
Processor architectures with many-core processors and special-purpose processors such as GPUs and the Cell processor have recently emerged. These new and heterogeneous architectures require new application programming methods and new programming models. The goal of the ProHMPT project is to address this challenge by focusing on the immense computing needs and requirements of real simulations for nanotechnologies. In order for nanosimulations to fully leverage heterogeneous computing architectures, project members will develop novel technologies at the compiler, runtime, and scientific kernel levels with proper abstractions and wide portability. This project brings together experts from industry, in particular HPC hardware expertise from BULL and nanosimulation expertise from CEA.
PEGASE, 2009-2011, ANR ARPEGE
Partners: RealTimeAtWork, Thales, ONERA, ENS Cachan
The goal of this project is to achieve performance guarantees for communicating embedded systems. Members will develop mathematical methods that give accurate bounds on maximum network delays in both space and aviation systems. The mathematical methods will be based on Network Calculus theory, which is a type of queueing theory that deals with worst-case performance evaluation. The expected results will be novel models and software tools validated in mission-critical real-time embedded networks of the aerospace industry.
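As an illustration of the kind of worst-case bound that Network Calculus provides, the following Python sketch computes the classical delay and backlog bounds for a single node. It assumes a textbook setting that is not described in the project itself: traffic constrained by a token-bucket arrival curve (burst b, sustained rate r) crossing a server that offers a rate-latency service curve (rate R, latency T), with r no larger than R. In that setting the delay is at most T + b/R and the backlog is at most b + rT. The function names and the numerical values are hypothetical.

```python
def delay_bound(burst: float, rate: float,
                service_rate: float, latency: float) -> float:
    """Worst-case delay for token-bucket traffic on a rate-latency server.

    burst and rate describe the arrival curve (e.g. bits and bits/s);
    service_rate and latency describe the service curve (bits/s and s).
    """
    if rate > service_rate:
        raise ValueError("unbounded delay: arrival rate exceeds service rate")
    return latency + burst / service_rate


def backlog_bound(burst: float, rate: float,
                  service_rate: float, latency: float) -> float:
    """Worst-case backlog (buffer occupancy) in the same setting."""
    if rate > service_rate:
        raise ValueError("unbounded backlog: arrival rate exceeds service rate")
    return burst + rate * latency


if __name__ == "__main__":
    # Hypothetical link: 12 kbit bursts at 1 Mbit/s, served at 10 Mbit/s
    # after at most 1 ms of latency.
    print(delay_bound(12_000.0, 1e6, 1e7, 0.001))
    print(backlog_bound(12_000.0, 1e6, 1e7, 0.001))
```

These bounds hold for every traffic pattern that respects the arrival curve, which is what distinguishes this approach from average-case queueing analysis.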
USS-SimGrid, 2009-2011, ANR SEGI
Partners: INRIA Nancy, INRIA Saclay, INRIA Bordeaux, University of Reims, IN2P3, University of Hawaii at Manoa
The goal of the USS-SimGrid project is to allow scalable and accurate simulations by means of the SimGrid simulation toolkit. This toolkit is widely used for simulation of HPC systems. We aim to extend the functionality of the toolkit to enable the simulation of heterogeneous systems with more than tens of thousands of nodes.
There are three main thrusts in this project. First, we will improve the models used in SimGrid, increasing their scalability and easing their instantiation. Second, we will develop tools that ease the analysis of detailed and large simulation results and aid the management of simulation deployments. Third, we will improve the scalability of simulations using parallelization and optimization methods.
SPADES, 2009-2012, ANR SEGI
Partners: INRIA GRAAL, INRIA GRAND-LARGE, CERFACS, CNRS, INRIA PARIS, LORIA
Petascale systems consisting of thousands to millions of resources have emerged. At the same time, existing infrastructures are not capable of fully harnessing the computational power of such systems. The SPADES project will address several challenges in such large systems. First, the members are investigating methods for service discovery in volatile and dynamic platforms. Second, the members are creating novel models of reliability in petascale systems. Third, the members will develop stochastic scheduling methods that leverage these models. This will be done with an emphasis on applications whose task dependencies are structured as graphs.
Clouds@home, 2009-2013, ANR Jeunes Chercheurs
The overall objective of this project is to design and develop a cloud computing platform that enables the execution of complex services and applications on unreliable volunteered resources across the Internet. In terms of reliability, these resources are often unavailable 40% of the time and exhibit frequent churn (several times a day). By "complex services and applications", we mean large-scale service deployments, such as Amazon's EC2, the TeraGrid, and the EGEE, as well as applications with complex dependencies among tasks. These commercial and scientific services and applications need guaranteed availability levels of 99.999% for computational, network, and storage resources in order to execute efficiently and in a timely manner. We therefore have the following goals:
To research methods that guarantee performance for computation and storage across unreliable Internet volunteered resources using a combination of prediction and virtual machine techniques
To design a cloud computing platform that allows complex services and applications to leverage this guaranteed computing and storage power
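To give a sense of the gap between resources that are unavailable 40% of the time and a 99.999% availability target, the short Python sketch below computes how many replicas would be needed if node failures were independent and a service were considered available whenever at least one replica is up. Both assumptions are ours and are deliberately simplistic: real volunteer resources show correlated availability, and the project relies on prediction and virtual machine techniques rather than on plain replication. The function names are hypothetical.

```python
import math


def group_availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent nodes is up."""
    return 1.0 - (1.0 - node_availability) ** replicas


def replicas_needed(node_availability: float, target: float) -> int:
    """Smallest number of independent replicas reaching the target."""
    if not 0.0 < node_availability < 1.0:
        raise ValueError("node_availability must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve (1 - a)^n <= 1 - target for the smallest integer n.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - node_availability))


if __name__ == "__main__":
    n = replicas_needed(0.6, 0.99999)
    print(n, group_availability(0.6, n))
```

With nodes that are up 60% of the time, this naive model calls for 13 replicas to reach 99.999%, which illustrates why smarter prediction of group availability is of interest.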
We are currently working in the following areas:
Predictive models of availability of groups of volatile Internet resources
Strategies for checkpointing applications using virtual machines (VMs) in low-bandwidth, volatile, wide-area networks
Methods for data management that ensure data durability, availability, and access performance
Implementation of a cloud computing prototype with validation on an experimental platform such as PlanetLab.