Section: New Results
Experimentation Methodology
Participants : Pierre-Nicolas Clauss, Fekari El Medhi, Emmanuel Jeannot, Jens Gustedt, Martin Quinson, Cristian Rosa.
A survey comparing different experimental methodologies
We have worked on classifying and comparing experimental methodologies and tools, see [17]. In order to test or validate a solution, one needs to execute a real application (or a model of one) on a real environment (or a model of one). This leads to four classes of methodologies: in-situ, where we execute a real application (program, set of services, communications, etc.) on a real environment (set of machines, OS, middleware, etc.); emulation, where we execute a real application on a model of the environment; benchmarking, where we execute a model of an application on a real environment; and lastly simulation, where we execute a model of an application on a model of the environment. This classification is very general and applies to domains other than large-scale systems. However, we have focused on large-scale systems and compared a set of relevant tools for each of these four classes.
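The four classes above follow from two yes/no questions: is the application real, and is the environment real? The following minimal sketch only restates that classification in code. The names `Methodology` and `classify` are illustrative assumptions and do not come from the survey [17].

```python
from enum import Enum


class Methodology(Enum):
    """The four classes named in the text (enum name is hypothetical)."""
    IN_SITU = "in-situ"
    EMULATION = "emulation"
    BENCHMARKING = "benchmarking"
    SIMULATION = "simulation"


def classify(real_application: bool, real_environment: bool) -> Methodology:
    """Map the two criteria of the text onto one of the four classes."""
    if real_application and real_environment:
        return Methodology.IN_SITU       # real application, real environment
    if real_application:
        return Methodology.EMULATION     # real application, modeled environment
    if real_environment:
        return Methodology.BENCHMARKING  # modeled application, real environment
    return Methodology.SIMULATION        # modeled application, modeled environment
```

For instance, `classify(True, False)` returns `Methodology.EMULATION`, the class in which a real application runs on a modeled environment.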
Improvement of the SimGrid tool
This year was the first year of the ANR project centered on SimGrid, of which we are the principal investigator (see 8.2.3). We therefore worked on laying the groundwork for the numerous contributions expected from the ANR participants in the next few years. This work on the internals aimed at simplifying the code base and increasing its modularity.
We also continued our work from previous years on the scalability of the tool, to make it usable not only by the grid computing community but also by peer-to-peer researchers. These software engineering improvements led to a performance speedup of up to 50%.
We initiated a collaboration with the Reso team of INRIA Rhône-Alpes to study the possibility of adding energy models to the simulator, so that it could be used to assess not only the time an algorithm needs to complete, but also the energy consumed during that process. This work has not led to any publication so far.
Finally, we started to integrate model-checking facilities into the tool so that it can be used not only for performance assessment (mainly comparing the makespan), but also for correctness assessment (checking, for example, whether the algorithm contains deadlocks). See Section 6.3.4 for more details.
A Platform Description Archive for Reproducible Simulation Experiments
Simulation is a common approach to explore various experimental scenarios in a reproducible way and in a reasonable amount of time. In practice, however, this reproducibility is often limited to the authors of a simulator, as they rarely give access to the platforms and applications they use in their articles.
This year, we continued our work on the Platform Description Archive (PDA), which is an effort to make platform descriptions and tools available to users of simulation toolkits. In particular, we developed a tool called simulacrum to generate the platform descriptions needed by simulation users. A publication about this tool has been submitted to the CCGrid conference and is pending approval.
Formal Verification of Distributed Algorithms
As stated before, distributed algorithms can be challenging to assess and debug. Whereas formal verification in general and model checking in particular are now routinely used for concurrent and embedded systems, existing algorithms and tools can rarely be applied effectively to the verification of asynchronous distributed algorithms and systems. In joint research with Stephan Merz of the Mosel team of INRIA Nancy and LORIA, we have started to explore two approaches to address this problem.
In a first approach, Sabina Akhtar developed an extension of the PlusCal language [52] defined by Leslie Lamport. The extension is intended for describing and verifying models of distributed algorithms, whereas the original language is geared towards shared-memory concurrent programming. The compiler from this language to the TLA+ tool suite is now operational, and was successfully used on several simple examples. We plan to build upon this tool by studying more complex examples and by improving its performance: leveraging the partial state ordering specific to distributed systems should reduce the combinatorial problem posed by the current exhaustive exploration of the state space.
In a complementary approach, we are interested in adding model checking support to the SimGrid framework, and specifically to the GRAS API for simulating and implementing Grid algorithms. The main difficulty in achieving that goal is to save and restore the state of every user process, so that the model checker can restore the state of a process in order to explore another possible execution branch. This year, we implemented a prototype that saves the state of the whole simulation ([35]). This approach is easier to implement, but less effective since the state snapshots are larger, making the state explosion issue (inherent to model checking) even worse. In order to implement another approach where the state of each process would be saved independently (planned for future work), we first had to carry out the software re-engineering and simplifications mentioned in 6.3.2.
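To illustrate the idea of saving the whole state before exploring each execution branch, here is a toy sketch. It is not the SimGrid prototype and does not use the GRAS API: the processes, the lock-based programs, and all names (`PROGRAMS`, `find_deadlock`, etc.) are hypothetical. It explores every interleaving of two processes exhaustively, copies the complete state for each branch, and reports a deadlock when no process can move although some have not finished.

```python
import copy

# Hypothetical processes: each one is a list of (action, lock) steps.
# They take the two locks in opposite order, which can deadlock.
PROGRAMS = {
    "p1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "p2": [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
}


def initial_state(programs):
    # Whole-simulation state: one program counter per process, plus lock owners.
    return {"pc": {name: 0 for name in programs}, "owner": {}}


def enabled(state, programs):
    """Return the processes that can execute their next step."""
    runnable = []
    for name, steps in programs.items():
        pc = state["pc"][name]
        if pc == len(steps):
            continue  # this process has finished
        action, lock = steps[pc]
        if action == "release" or lock not in state["owner"]:
            runnable.append(name)
    return runnable


def step(state, programs, name):
    """Execute the next step of process `name`, modifying `state` in place."""
    action, lock = programs[name][state["pc"][name]]
    if action == "acquire":
        state["owner"][lock] = name
    else:
        del state["owner"][lock]
    state["pc"][name] += 1


def find_deadlock(programs):
    """Depth-first exploration; returns a schedule leading to a deadlock, or None."""
    stack = [(initial_state(programs), [])]
    visited = set()
    while stack:
        state, trace = stack.pop()
        key = (tuple(sorted(state["pc"].items())),
               tuple(sorted(state["owner"].items())))
        if key in visited:
            continue
        visited.add(key)
        runnable = enabled(state, programs)
        finished = all(state["pc"][n] == len(programs[n]) for n in programs)
        if not runnable and not finished:
            return trace  # nobody can move, yet some process is not done
        for name in runnable:
            snapshot = copy.deepcopy(state)  # save the WHOLE state per branch
            step(snapshot, programs, name)
            stack.append((snapshot, trace + [name]))
    return None
```

In this sketch every branch duplicates the program counters and lock table of all processes, even though a single step changes only one process. This mirrors, at toy scale, why whole-state snapshots are simple to implement but large, and why saving each process independently is attractive.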
SMPI
One of the main limitations of SimGrid is that the application has to be written using a specific interface. One possible solution is to add a new interface to the simulator that exactly matches an existing interface. This year, we continued the work on this topic, initiated by Henri Casanova and Mark Stilwell at University of Hawai'i at Manoa.
Previous work included a prototype implementation of various MPI primitives such as send, recv, isend, irecv and wait. We implemented several collective operations such as bcast and reduce. We also set up a testing infrastructure to ensure that this new functionality remains usable in the future. One of the main difficulties we faced was the lack of specification of the MPI primitives themselves, as discussed in [54]. We plan to investigate these semantic specifications in the future, as they are of major interest for SMPI itself and also have strong implications for formal methods applied to parallel and distributed applications.
The work on SMPI is still in progress and has not led to any publication so far. We plan to set up a formal collaboration on this topic with the team of Prof. Casanova in the near future.
Wrekavoc
We have extensively tested Wrekavoc to show its accuracy and scalability. We have shown that it is able to correctly emulate a network made of hundreds of nodes where congestion happens [23].
Aladdin-G5K
Grid'5000 aims at building an experimental Grid platform featuring a total of five thousand CPU cores over nine sites in France. In 2009, a third cluster was installed in Nancy. It is named “griffon” and is composed of 92 nodes (with 16 cores and 16 GB of RAM each) and an Infiniband-20G network. To control the power consumption, “grillon”, the first cluster installed in Nancy in 2005, was shut down. The “grelon” cluster (120 quad-core nodes, 2 GB of RAM each) is also operational.
The Grid'5000 Spring School was organized in Nancy from April 7th to April 10th.
Experimental cluster of GPUs
The SUPÉLEC experimental platform for "GPGPU" (see Section 4.3.6) was greatly improved in 2009. Researchers of SUPÉLEC and AlGorille can now access two 16-node GPU clusters, supervised with remote energy monitoring devices. The first cluster was already available in 2008 and is composed of 16 PCs, each one hosting a dual-core CPU and a GPU card: an NVIDIA GeForce 8800 GT, with 512 MiB of RAM (on the GPU card). The 16 nodes are interconnected through a dedicated Gigabit Ethernet switch. The second cluster has 16 more recent nodes, each composed of an Intel Nehalem CPU with 4 hyper-threaded cores at 2.67 GHz and an NVIDIA GTX285 GPU card with 1 GB of memory. This cluster also has a Gigabit Ethernet interconnection network. The energy consumption of each node of each cluster is now monitored by a Raritan DPXS20A-16 device that continuously measures the electric power consumption (in watts). A Perl script then samples these values and computes the energy (in joules or watt-hours) consumed by the computation on each node and on the complete cluster (including the interconnection switch). Each cluster is associated with a Raritan DPXS20A-16 device, which can monitor up to 20 nodes.
In addition, several parallel applications, with different kinds of algorithms, were developed and evaluated on this platform in 2009. Results have been presented at scientific conferences and to the European COST action IC0804 on energy efficiency in large scale distributed systems.
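The conversion from sampled power to energy can be sketched as follows. This is not the Perl script mentioned above, whose code is not shown here: it is a minimal Python illustration that assumes timestamped power samples and uses the trapezoidal rule for the integration, which is an assumption on our part. All function names are hypothetical.

```python
def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule (assumed method).

    `samples` is a list of (timestamp_in_seconds, power_in_watts) pairs,
    sorted by increasing timestamp. Since 1 W = 1 J/s, the result is in joules.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def joules_to_watt_hours(joules):
    """1 Wh = 3600 J."""
    return joules / 3600.0


def cluster_energy_joules(per_node_samples, switch_samples):
    """Energy of the complete cluster: all nodes plus the interconnection switch.

    `per_node_samples` maps a (hypothetical) node name to its sample list.
    """
    nodes = sum(energy_joules(s) for s in per_node_samples.values())
    return nodes + energy_joules(switch_samples)
```

For example, a node drawing a constant 100 W for 10 s and then ramping linearly from 100 W to 200 W over the next 10 s consumes 1000 J + 1500 J = 2500 J, that is about 0.69 Wh.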