Section: New Results
Providing access to HPC servers on the Grid
Participants : Nicolas Bard, Julien Bigot, Laurent Bobelin, Raphaël Bolze, Yves Caniou, Eddy Caron, Ghislain Charrier, Florent Chuffart, Benjamin Depardon, Frédéric Desprez, Gilles Fedak, Jean-Sébastien Gay, Haiwu He, Benjamin Isnard, Michael Heymann, Cristian Klein, Gaël Le Mahec, David Loureiro, Georges Markomanolis, Adrian Muresan, Hidemoto Nakada, Christian Pérez, Franck Petit, Vincent Pichon, Bing Tang, Daouda Traore, Wang Yu.
Service Discovery in Peer-to-Peer environments
We have published an extended version of a work started in 2007 on the snap-stabilization of the Distributed Lexicographic Placement Table (DLPT) approach, which builds a prefix-tree-based overlay network for an efficient peer-to-peer service discovery system for grids. Our approach offers an alternative way to provide fault tolerance once replication, the mechanism mainly used in similar systems, has failed. Moreover, replication can be very costly in terms of computing and storage resources and does not ensure the recovery of the system after arbitrary failures. Self-stabilization is an efficient approach to designing reliable solutions for dynamic systems. It guarantees that a system converges to its intended behavior in finite time, regardless of its initial state. A snap-stabilizing algorithm guarantees that it always behaves according to its specification once the protocol is launched. We have provided the first snap-stabilizing protocol for tree construction. The proposed algorithm transforms an arbitrary labelled tree into a consistent prefix tree in O(h + h') rounds on average, where h and h' are the initial and final heights of the tree, respectively. In the worst case, the algorithm requires O(n) extra space on each node, O(n) rounds and O(n^2) actions. New simulations have been conducted; they show that the worst cases are far from being reached and confirm the average complexities.
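To make the notion of a consistent prefix tree concrete, the following minimal Python sketch checks a labelled tree against one plausible consistency rule. It is an illustration only, not the snap-stabilizing protocol itself. The rule used (each node label is a proper prefix of its children's labels and the greatest common prefix of every pair of children), the dictionary representation and the function name are assumptions made for this example.

```python
from itertools import combinations
from os.path import commonprefix


def is_consistent_prefix_tree(label, children):
    """Check the subtree rooted at `label`.

    `children` maps a node label to the list of its children's labels
    (hypothetical representation chosen for this sketch).
    """
    kids = children.get(label, [])
    # Each child label must strictly extend the parent label.
    if any(kid == label or not kid.startswith(label) for kid in kids):
        return False
    # Assumed rule: the parent label is the greatest common prefix
    # of every pair of its children.
    if any(commonprefix([a, b]) != label for a, b in combinations(kids, 2)):
        return False
    return all(is_consistent_prefix_tree(kid, children) for kid in kids)


good = {"": ["D", "S"], "D": ["DGEMM", "DTRSM"], "S": ["SGEMM", "STRSM"]}
bad = {"D": ["DGEMM", "DGEMV"]}  # "DGEM" is a longer common prefix than "D"
print(is_consistent_prefix_tree("", good))  # True
print(is_consistent_prefix_tree("D", bad))  # False
```

In this representation, a stabilizing algorithm would repair the second tree by inserting an intermediate node labelled "DGEM"; the sketch only detects the inconsistency.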
We have published a book chapter giving a more accessible view of the DLPT architecture, intended for the broader computer science community. The results presented summarize the DLPT approach, from its early design to its fault-tolerance mechanisms offering formal guarantees on very dynamic and faulty platforms, via its use and load balancing mechanisms. The presentation of the chapter differs radically from previous technical papers on the same results and gives a global view of the work pursued in this area by the GRAAL team.
Deployment of hierarchical middleware
We consider the placement of the various elements of a hierarchical grid middleware. We consider the case where several services have to be deployed within the hierarchy (the case where only one service has to be made available has already been studied), and study several kinds of models and platforms. Our goal is to have fairness between the throughputs of different services, i.e., the ratio between the requested throughput and the obtained throughput should be roughly the same for each service. We studied two models: in the first, simplistic one, whenever a message is received at a given level, it is sent to all the children, whatever the type of service it refers to; in the second one, a message is forwarded to a child only if the latter knows about this service. We then derived a closed-form solution for the simple model on a homogeneous platform, and a bottom-up heuristic for the more general model on both homogeneous platforms and platforms with heterogeneous computations but homogeneous communications. We also derived a genetic algorithm for totally heterogeneous platforms.
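The difference between the two forwarding models can be illustrated by counting the messages generated by one request in a small hierarchy. This is a minimal sketch under stated assumptions: the tree and service tables, the node names and both function names are hypothetical, and "knows about this service" is interpreted as "some server below this child offers it".

```python
def services_below(node, tree, offered):
    """Set of services offered somewhere in the subtree rooted at `node`."""
    known = set(offered.get(node, ()))
    for child in tree.get(node, ()):
        known |= services_below(child, tree, offered)
    return known


def count_messages(node, service, tree, offered, selective):
    """Number of messages sent downwards from `node` for one request.

    selective=False: first model, every child receives the message.
    selective=True: second model, only children that know the service.
    """
    total = 0
    for child in tree.get(node, ()):
        if selective and service not in services_below(child, tree, offered):
            continue
        total += 1 + count_messages(child, service, tree, offered, selective)
    return total


tree = {"root": ["a1", "a2"], "a1": ["s1", "s2"], "a2": ["s3", "s4"]}
offered = {"s1": {"dgemm"}, "s2": {"dgemm"}, "s3": {"fft"}, "s4": {"fft"}}
print(count_messages("root", "dgemm", tree, offered, selective=False))  # 6
print(count_messages("root", "dgemm", tree, offered, selective=True))   # 3
```

The second model halves the traffic in this toy hierarchy, which is why the load on each agent, and hence the achievable throughput per service, depends on where each service is deployed.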
Scheduling of independent tasks under cluster availability constraints
We consider the scheduling of independent tasks on a grid of clusters, under the constraint that, on each cluster, a task can be scheduled only if the cluster load over a given period of time is not higher than a given upper bound. We described the problem using a mixed integer linear program. However, as the problem involves many variables and constraints, it is far too complicated to be solved this way. Hence, we designed a set of heuristics to solve the problem.
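As an illustration of the kind of constraint involved, here is a minimal greedy sketch in Python. It is not one of the heuristics we designed: it assumes discrete time slots, a known load per slot on each cluster, and tasks that each add one unit of load while they run. All names and the data layout are hypothetical.

```python
def earliest_start(load, bound, duration):
    """First slot where a task of `duration` slots keeps the load <= bound."""
    for start in range(len(load) - duration + 1):
        if all(load[t] + 1 <= bound for t in range(start, start + duration)):
            return start
    return None


def greedy_schedule(tasks, clusters):
    """Place longest tasks first on the cluster offering the earliest start.

    tasks: {task name: duration in slots}
    clusters: {cluster name: (load per slot, upper bound)}; loads are updated.
    Returns {task name: (cluster name, start slot)} for the tasks placed.
    """
    placement = {}
    for task, duration in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = None
        for name, (load, bound) in clusters.items():
            start = earliest_start(load, bound, duration)
            if start is not None and (best is None or start < best[1]):
                best = (name, start)
        if best is not None:
            load = clusters[best[0]][0]
            for t in range(best[1], best[1] + duration):
                load[t] += 1  # the task now contributes to the cluster load
            placement[task] = best
    return placement


clusters = {"c1": ([2, 2, 0, 0], 2), "c2": ([1, 1, 1, 1], 1)}
print(greedy_schedule({"t1": 2, "t2": 1}, clusters))
# {'t1': ('c1', 2), 't2': ('c1', 2)}
```

Cluster c2 is already at its bound in every slot, so both tasks are placed on c1 once its load drops.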
Mobile ad hoc networks as well as grid platforms are distributed, changing, and error-prone environments. Communication costs within such infrastructures can be improved, or at least bounded, by using k-clustering. A k-clustering of a graph is a partition of the nodes into disjoint sets, called clusters, in which every node is at distance at most k from a designated node in its cluster, called the clusterhead. We designed a self-stabilizing asynchronous distributed algorithm for constructing a k-clustering of a connected network of processes with unique IDs and weighted edges. The algorithm is comparison-based, takes O(nk) time, and uses O(log(n) + log(k)) space per process, where n is the size of the network. Using simulations, we show that even if the theoretical complexity bound can be attained on particular graphs, our algorithm requires much less time to converge on many graphs. This is the first distributed solution to the k-clustering problem on weighted graphs.
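The definition of a k-clustering on a weighted graph can be checked with a short centralized routine. The sketch below is only a validity checker, not the self-stabilizing distributed algorithm; it assumes non-negative edge weights, measures distances in the whole graph, and uses a hypothetical adjacency-dictionary representation.

```python
import heapq


def distances_from(source, graph):
    """Dijkstra's algorithm; graph maps node -> {neighbour: edge weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_k_clustering(graph, head_of, k):
    """head_of maps every node to the clusterhead of its cluster."""
    if set(head_of) != set(graph):
        return False  # the clusters must partition the node set
    cache = {}
    for node, head in head_of.items():
        if head_of.get(head) != head:
            return False  # a clusterhead must belong to its own cluster
        if head not in cache:
            cache[head] = distances_from(head, graph)
        if cache[head].get(node, float("inf")) > k:
            return False
    return True


# Path a - b - c - d with edge weights 1, 2, 1.
graph = {"a": {"b": 1}, "b": {"a": 1, "c": 2}, "c": {"b": 2, "d": 1}, "d": {"c": 1}}
two_heads = {"a": "b", "b": "b", "c": "c", "d": "c"}
one_head = {"a": "b", "b": "b", "c": "b", "d": "b"}
print(is_k_clustering(graph, two_heads, 1))  # True
print(is_k_clustering(graph, one_head, 2))   # False: d is at distance 3 from b
```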
Proof of concept of ULCM and recursive applications
In the ANR LEGO project, we have designed ULCM, a component model which combines various kinds of composition operators. In addition to the classical provide/use composition operators, ULCM also offers data sharing, master-worker and workflow operators. Hence, ULCM unifies classical component models with workflow-based models.
In 2009, our main effort was devoted to the realization of ULCM i, a proof-of-concept implementation of ULCM. ULCM i is based on standard compiler technology (antlr) that is used to build a representation of a program. Then, the ULCM i runtime is responsible for creating, connecting and running components. As ULCM supports workflows within composite components, a specific workflow engine is also part of the runtime.
ULCM i currently has four back-ends: a simulator back-end to test the validity of a program, a multithreaded Java back-end and a C++ one for local execution so as to study its application to multicore machines, and a CCM-based back-end for distributed execution. The C++ multithreaded back-end is used in particular within the ANR NUMASIS project. It is interfaced with the thread library marcel, developed by the RUNTIME project-team.
In cooperation with EDF R&D, we have started studying the support of recursive algorithms in the context of ULCM. A particularly important use case is adaptive mesh refinement applications, which are especially complex to implement. Preliminary results show that ULCM appears expressive enough. However, some questions remain open, such as the simplicity of programming (which could be addressed by generic components) and the smallest level of granularity that can be reached while achieving high performance. Hence, the technology appears adequate for coarse and medium grain, but the question remains open for fine grain.
Component models and algorithmic skeletons
In 2009, we conducted a validation of the STKM model. STKM is a component model combining provide/use composition, workflow and algorithmic skeleton operators. STKM can be seen as going one step further than ULCM, as it aims to study the feasibility and the benefit of integrating algorithmic skeleton technology within component models in general, and advanced component models such as ULCM in particular. To verify the promises of the model, we built a proof-of-concept implementation of STKM on top of SCA, a component model based on web services. The proposed mapping of STKM onto SCA introduces a set of non-functional concerns needed to manage an STKM assembly; these concerns can be hidden from the end user and can be used for execution optimizations. Hand-coded experiments show that STKM can lead to both better performance and better resource usage than a model based only on workflows or skeletons. Hence, the promises of STKM can be achieved provided that the various elements of the model are correctly used. In the general case, this may require applying optimization algorithms to applications. This work has been done in cooperation with the University of Pisa (Italy).
Component models and genericity
In order to be easily reusable, most component models require a component to be a binary version of a piece of code. However, compared with most programming languages, this requirement limits reusability as all the types must be fixed. For example, it is not possible to have a general dispatching component for a farm component. Moreover, we would like to make generic not only the data types of component interfaces, but also the interfaces themselves, as well as component types.
To this end, we have pursued our work on increasing reuse by adding the concept of genericity to component models. In order to support dynamic instantiation and to explore the benefit of meta-programming, using explicit specializations as a means to encode algorithmic skeletons, we opted for a solution à la C++. Hence, all genericity-related features of the model are handled through a compilation phase. In this work, we restrict ourselves to static compilation. Further work may deal with dynamic compilation.
To leverage existing component models, the selected approach is to derive a generic meta-model from an existing one, and to provide an algorithm to transform generic component applications into non-generic ones. This has been applied to SCA, leading to a generic-SCA model. The model transformation algorithm has been implemented within Eclipse and the whole chain has been validated with an image rendering application based on a generic task farm component.
Deployment of hierarchical applications on grids
In the context of the ANR DISCOGRID project, we developed a model to ease the programming of hierarchical applications, such as computational electromagnetics (CEM), on grids. The DISCOGRID model can be seen as an extension of MPI to a hierarchy of resources, with the addition of hierarchical data redistributions. Another element of the DISCOGRID project was a multilevel partitioning tool that decomposes an unstructured mesh with respect to the available resources. Currently, this tool is limited to two levels, which are typically represented by a federation of clusters. However, as the tool computes a partition of the mesh on all the resources it is given, the issue was to select which of the available resources to give to it. Because of Amdahl's law and of network latency and bandwidth, using all the resources does not necessarily minimize the execution time. Therefore, we have developed a performance model of a particular CEM application, based on the primitives of the DISCOGRID model. Then, we designed several resource selection algorithms, some based on specific heuristics and some based on generic heuristics such as random selection or simulated annealing. A series of experiments, based on simulations and on real runs on Grid'5000, demonstrates the validity of the performance model as well as the good accuracy and quick response time of some resource selection algorithms.
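The reason why using every available resource may not be optimal can be shown with a toy performance model. The model below is purely illustrative and is not the performance model of the CEM application: it assumes a sequential part, a perfectly divisible parallel part, and a communication cost that grows linearly with the number of nodes. All names and parameter values are hypothetical.

```python
def predicted_time(p, t_seq, t_par, comm_per_node):
    """Toy model: Amdahl-style compute time plus a linear communication cost."""
    return t_seq + t_par / p + comm_per_node * (p - 1)


def best_resource_count(max_nodes, t_seq, t_par, comm_per_node):
    """Exhaustively pick the node count that minimizes the predicted time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda p: predicted_time(p, t_seq, t_par, comm_per_node),
    )


best = best_resource_count(64, t_seq=10.0, t_par=1000.0, comm_per_node=2.5)
print(best)                                    # 20
print(predicted_time(best, 10.0, 1000.0, 2.5)) # 107.5
print(predicted_time(64, 10.0, 1000.0, 2.5))   # 183.125
```

With these made-up parameters, 20 nodes out of 64 give the shortest predicted time. Exhaustive search is only practical for such a one-dimensional toy case; selecting subsets of nodes across several clusters is what motivates heuristics such as simulated annealing.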
Towards Data Desktop Grid
In this work, we have proposed the BitDew framework, which addresses the issue of how to design a programmable environment for automatic and transparent data management on computational Desktop Grids. We described the BitDew programming interface, its architecture, and the performance evaluation of its runtime components. BitDew relies on a specific set of meta-data to drive key data management operations, namely life cycle, distribution, placement, replication and fault tolerance, with a high level of abstraction. The BitDew runtime environment is a flexible distributed service architecture that integrates modular P2P components such as DHTs for a distributed data catalog and collaborative transport protocols for data distribution. Through several examples, we describe how application programmers and BitDew users can exploit BitDew's features. The performance evaluation demonstrates that the high level of abstraction and transparency is obtained with a reasonable overhead, while offering the benefit of scalability, performance and fault tolerance with little programming cost.
Data-intensive applications form an important class of applications for the e-Science community. They require secure and coordinated access to large datasets, wide-area transfers and broad distribution of terabytes of data, while keeping track of multiple data replicas. In computational genomics, gene sequence comparison and analysis are the most basic routines. With the considerable increase in the number of sequences to analyze, we need more and more computing power as well as efficient solutions to manage data.
In this work, we have investigated the advantages of using a new Desktop Grid middleware, BitDew, designed for large-scale data management. Our contribution is two-fold. Firstly, we introduce a data-driven Master/Slave programming model and present an implementation of BLAST over BitDew following this model. Secondly, we present extensive experimental and simulation results which demonstrate the effectiveness and scalability of our approach. We evaluate the benefit of multi-protocol data distribution to achieve remarkable speedups, we report on the ability to cope with highly volatile environments with relative performance degradation, we show the benefit of data replication in grids with heterogeneous resource performance, and we evaluate the combination of data fault tolerance and data replication when computing on volatile resources.
MapReduce programming model for Desktop Grid
Since its introduction in 2004 by Google, MapReduce has become the programming model of choice for processing large data sets. MapReduce borrows from functional programming: a programmer defines both a Map task that maps a data set into another data set, and a Reduce task that combines intermediate outputs into a final result. Although MapReduce was originally developed for use by web enterprises in large data centers, this technique has gained a lot of attention from the scientific community for its applicability to large parallel data analysis (including geography, high energy physics, genomics, etc.).
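The programming model itself can be summarized in a few lines of sequential Python. This sketch only illustrates the Map and Reduce roles on the classical word count example; it is unrelated to the BitDew-based implementation and leaves out distribution, scheduling and fault tolerance. The function names are chosen for this example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    """Map task: emit (word, 1) for every word of a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    """Reduce task: combine the intermediate counts of one word."""
    return sum(counts)


print(map_reduce(["a b a", "b c"], word_count_map, word_count_reduce))
# {'a': 2, 'b': 2, 'c': 1}
```

On a real platform, the Map calls and the per-key Reduce calls are independent of each other, which is what allows them to be spread over many nodes.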
During 2009, we started an implementation of the MapReduce programming model on Desktop Grids using the BitDew middleware. This research addresses many issues, such as efficient scheduling of data and tasks, distributed result certification, and large-scale collective communication (broadcast and reduction) on volatile resources. Early experiments with the prototype are being carried out on Grid5K to evaluate the performance of our implementation. We expect the first results during 2010.
Bridging Grid and Desktop Grid
Service grids and desktop grids are both promoted by their supportive communities as effective solutions for providing huge computing power. Little work, however, has been undertaken to blend these two technologies together in an effort to create one vast and seamless pool of resources. In the context of the FP7 infrastructures project EDGeS (Enabling Desktop Grids for e-Science), we collaborate to build technological bridges that facilitate interoperability between service grids and desktop grids. Within the consortium, we lead the JRA1 work package, which provides the software to bridge Desktop Grids and Service Grids. In past work, we have given a detailed presentation of the BOINC to EGEE bridge, and we addressed the security issues that arise when bridging Service Grids with Desktop Grids.
In 2009, we extended the EDGeS bridge so that EGEE users can get access to additional resources provided by XtremWeb-HEP Desktop Grids. We have built a new public XtremWeb Desktop Grid called EDGeS@Home, which allows the general public to donate their idle time to EGEE users by executing EGEE applications in a way similar to what BOINC does. Finally, we have set up two bridges which connect the EDGeS VO to three different XtremWeb-HEP based Desktop Grids running at the University Paris-XI. We plan to extend this test and production infrastructure to Grid5K.
Sandboxing for Desktop Grid
In this work, we have investigated methods and mechanisms that enable the use of virtual machines as part of a security infrastructure for Desktop Grid clients to provide a sandbox for running (untrusted) applications.
Desktop Grids harvest the computing power of idle desktop computers, whether these are volunteered or deployed at an institution. Allowing foreign applications to run on these resources requires the sender of the application to be trusted, but trust in goodwill is never enough. An efficient solution is to provide a secure isolated execution environment (“sandbox”) that imposes no additional burden on either administrators or users. Currently, Desktop Grids do not provide such a facility. We defined and analyzed the requirements for a platform-independent and transparent sandbox for Desktop Grids. Based on our findings, we designed and built a prototype, and we give a performance evaluation.
Meta-Scheduling and Task Reallocation in a Grid Environment
Parallel resources in a grid are generally accessed through a batch system, which both schedules and reserves the resources in accordance with its scheduling policy. Each batch system has its own scheduling algorithm, which constructs the schedule from the task information available at submission time (number of requested processors and walltime, for example). However, walltime is generally over-estimated. This does not necessarily have consequences at the local level in terms of resource utilization if backfilling techniques are used, but at the grid level, the meta-schedule built by the grid middleware may no longer be the best with respect to the metric being optimized.
Thus, we have explored the possibility of migrating grid-submitted jobs that are still in the waiting queues of the batch systems with which the grid middleware interacts. Several heuristics have been tested, including well-known ones, and many experiments have been simulated using real-life batch traces. Work is still in progress with automatically-tuned parallel applications.
We aim to determine whether migrating waiting tasks is worthwhile for optimizing some metrics, to quantify the gain, and to test which mechanisms would be needed in a real implementation within a grid middleware, since the implementation and maintenance cost of this kind of code may be a considerable drawback if the gain is too small.
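The idea of reallocating waiting jobs can be sketched as follows. This is a deliberately simplified illustration and not one of the heuristics tested: it assumes that each cluster runs one job at a time in first-come-first-served order, that the estimated completion time of a job is the sum of the walltimes queued before it plus its own, and that a migrated job is appended at the end of the destination queue. All names are hypothetical.

```python
def completion_time(queue, index):
    """Estimated completion of queue[index] on a FCFS, one-job-at-a-time cluster."""
    return sum(walltime for _, walltime in queue[: index + 1])


def reallocate(queues):
    """Move a waiting job whenever another cluster promises an earlier completion.

    queues: {cluster name: [(job name, walltime), ...]}, modified in place.
    Returns the list of moves as (job name, source, destination).
    """
    moves = []
    for src in list(queues):
        index = 0
        while index < len(queues[src]):
            job, walltime = queues[src][index]
            best_dst, best_time = src, completion_time(queues[src], index)
            for dst, queue in queues.items():
                if dst == src:
                    continue
                # Completion time if the job joins the end of the other queue.
                candidate = sum(w for _, w in queue) + walltime
                if candidate < best_time:
                    best_dst, best_time = dst, candidate
            if best_dst != src:
                queues[src].pop(index)
                queues[best_dst].append((job, walltime))
                moves.append((job, src, best_dst))
            else:
                index += 1
    return moves


queues = {"A": [("j1", 10), ("j2", 10), ("j3", 10)], "B": [("j4", 5)]}
print(reallocate(queues))  # [('j2', 'A', 'B')]
```

Since walltimes are generally over-estimated, such a pass would have to be repeated as jobs finish earlier than announced and the estimates change.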
Enabling Distributed Computation and Fault-Tolerance Among Stigmergic Robots
We investigate avenues for the exchange of information (explicit communication) among deaf and dumb mobile robots scattered in the plane. We introduce the use of movement-signals (analogous to flight signals and bee waggle dances) as a means to transfer messages, enabling the use of distributed algorithms among robots. We propose one-to-one deterministic movement protocols that implement explicit communication among asynchronous robots. We first show how the movements of robots can provide implicit acknowledgment in asynchronous systems. We use this result to design one-to-one communication between a pair of robots. Then, we propose two one-to-one communication protocols for any system of n ≥ 2 robots. The former works for robots equipped with observable IDs that agree on a common direction (sense of direction). The latter enables one-to-one communication assuming robots are devoid of any observable ID or sense of direction. All three protocols (for either two or any number of robots) assume that no robot remains inactive forever. However, they cannot prevent the robots from moving either away from or closer to each other, thereby requiring robots with infinite visibility. We also show how to overcome these two disadvantages.
These protocols enable the use of distributed algorithms based on message exchanges among swarms of stigmergic robots. They also give robots that are equipped with communication devices a way to tolerate faults in those devices.