Section: Software
DIET
Participants: Nicolas Bard, Raphaël Bolze, Yves Caniou, Eddy Caron, Ghislain Charrier, Frédéric Desprez [correspondent], Jean-Sébastien Gay, Vincent Pichon.
Very large problems can now be processed over the Internet thanks to Grid computing environments such as Globus or Legion. Because most current applications are numerical, the use of libraries like BLAS, LAPACK, ScaLAPACK, or PETSc is mandatory, and integrating such libraries into high-level applications written in languages like Fortran or C is far from easy. Moreover, the computational power and memory such applications need are of course not available on every workstation. The RPC paradigm thus seems to be a good candidate for building Problem Solving Environments on the Grid, as explained in Section 3.3. The aim of the Diet project (http://graal.ens-lyon.fr/DIET) is to develop a set of tools to build computational servers accessible through a GridRPC API.
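As an illustration, here is a minimal sketch of what a client-side GridRPC interaction looks like, using the calls defined by the GridRPC API standard (the service name "matmul", the configuration file, and the data layout are hypothetical; the actual arguments of grpc_call depend on the service profile):

    #include <stdio.h>
    #include "grpc.h"               /* GridRPC API header */

    #define N 512                   /* hypothetical problem size */

    int main(void)
    {
      static double A[N*N], B[N*N], C[N*N];
      grpc_function_handle_t handle;

      grpc_initialize("client.cfg");                   /* contact the platform */
      grpc_function_handle_default(&handle, "matmul"); /* let the middleware pick a server */
      grpc_call(&handle, N, A, B, C);                  /* synchronous remote invocation */
      grpc_function_handle_destruct(&handle);
      grpc_finalize();
      return 0;
    }

The point of the sketch is that server selection and data transfers are handled entirely by the middleware: the client never names a machine.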
Moreover, the aim of a NES (Network Enabled Servers) environment such as Diet is to provide transparent access to a pool of computational servers, and Diet focuses on offering such a service at a very large scale. A client with a problem to solve should be able to obtain a reference to the server best suited for it. Diet is designed to take data location into account when scheduling jobs: data are kept as long as possible on (or near) the computational servers in order to minimize transfer times. This kind of optimization is mandatory when scheduling jobs over a wide-area network.
Diet is built upon Server Daemons. The scheduler is scattered across a hierarchy of Local Agents and Master Agents. Network Weather Service (NWS) [117] sensors are placed on each node of the hierarchy to collect resource availability.
The components of our scheduling architecture are the following:
- A Client is an application that uses Diet to solve problems. Many kinds of clients should be able to connect to Diet: from a web page, from a Problem Solving Environment such as Matlab or Scilab, or from a compiled program.
- A Master Agent (MA) receives computation requests from clients. These requests refer to Diet problems listed on a reference web page. The MA collects computational abilities from the servers and chooses the best one; the reference of the chosen server is returned to the client. A client can be connected to an MA through a specific name server or through a web page which stores the various MA locations. Several MAs can be deployed on the network to balance the load among them.
- A Local Agent (LA) transmits requests and information between MAs and servers. An LA stores the list of requests and, for each of its subtrees, the number of servers that can solve a given problem and information about the data distributed in that subtree. Depending on the underlying network topology, a hierarchy of LAs may be deployed between an MA and the servers. No scheduling decision is made by an LA.
- A Server Daemon (SeD) encapsulates a computational server; for instance, it can be located on the entry point of a parallel computer. A SeD stores the list of data available on its server (with their distribution and the way to access them), the list of problems that can be solved on it, and all information concerning its load (available memory, resources, etc.). A SeD declares the problems it can solve to its parent LA, and can give performance predictions for a given problem thanks to the CoRI (Collector of Resource Information) module [103].
Master Agents can also be connected over the net (the Multi-MA version of Diet), either statically or dynamically.
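On the server side, a SeD is a regular program that registers its services and then hands control to Diet. The sketch below follows the service-declaration style of the Diet server API (the service name "smprod", the parameter types, and the solve function body are hypothetical, and exact signatures may vary across Diet versions):

    #include "DIET_server.h"

    /* Hypothetical solve function, invoked by Diet when a client
       request for "smprod" is routed to this SeD. */
    static int solve_smprod(diet_profile_t *pb)
    {
      /* ... unpack arguments from pb, compute, pack results ... */
      return 0;
    }

    int main(int argc, char *argv[])
    {
      diet_profile_desc_t *profile;

      diet_service_table_init(1);   /* this SeD declares one service */
      /* One IN parameter (index 0), no INOUT, one OUT (index 1). */
      profile = diet_profile_desc_alloc("smprod", 0, 0, 1);
      diet_generic_desc_set(diet_param_desc(profile, 0), DIET_MATRIX, DIET_DOUBLE);
      diet_generic_desc_set(diet_param_desc(profile, 1), DIET_MATRIX, DIET_DOUBLE);
      diet_service_table_add(profile, NULL, solve_smprod);
      diet_profile_desc_free(profile);
      return diet_SeD("SeD.cfg", argc, argv);  /* join the hierarchy */
    }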
Moreover, applications targeted for the Diet platform can exert a degree of control over the scheduling subsystem via plug-in schedulers [103]. Since the applications to be deployed on the Grid vary greatly in their performance demands, the Diet plug-in scheduler facility lets the application designer express application needs and features so that they are taken into account when application tasks are scheduled. These features are invoked at runtime, after a user has submitted a service request to the MA, which broadcasts the request to its agent hierarchy.
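For instance, a SeD developer can attach a custom performance metric to a service and have the agents rank servers by it. The sketch below is based on our understanding of the plug-in scheduler interface documented for Diet (names such as diet_service_use_perfmetric, diet_est_set, and the aggregator calls are taken from it; the load-average metric itself is just an example):

    #include <stdio.h>
    #include "DIET_server.h"

    /* Custom metric: expose the 1-minute load average of the node. */
    static void load_metric(diet_profile_t *pb, estVector_t values)
    {
      double load = 0.0;
      FILE *f = fopen("/proc/loadavg", "r");
      (void)pb;
      if (f) { if (fscanf(f, "%lf", &load) != 1) load = 0.0; fclose(f); }
      diet_est_set(values, 0, load);   /* user-defined estimation, slot 0 */
    }

    /* Called at service-declaration time (see the SeD sketch above). */
    static void declare_metric(diet_profile_desc_t *profile)
    {
      diet_aggregator_desc_t *agg = diet_profile_desc_aggregator(profile);
      diet_service_use_perfmetric(load_metric);         /* attach the metric */
      diet_aggregator_set_type(agg, DIET_AGG_PRIORITY); /* rank candidates */
      diet_aggregator_priority_minuser(agg, 0);         /* prefer the lowest slot-0 value */
    }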
Tools have recently been developed to deploy the platform (GoDiet), to monitor its execution (LogService), and to visualize its behavior using Gantt charts and statistics (VizDIET).
Seen from the user/developer point of view, the compilation and installation process of Diet should remain simple and robust. But Diet has to support this process on an increasing number of platforms (hardware architectures, operating systems, C/C++ compilers). Diet also supports many functional extensions (which sometimes conflict), and many of these extensions require one or more external libraries. The compilation and installation machinery of Diet must therefore handle a great number and variety of specific configurations. Until recently, Diet's tool of choice for this task was the GNU autotools, but Diet's autotools configuration files evolved to become fairly complicated and hard to maintain. Another important packaging task is to check that Diet compiles and installs properly, at least on the most mainstream platforms and for a decent majority of all extension combinations. This quality-assurance process should be carried out at least as often as releases are made; and, as the agile software development approach clearly states, risk can be greatly reduced by developing software in short time-boxes (as short as a single CVS commit).

For these reasons, it was decided to move from the GNU autotools to CMake (see http://www.cmake.org). CMake offers a much simpler syntax for its configuration files (sometimes at the cost of semantics, but it remains an effective trade-off). Additionally, CMake integrates a scriptable regression-test tool whose reports can be centralized on a so-called dashboard server. The dashboard offers a synthetic view (see http://graal.ens-lyon.fr/DIET/dietdashboard.html) of the current state of Diet's code. This quality evaluation is partial (compilation and linking errors and warnings), but it is automatically and constantly offered to the developers. Although the very nature of Diet makes it difficult to run distributed regression tests, we expect that the adoption of CMake will significantly improve Diet's robustness and general quality.
Diet has been validated on several applications. Some of them have been described in Sections 4.2 through 4.7 .
Workflow support
Workflow-based applications are scientific, data-intensive applications consisting of a set of tasks that must be executed in a certain partial order. They form an important class of Grid applications and are used in various scientific domains such as astronomy and bioinformatics.
We have developed a workflow engine in Diet to manage such applications; it offers end-users and developers a simple way either to use the provided scheduling algorithms or to develop their own.
Many Grid workflow frameworks have been developed, but Diet is the first GridRPC middleware to provide an API for workflow execution. Moreover, existing tools have limited scheduling capabilities. One of our objectives is to provide an open system that supplies several scheduling algorithms but also allows users to plug in and use their own schedulers.
In our implementation, workflows are described in XML. Since no standard exists for scientific workflows, we have proposed our own formalism. The Diet agent hierarchy has been extended with a new special agent, the MA_DAG. For flexibility, workflows can be executed even when this special agent is not present in the platform. Using the MA_DAG centralizes the scheduling decisions and can thus provide better scheduling when the platform is shared by multiple clients; on the other hand, if the client bypasses the MA_DAG, a new scheduling algorithm can be used without affecting the Diet platform. The current implementation of Diet provides several schedulers (Round Robin, HEFT, random, Fairness on Finish Time, etc.).
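On the client side, running a workflow then amounts to loading the XML description and submitting it. Below is a minimal sketch following the Diet workflow API (the file names are hypothetical, and the exact signatures of the diet_wf_* calls may differ between Diet versions):

    #include <stdio.h>
    #include "DIET_client.h"

    int main(int argc, char *argv[])
    {
      diet_wf_desc_t *wf;

      if (diet_initialize("client.cfg", argc, argv)) return 1;
      /* Parse the XML description of the DAG; scheduling is done by
         the MA_DAG when present, or on the client side otherwise. */
      wf = diet_wf_profile_alloc("dag.xml", "my_dag");
      if (diet_wf_call(wf))
        fprintf(stderr, "workflow execution failed\n");
      diet_wf_free(wf);
      return diet_finalize();
    }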
The Diet workflow runtime also includes a rescheduling mechanism. Most workflow scheduling algorithms are based on performance predictions that are not always accurate (e.g., an erroneous prediction tool or a wrongly estimated resource load). The rescheduling mechanism can trigger rescheduling of the application when conditions specified by the client are met.
We also continued our work on schedulers for the Diet workflow engine, targeting applications composed of multiple workflows, and on graphical tools for workflows within the Diet DashBoard project. Within the Gwendia project, we worked on the implementation of the language defined in the project and on the Cardiac application. Experiments were carried out on the Grid'5000 platform.
Batch and parallel job management
Parallel computing resources are generally accessed through a batch reservation system. Users wishing to submit parallel tasks have to write scripts that describe, among other things, the number of required nodes and the walltime of the reservation. Once submitted, a script is processed by the batch scheduling algorithm: the user is given the starting time of the job, and the batch system records the dedicated nodes (the mapping) allocated to the job.
In the Grid context, scheduling consequently happens at two levels: at the batch level and at the Grid middleware level. In order to exploit the resources efficiently (according to some metric), the Grid middleware should map the computing tasks according to the local scheduler's policy. This also supposes that the middleware integrates mechanisms for submitting to parallel resources and that, during submission, it provides information such as the number of requested resources, the job deadline, etc.
Diet servers are able to transparently submit tasks to parallel resources, whether through a batch system or not. For the moment, Diet servers can submit to versions 1.6 and 2.x of OAR, to OpenPBS, and to LoadLeveler, the latter being used in the Décrypthon project. The integration of SGE is in progress. Functions to access batch system information have also been implemented, both to use this information as a scheduling metric and to tune parallel and moldable tasks.
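Inside a SeD, a parallel or batch service is written like any other service, except that the solve function delegates the actual submission to Diet, which translates it for the underlying reservation system. The sketch below reflects our understanding of the Diet batch API: the diet_submit_parallel call and the meta-variable in the command line are assumptions, and the solver command is hypothetical.

    #include "DIET_server.h"

    /* Solve function for a parallel service: Diet generates the batch
       script for the local system (OAR, OpenPBS, LoadLeveler, ...),
       substituting meta-variables such as the node list or node count. */
    static int solve_parallel(diet_profile_t *pb)
    {
      const char *cmd =
        "mpirun -machinefile $DIET_BATCH_NODESFILE ./my_solver";
      return diet_submit_parallel(pb, NULL, cmd);
    }
    /* solve_parallel is registered with diet_service_table_add(),
       exactly as in the sequential SeD sketch shown earlier. */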
DIET Data Management
DAGDA, designed during the PhD of Gaël Le Mahec, is a new data manager for the Diet middleware that allows explicit or implicit data replication and advanced data management on the Grid. It was designed to be backward compatible with previously developed Diet applications, which transparently benefit from data replication. It provides explicit or implicit data replication, file sharing between nodes that can access the same disk partition, the choice of a data replacement algorithm, and high-level configuration of the memory and disk space Diet should use for data storage and transfers.
To transfer data, DAGDA uses a pull model instead of the push model used by DTM: data are not sent within the profile from the source to the destination, but are downloaded by the destination from the source. DAGDA also chooses the best source for a given piece of data.
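In practice, DAGDA is used through explicit put/get calls. The sketch below follows the dagda_put_file/dagda_get_file naming of the DAGDA API, with hypothetical file names; the exact signatures are assumptions and should be checked against the Diet manual.

    #include "DIET_Dagda.h"

    void replicate_example(void)
    {
      char *id = NULL;
      char *local_path = NULL;

      /* Publish a file on the platform; DIET_PERSISTENT asks DAGDA to
         keep it on (or near) the servers for later reuse. */
      dagda_put_file("input.dat", DIET_PERSISTENT, &id);

      /* Later, from any node: DAGDA selects the best source and the
         destination downloads from it (pull model). */
      dagda_get_file(id, &local_path);
    }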
DAGDA has also been used to validate our joint replication and scheduling algorithms over Diet.
GridRPC Data Management API
Data management is a challenging issue within the OGF GridRPC standard, for performance reasons. Indeed, some temporary data do not need to be transferred once computed and can, for example, remain on the servers. One can also imagine data being transferred directly from one server to another, without going through the client, while remaining consistent with the behavior of the GridRPC paradigm.
We have consequently worked on a data management API, which has been presented at every OGF session since OGF'21. Since December 2009 the proposal has been available for public comment; it can be reached at http://www.ogf.org/gf/docs/?public_comment under the title “Proposal for a Data Management API within the GridRPC. Y. Caniou and others, via GRIDRPC-WG”.
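To give the flavor of the proposal, the sketch below shows a piece of data being bound to explicit locations and moved from one server to another without passing through the client. The grpc_data_* names follow the public-comment draft, but the exact signatures and the URI schemes shown here are illustrative assumptions.

    #include "grpc.h"

    void server_to_server_example(void)
    {
      grpc_data_t data;
      const char *in_uris[]  = { "http://server1/matrix.dat", NULL };
      const char *out_uris[] = { "memory://server2/matrix",   NULL };

      /* Bind the data to its current and target locations. */
      grpc_data_init(&data, in_uris, out_uris, GRPC_DOUBLE);

      /* Transfer directly between the two servers; the client only
         drives the operation, the bytes never transit through it. */
      grpc_data_transfer(&data);
      grpc_data_free(&data);
    }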
DIET Dashboard
Monitoring a Grid, or deploying a Grid middleware on it, involves several tasks:
- Managing the resources of the Grid: allocating resources, deploying nodes with defined operating systems, etc.
- Monitoring the Grid: getting the status of the clusters (number of available nodes in each state, number and main properties of each job, Gantt chart of the job history) and the status of the jobs present in the platform (number, status, owner, walltime, scheduled start, Ganglia information of the nodes), etc.
- Managing Grid middleware in a Grid environment: designing hierarchies (manually, or automatically by matching resources against patterns), deploying them directly or through workflows of applications, etc.
The Diet Dashboard provides tools that address these needs in an environment dedicated to the GridRPC middleware Diet. It consists of a set of graphical tools that can be used separately or together.
These tools can be divided into three categories:
- Diet tools
including tools to design and deploy Diet applications. The Diet Designer allows users to graphically design a Diet hierarchy. The Diet Mapping tool allows users to map the allocated Grid'5000 resources onto a Diet application; the mapping is done interactively by selecting the site and then the Diet agents or SeDs. The Diet Deploy tool is a graphical interface to GoDiet for the deployment of Diet hierarchies.
- Workflow tools
including a workflow designer and a workflow log service. The Workflow Designer is dedicated to workflow applications written in Diet and gives users an easy way to design and execute workflows. Users can compose the available services and link them by drag-and-drop, or load a workflow description file in order to reuse it; the workflow can then be executed directly online. The Workflow LogService can be used to monitor workflow execution by displaying the DAG nodes of each workflow and their states.
- Grid tools (aka GRUDU).
These tools are used to manage, monitor, and access the user's Grid resources. Platform status display provides information about clusters, nodes, and jobs. Resource allocation offers an easy way to allocate resources by selecting, on a Grid'5000 map, the number of required nodes and the time frame; the allocated resources can be stored and reused with the Diet Mapping tool. Resource monitoring relies on a Ganglia plugin that provides low-level information on every machine of a site (instantaneous data) or on every machine of a job (metric history). Deployment is managed through a GUI for KaDeploy that simplifies its use. Finally, a terminal emulator allows remote connections to Grid'5000 machines, and a file transfer manager sends/receives files to/from the Grid'5000 frontends.
As these Grid tools can be a powerful help to Grid'5000 users, they have been extracted to create GRUDU (Grid'5000 Reservation Utility for Deployment Usage), which aims at simplifying access to, and management of, Grid'5000.
Middleware Interoperability
For the requirements of the GridTLSE project, Diet has been extended with protocol interoperability with the ITBL middleware, which manages Japanese computing resources at JAEA (Japan Atomic Energy Agency). A demo was presented at the INRIA booth at SuperComputing'08 and SuperComputing'09.
DIET as a Cloud System
A new extension of DIET was designed to deal with Cloud platforms such as Amazon EC2. We proposed using the Diet Grid middleware on top of the Eucalyptus Cloud system to demonstrate general-purpose computing on Cloud platforms. DIET is now compliant with the Amazon EC2 API. These recent developments validate the use of a Cloud system as a raw, on-demand computational resource for a Grid middleware such as DIET.