Team GRAAL


Section: Software

DIET

Participants: Abdelkader Amar, Nicolas Bard, Raphaël Bolze, Yves Caniou, Eddy Caron, Ghislain Charrier, Frédéric Desprez [correspondent], Jean-Sébastien Gay, Vincent Pichon, Cédric Tedeschi.

Huge problems can now be processed over the Internet thanks to Grid computing environments like Globus or Legion. Because most of the current applications are numerical, the use of libraries like BLAS, LAPACK, ScaLAPACK, or PETSc is mandatory. The integration of such libraries into high-level applications written in languages like Fortran or C is far from easy. Moreover, the computational power and memory needs of such applications may of course not be available on every workstation. Thus, the RPC paradigm seems to be a good candidate to build Problem Solving Environments on the Grid, as explained in Section 3.3. The aim of the Diet project (http://graal.ens-lyon.fr/DIET) is to develop a set of tools to build computational servers accessible through a GridRPC API.
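
For illustration, the sketch below shows what a client could look like when written against the OGF GridRPC end-user API; the service name "scal_vec" and its argument list are purely illustrative assumptions, and Diet also offers its own, richer client API following the same pattern:

    #include <stdio.h>
    #include "grpc.h"               /* OGF GridRPC end-user API header */

    int main(int argc, char *argv[]) {
      grpc_function_handle_t handle;
      double x[3] = {1.0, 2.0, 3.0};
      double y[3];
      int n = 3;

      if (argc < 2) return 1;

      /* Read the client configuration (how to reach the Master Agent). */
      if (grpc_initialize(argv[1]) != GRPC_NO_ERROR)
        return 1;

      /* Ask the middleware for the server best suited for the service;
         the agent hierarchy makes the choice.                           */
      grpc_function_handle_default(&handle, "scal_vec");

      /* Synchronous remote invocation on the selected server. */
      if (grpc_call(&handle, n, x, y) != GRPC_NO_ERROR)
        fprintf(stderr, "remote call failed\n");

      grpc_function_handle_destruct(&handle);
      grpc_finalize();
      return 0;
    }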

Moreover, the aim of an NES (Network Enabled Servers) environment such as Diet is to provide transparent access to a pool of computational servers. Diet focuses on offering such a service at a very large scale. A client that has a problem to solve should be able to obtain a reference to the server best suited for it. Diet is designed to take data location into account when scheduling jobs. Data are kept as long as possible on (or near) the computational servers in order to minimize transfer times. This kind of optimization is mandatory when performing job scheduling on a wide-area network.

Diet is built upon Server Daemons. The scheduler is scattered across a hierarchy of Local Agents and Master Agents. Network Weather Service (NWS) [74] sensors are placed on each node of the hierarchy to collect resource availabilities, which are used by an application-centric performance prediction tool named FAST.

The different components of our scheduling architecture are the following.

A Client is an application that uses Diet to solve problems. Many kinds of clients should be able to connect to Diet: from a web page, from a Problem Solving Environment such as Matlab or Scilab, or from a compiled program.

A Master Agent (MA) receives computation requests from clients. These requests refer to Diet problems listed on a reference web page. The MA then collects computational abilities from the servers and chooses the best one. The reference of the chosen server is returned to the client. A client can be connected to an MA through a specific name server or a web page that stores the various MA locations. Several MAs can be deployed on the network to balance the load among them.

A Local Agent (LA) transmits requests and information between MAs and servers. The information stored on an LA is the list of requests and, for each of its subtrees, the number of servers that can solve a given problem and information about the data distributed in this subtree. Depending on the underlying network topology, a hierarchy of LAs may be deployed between an MA and the servers. No scheduling decision is made by an LA.

A Server Daemon (SeD) encapsulates a computational server. For instance, it can be located on the entry point of a parallel computer. The information stored on a SeD is the list of the data available on its server (with their distribution and the way to access them), the list of problems that can be solved on it, and all information concerning its load (available memory and resources, etc.). A SeD declares the problems it can solve to its parent LA. A SeD can give performance predictions for a given problem thanks to the CoRI (Collector of Resource Information) module [58].
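
On the server side, a SeD essentially registers the services it can solve and then waits for requests. The minimal C sketch below illustrates this; the header and call names follow the Diet server API as we use it here, but the exact signatures should be read as assumptions for illustration rather than as a normative reference:

    #include "DIET_server.h"   /* Diet SeD-side header (name assumed) */

    /* Solve function: called each time the scheduler assigns a
       "scal_vec" request to this server.                          */
    static int solve_scal_vec(diet_profile_t *pb) {
      /* retrieve IN parameters from pb, compute, fill OUT parameters */
      return 0;
    }

    int main(int argc, char *argv[]) {
      /* One entry in the local service table. */
      diet_service_table_init(1);

      /* Profile of the service: parameters 0..1 are IN, parameter 2 is OUT. */
      diet_profile_desc_t *desc = diet_profile_desc_alloc("scal_vec", 1, 1, 2);
      diet_generic_desc_set(diet_param_desc(desc, 0), DIET_SCALAR, DIET_DOUBLE);
      diet_generic_desc_set(diet_param_desc(desc, 1), DIET_VECTOR, DIET_DOUBLE);
      diet_generic_desc_set(diet_param_desc(desc, 2), DIET_VECTOR, DIET_DOUBLE);
      diet_service_table_add(desc, NULL, solve_scal_vec);
      diet_profile_desc_free(desc);

      /* Declare the service to the parent LA and serve incoming requests. */
      return diet_SeD(argv[1], argc, argv);
    }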

Moreover, applications targeted for the Diet platform are now able to exert a degree of control over the scheduling subsystem via plug-in schedulers [58]. As the applications to be deployed on the Grid vary greatly in terms of performance demands, the Diet plug-in scheduler facility permits the application designer to express application needs and features so that they can be taken into account when application tasks are scheduled. These features are invoked at runtime after a user has submitted a service request to the MA, which broadcasts the request to its agent hierarchy.
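
As an illustration, a SeD-side plug-in scheduler typically publishes a custom performance value and tells the agents how to rank servers with it. The fragment below is a sketch in that spirit; the function names echo the plug-in scheduler interface described in [58] but should be treated as assumptions, and local_queue_length() is a hypothetical helper:

    /* Performance-estimation callback filled in by the SeD for each request. */
    static void perf_metric(diet_profile_t *pb, estVector_t values) {
      double queue_len = local_queue_length();   /* hypothetical helper      */
      diet_est_set(values, 0, queue_len);        /* user-defined value no. 0 */
    }

    /* ... at service-declaration time, before diet_service_table_add(): ... */
    diet_aggregator_desc_t *agg = diet_profile_desc_aggregator(desc);
    diet_aggregator_set_type(agg, DIET_AGG_PRIORITY);
    diet_aggregator_priority_minuser(agg, 0);    /* prefer the shortest queue */
    diet_service_use_perfmetric(perf_metric);
    diet_service_table_add(desc, NULL, solve_scal_vec);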

Master Agents can then be connected over the net (Multi-MA version of Diet), either statically or dynamically.

Thanks to a collaboration between the Graal and PARIS projects, Diet can use JuxMem. JuxMem (Juxtaposed Memory) is a peer-to-peer architecture developed by the PARIS team that provides memory-sharing services, allowing peers to share data in memory and not only files. To illustrate how a GridRPC system can benefit from transparent access to data, we have implemented the proposed approach inside the Diet GridRPC middleware, using the JuxMem data-sharing service.

Tools have recently been developed to deploy the platform (GoDIET), to monitor its execution (LogService), and to visualize its behavior using Gantt charts and statistics (VizDIET).

Seen from the user/developer point of view, the compilation and installation process of Diet should remain simple and robust. However, Diet has to support this process on an increasing number of platforms (hardware architectures, operating systems, C/C++ compilers). Additionally, Diet supports many functional extensions (sometimes mutually exclusive), and many of these extensions require one or a few external libraries. The compilation and installation machinery of Diet must therefore handle a large number and variety of specific configurations.

Up to the previous versions, Diet's tool of choice for this task was the GNU autotools. Diet's autotools configuration files evolved to become fairly complicated and hard to maintain. Another important task of the Diet packager is to ensure that Diet compiles and installs properly, at least on the most mainstream platforms and for a decent majority of the extension combinations. This quality-assurance process should be carried out at least as often as releases are made; but, as clearly stated by the agile software development framework, the risk can be greatly reduced by developing software in short time-boxes (as short as a single cvs commit).

For the above reasons, it was decided to move away from the GNU autotools to CMake (see http://www.cmake.org). CMake offers a much simpler syntax for its configuration files (sometimes at the cost of semantics, but CMake remains an effective trade-off). Additionally, CMake integrates a scriptable regression test tool whose reports can be centralized on a so-called dashboard server. The dashboard offers a synthetic view (see http://graal.ens-lyon.fr/DIET/dietdashboard.html) of the current state of Diet's code. This quality evaluation is partial (compilation and linking errors and warnings) but is automatically and constantly offered to the developers. Although the very nature of Diet makes it difficult to carry out distributed regression tests, we expect that the adoption of CMake will significantly improve Diet's robustness and general quality.

Diet has been validated on several applications. Some of them are described in Sections 4.2 through 4.7.

Workflow support

Workflow-based applications are scientific, data-intensive applications consisting of a set of tasks that need to be executed in a certain partial order. These applications are an important class of Grid applications and are used in various scientific domains such as astronomy or bioinformatics.

We have developed a workflow engine in Diet to manage such applications. It offers end-users and developers a simple way either to use the provided scheduling algorithms or to develop their own.

Many Grid workflow frameworks have been developed, but Diet is the first GridRPC middleware that provides an API for executing workflow applications. Moreover, existing tools have limited scheduling capabilities; one of our objectives is to provide an open system that offers several scheduling algorithms but also lets users plug in and use their own specific schedulers.

In our implementation, workflows are described in XML. Since no standard exists for scientific workflows, we have proposed our own formalism. The Diet agent hierarchy has been extended with a new special agent, the MA_DAG, but for flexibility workflows can be executed even if this special agent is not present in the platform. Using the MA_DAG centralizes the scheduling decisions and can thus provide a better schedule when the platform is shared by multiple clients. On the other hand, if the client bypasses the MA_DAG, a new scheduling algorithm can be used without affecting the Diet platform. The current implementation of Diet provides several schedulers (Round Robin, HEFT, random, Fairness on Finish Time, etc.).
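
From the client's point of view, submitting a workflow is very close to submitting a single GridRPC request. The sketch below illustrates this; the workflow-related call names (diet_wf_profile_alloc, diet_wf_call), the header name, and the file dag.xml are assumptions used for illustration only:

    #include <stdio.h>
    #include "DIET_client.h"           /* Diet client header (name assumed) */

    int main(int argc, char *argv[]) {
      diet_wf_desc_t *wf;

      if (diet_initialize(argv[1], argc, argv))   /* client configuration */
        return 1;

      /* The DAG is described in an XML file following the Diet formalism;
         each node refers to a Diet service and to its data dependencies.  */
      wf = diet_wf_profile_alloc("dag.xml", "my_dag");

      /* Submission: if an MA_DAG is deployed it takes the scheduling
         decisions, otherwise the client-side engine schedules the DAG.    */
      if (diet_wf_call(wf))
        fprintf(stderr, "workflow execution failed\n");

      diet_finalize();
      return 0;
    }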

The Diet workflow runtime also includes a rescheduling mechanism. Most workflow scheduling algorithms are based on performance predictions that are not always exact (erroneous prediction tool or wrongly estimated resource load). The rescheduling mechanism can trigger a rescheduling of the application when conditions specified by the client are fulfilled.

This year, further developments improved the Diet workflow engine, especially with respect to multi-workflow applications. The previous support allowed multiple workflow submissions, but each submitted workflow was scheduled in isolation. To study the behaviour of workflow scheduling, another approach was adopted: the newly submitted workflow is considered together with all the waiting tasks of previously submitted workflows in order to compute a new schedule. A first implementation relied on monitoring features and on mechanisms forcing the different workflows to respect the new scheduling decisions, but it proved insufficient in a truly concurrent environment. A second implementation, which corrects this drawback, uses a centralized scheduler in the MA_DAG (as in the first implementation) together with a minimal runtime for the active workflows. This minimal runtime does not actually execute the tasks (to keep the MA_DAG load low) but triggers the corresponding clients to execute them; scheduling decisions can thus be respected, since scheduling and the start of task execution are handled in a centralized way.

In addition to these developments, graphical tools for workflows (the Workflow Designer and the Workflow LogService) were developed within the Diet Dashboard project.

DAGDA: A new data manager for the DIET middleware

"Data Arrangement for Grid and Distributed Applications" is a new data manager for the Diet middleware. Indeed, the previous data manager could not manage data replication and explicit data redistribution among the nodes. We developed this new data manager which allows to control the data placement and which manages several replica for a given data. This data manager is backward compatible with existing Diet applications and some possible extensions such as data encryption, compression etc. should be easy to implement. Dagda is using the pull model for the data management. That means data is not sent with the request as with the previous data manager, but asked by the server when it needs them. This system gives more flexibility to the data management with, for example, the possibility to download a data from several sources simultaneously. We will use this new data manager to evaluate some data replication scheduling algorithms with Diet .

Batch and parallel job management

Currently, Grids are built on a cluster hierarchy model, as used by the two projects Egee (Enabling Grids for E-science in Europe, http://public.eu-egee.org/) and Grid'5000 (see Section 8.2). The production platform of the Egee project aggregates more than one hundred sites spread over 31 countries. Grid'5000 is the French research Grid, which aims at gathering 5000 nodes spread over France (9 sites currently participate).

Generally, a parallel computing resource is used via a batch reservation system: users wishing to submit parallel tasks to the resource have to write scripts which notably describe the number of required nodes and the walltime of the reservation. Once submitted, the script is processed by the batch scheduling algorithm: the user is given the starting time of the job, and the batch system records the dedicated nodes (the mapping) allocated to the job.

In the Grid context, there is consequently a two-level scheduling: one at the batch level and the other at the Grid middleware level. In order to exploit the resource efficiently (according to some metrics), the Grid middleware should map the computing tasks according to the local scheduler policy. This also requires that the middleware integrate mechanisms to submit to parallel resources and provide, at submission time, information such as the number of requested resources, the job deadline, etc.

First, we have extended the Diet functionalities. Diet servers are now able to submit tasks to parallel resources, via a batch system or not. For the moment, Diet servers can submit to both the OAR and Loadleveler reservation systems, the latter being used in the Décrypthon project. Furthermore, a Diet client can specify whether its job must be considered specifically for the corresponding type of resource (sequential task to sequential resource) or whether Diet is in charge of choosing the best among all available resources. Consequently, the API has been extended with two new calls on the client side and several new functionalities on the server side: we provide an abstraction layer over batch systems to make reservation information available to the SeD. For example, a parallel MPI program must know the identity of the machines on which it is deployed; these are generally reported in a file which is specific to each batch system. Using a keyword provided by Diet (here DIET_BATCH_NODELIST), the program can access the needed information.

DIET Dashboard

Monitoring a Grid, or deploying a Grid middleware on it, involves several tasks.

The Diet Dashboard addresses these needs with an environment dedicated to the GridRPC middleware Diet; it consists of a set of graphical tools that can be used separately or together.

These tools can be divided into three categories:

Diet tools, including tools to design and deploy Diet applications. The Diet Designer allows the user to graphically design a Diet hierarchy. The Diet Mapping tool allows the user to map the allocated Grid'5000 resources onto a Diet application; the mapping is done interactively by selecting a site and then Diet agents or SeDs. The Diet Deploy tool is a graphical interface to GoDIET for the deployment of Diet hierarchies.

Workflow tools, including the Workflow Designer and the Workflow LogService. The Workflow Designer is dedicated to workflow applications written with Diet and provides the user with an easy way to design and execute workflows with Diet. The user can compose the different available services and link them by drag-and-drop, or load a workflow description file in order to reuse it. These workflows can then be executed directly online. The Workflow LogService can be used to monitor workflow execution by displaying the DAG nodes of each workflow and their states.

Grid tools (a.k.a. GRUDU). These tools are used to manage, monitor, and access the user's Grid resources. They support: displaying the status of the platform (information about clusters, nodes, and jobs); resource allocation, which provides an easy way to allocate resources by selecting the number of required nodes and the time frame on a Grid'5000 map (the allocated resources can be stored and reused with the Diet Mapping tool); resource monitoring through a Ganglia plugin that gives low-level information on every machine of a site (instantaneous data) or of a job (history of the metrics); deployment management with a GUI for KaDeploy that simplifies its use; a terminal emulator for remote connections to Grid'5000 machines; and a file transfer manager to send/receive files to/from the Grid'5000 front-ends.

As the Grid tools can be of great help to Grid'5000 users, they have been extracted to create GRUDU (Grid'5000 Reservation Utility for Deployment Usage), which aims at simplifying the access to and the management of Grid'5000.

All these tools have been presented at the SuperComputing 2007 conference in Reno, Nevada, in the Diet slot of the Inria booth.

GridRPC data management API

The GridRPC paradigm is now an OGF standard. The GridRPC community is interested in data management within the GridRPC paradigm. Because of previous work on data management in the Diet middleware, Eddy Caron has been appointed co-chair of the GridRPC working group in order to lead the effort to propose a powerful Grid data management API extending the GridRPC paradigm.

Data management is a challenging issue within GridRPC for performance reasons. Indeed, some temporary data do not need to be transferred once computed and can, for example, remain on the servers. One can also imagine data being transferred directly from one server to another, without going through the client as in the default GridRPC behavior.

We have consequently worked on a data management API, which was presented at OGF'21. We are currently improving it, the remarks already received having been taken into account. The new proposal will be presented at OGF'22.

