Section: Scientific Foundations
Providing Access to HPC Servers on the Grid
Participants : Nicolas Bard, Julien Bigot, Raphaël Bolze, Hinde Bouziane, Yves Caniou, Eddy Caron, Aurélien Ceyden, Ghislain Charrier, Benjamin Depardon, Frédéric Desprez, Gilles Fedak, Jean-Sébastien Gay, Haiwu He, David Loureiro, Christian Pérez, Vincent Pichon, Cédric Tedeschi, Bing Trang.
Resource management is one of the key issues for the development of efficient Grid environments. Several approaches co-exist in today's middleware platforms. The computation (or communication) grain and the dependences between the computations also have a great influence on the software choices.
A first approach provides the user with a uniform view of resources. This is the case of GLOBUS (http://www.globus.org/), which provides transparent MPI communications (with MPICH-G2) between distant nodes but does not manage load balancing between these nodes. It is the user's task to develop code that takes into account the heterogeneity of the target architecture. Classical batch processing can also be used on the Grid with projects like Condor-G (http://www.cs.wisc.edu/condor/condorg/) or Sun GridEngine (http://wwws.sun.com/software/gridware/). Finally, peer-to-peer or Global computing can be used for fine grain and loosely coupled applications.
A second approach provides semi-transparent access to computing servers by submitting jobs to dedicated servers. This model is known as the Application Service Provider (ASP) model, where providers offer, not necessarily for free, computing resources (hardware and software) to clients in the same way as Internet providers offer network resources to clients. The programming granularity of this model is rather coarse. One of the advantages of this approach is that end users do not need to be experts in parallel programming to benefit from high performance parallel programs and computers. This model is closely related to the classical Remote Procedure Call (RPC) paradigm. On a Grid platform, the RPC (or GridRPC) offers easy access to available resources from a Web browser, a Problem Solving Environment, or a simple client program written in C, Fortran, or Java. It also provides more transparency by hiding the search and allocation of computing resources. We favor this second approach.
In a Grid context, this approach requires the implementation of middleware environments to facilitate client access to remote resources. In the ASP approach, a common way for clients to ask for resources to solve their problem is to submit a request to the middleware. The middleware then finds the most appropriate server, which solves the problem on behalf of the client using specific software. Several environments, usually called Network Enabled Servers (Nes), have developed such a paradigm: NetSolve, Ninf, NEOS, OmniRPC, and more recently Diet, developed in the Graal project (see Section 5.1). A common feature of these environments is that they are built on top of five components: clients, servers, databases, monitors, and schedulers. Clients submit computational requests, which are solved on servers found by the Nes. The Nes schedules the requests on the different servers using performance information obtained by monitors and stored in a database.
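The interplay of the five components can be illustrated with a minimal Python sketch. It is a toy model written for this text, not the design of Diet or of any of the environments cited above: the class names (Server, Database, Monitor, Scheduler), the client_call helper, and the "least-loaded server" policy are all assumptions chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Server:
    """A computing server offering named services (hypothetical model)."""
    name: str
    services: Dict[str, Callable[..., object]]


@dataclass
class Database:
    """Stores the latest load value reported for each server."""
    load: Dict[str, float] = field(default_factory=dict)


class Monitor:
    """Records performance information into the database."""

    def __init__(self, db: Database) -> None:
        self.db = db

    def report(self, server_name: str, load: float) -> None:
        self.db.load[server_name] = load


class Scheduler:
    """Selects the least-loaded server offering a service (assumed policy)."""

    def __init__(self, servers: List[Server], db: Database) -> None:
        self.servers = servers
        self.db = db

    def select(self, service: str) -> Server:
        candidates = [s for s in self.servers if service in s.services]
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        # Assumption: a server with no recorded load is treated as saturated.
        return min(candidates,
                   key=lambda s: self.db.load.get(s.name, float("inf")))


def client_call(scheduler: Scheduler, service: str, *args) -> Tuple[str, object]:
    """Client side: ask the middleware for a server, then run the request on it."""
    server = scheduler.select(service)
    return server.name, server.services[service](*args)
```

In this sketch a client never names a server: it passes a service name and arguments to client_call, and the scheduler resolves the request from whatever the monitors last wrote to the database. A real Nes would add remote communication, data transfer, and richer performance models, none of which are modeled here.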
To design such a Nes we need to address issues related to several well-known research domains. In particular, we focus on:
middleware and application platforms as a base to implement the necessary “glue” to broker client requests, find the best server available, and then submit the problem and its data,
online and offline scheduling of requests,
link with data management,
distributed algorithms to manage the requests and the dynamic behavior of the platform.
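The difference between online and offline scheduling listed above can be made concrete with a small Python sketch. It rests on strong simplifying assumptions that are not taken from the text: servers are identical, request durations are known exactly, and the goal is to minimize the completion time of the last request (the makespan). The function names are hypothetical. The online variant places each request as it arrives, while the offline variant sees all requests beforehand and sorts them by decreasing duration before applying the same greedy rule.

```python
import heapq
from typing import Sequence


def greedy_makespan(durations: Sequence[float], n_servers: int) -> float:
    """Assign each request, in the given order, to the server that frees up first."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    # A list of equal values is already a valid min-heap of finish times.
    finish_times = [0.0] * n_servers
    for duration in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


def online_makespan(durations: Sequence[float], n_servers: int) -> float:
    """Online: requests are placed in arrival order, without knowledge of the future."""
    return greedy_makespan(durations, n_servers)


def offline_makespan(durations: Sequence[float], n_servers: int) -> float:
    """Offline: all requests are known, so the longest ones are placed first."""
    return greedy_makespan(sorted(durations, reverse=True), n_servers)
```

For requests of durations 2, 3, and 7 arriving in that order on two servers, the online variant finishes at time 9 and the offline variant at time 7. The gap comes only from the missing knowledge of future requests, which is one reason the two settings are studied separately.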