Project : paris
Section: New Results
Middleware for computational grids
The PadicoTM framework
Computational grids differ from previous computing infrastructures in that they exhibit both parallel and distributed aspects: a computational grid is a set of varied, widely distributed computing resources, which are often parallel themselves. A grid therefore usually contains a range of networking technologies, from system-area networks to wide-area networks.
PadicoTM is a communication framework that decouples application middleware systems from the actual networking environment. Applications can thus transparently and efficiently use any kind of communication middleware (whether parallel or distributed) on any network on which they are deployed. Moreover, to support advanced grid programming models, PadicoTM can support several communication middleware systems concurrently.
PadicoTM achieves this by implementing a dual-abstraction model organized in three layers: arbitration, abstraction, and personalities. The two paradigms, parallel and distributed, are present at each level. Cross-paradigm translation is therefore performed only when required (i.e. distributed middleware atop parallel hardware, or parallel middleware atop distributed networks), without any feature bottleneck. The lowest layer is the arbitration layer. Its goal is to provide arbitrated interfaces, i.e. consistent, reentrant and multiplexed access to every networking resource, where each resource is used with the most appropriate driver and method. On top of the arbitration layer, an abstraction layer provides abstract interfaces well suited for use by various middleware systems, independently of the hardware. The abstraction layer should be fully transparent: the interfaces are the same whatever the underlying network is. The last layer is a personality layer, which supplies various standard APIs on top of the abstract interfaces. Personalities are thin wrappers that adapt a generic API to make it look like another API. They perform neither protocol adaptation nor paradigm translation; they only adapt the syntax.
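The layering described above can be illustrated with a minimal Python sketch. It is not PadicoTM code: the class names (`Driver`, `LoopbackDriver`, `AbstractLink`, `SocketLikePersonality`) and their methods are hypothetical, and only one paradigm is shown. The point is that the personality only renames calls, while the abstract interface stays the same whichever driver the arbitration layer provides.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Arbitration layer (hypothetical): one driver per networking resource."""

    @abstractmethod
    def send(self, dest, payload):
        ...


class LoopbackDriver(Driver):
    """Stand-in driver that records traffic instead of touching a network."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def send(self, dest, payload):
        self.log.append((dest, payload))


class AbstractLink:
    """Abstraction layer: same interface whatever driver sits underneath."""

    def __init__(self, driver, dest):
        self._driver = driver
        self._dest = dest

    def write(self, payload):
        self._driver.send(self._dest, payload)
        return len(payload)


class SocketLikePersonality:
    """Personality layer: mimics another API's call names, nothing more."""

    def __init__(self, link):
        self._link = link

    def sendall(self, data):
        # Syntax adaptation only: no protocol or paradigm translation here.
        self._link.write(data)


# The same middleware-facing call works on two different (fake) networks.
fast = LoopbackDriver("high-performance network")
slow = LoopbackDriver("wide-area network")
for driver in (fast, slow):
    SocketLikePersonality(AbstractLink(driver, "node-1")).sendall(b"hello")
```

In this sketch, swapping the driver requires no change above the arbitration layer, which is the transparency property the abstraction layer is meant to provide.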
In 2003, we finalized the specification of the three-layer dual-abstraction model and modified PadicoTM accordingly. The arbitration layer in PadicoTM is called NetAccess. It contains two subsystems: SysIO for access to system I/O (sockets, files), and MadIO, based on Madeleine, for multiplexed access to high-performance networks. A core ensures consistent interleaving of the concurrent polling loops. NetAccess is open enough to allow the integration of subsystems other than MadIO and SysIO for other paradigms, for example Shmem for SMP nodes. The two abstract interfaces in PadicoTM are VLink for distributed computing and Circuit for parallelism. The VLink interface has been implemented on top of several drivers: MadIO, SysIO, Parallel Streams, AdOC (a dynamic adaptive compression library) and loopback. The Circuit interface, which manages communications on a definite set of nodes, has been implemented on top of MadIO, SysIO, loopback and VLink. As a given instance of Circuit can use different adapters for different links, it is possible to build a circuit that mixes different kinds of communication.
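The idea of a circuit whose links use different adapters can be sketched as follows. This is an illustration under assumptions, not the Circuit API: the `Adapter` and `Circuit` classes, their methods, and the node names are all hypothetical, and the adapters only record messages.

```python
class Adapter:
    """Hypothetical adapter: carries messages over one kind of link."""

    def __init__(self, kind):
        self.kind = kind
        self.delivered = []

    def send(self, src, dst, payload):
        self.delivered.append((src, dst, payload))


class Circuit:
    """A definite set of nodes; each pair of nodes may use its own adapter."""

    def __init__(self, nodes, default_adapter):
        self.nodes = tuple(nodes)
        self._default = default_adapter
        self._per_link = {}

    def set_adapter(self, a, b, adapter):
        # Links are treated as undirected, so the pair is stored as a frozenset.
        self._per_link[frozenset((a, b))] = adapter

    def adapter_for(self, a, b):
        return self._per_link.get(frozenset((a, b)), self._default)

    def send(self, src, dst, payload):
        if src not in self.nodes or dst not in self.nodes:
            raise ValueError("both endpoints must belong to the circuit")
        self.adapter_for(src, dst).send(src, dst, payload)


# Two nodes share a fast local network; the third is reached over a slower one.
local = Adapter("high-performance network")
remote = Adapter("wide-area link")
circuit = Circuit(["a0", "a1", "b0"], default_adapter=remote)
circuit.set_adapter("a0", "a1", local)

circuit.send("a0", "a1", b"halo exchange")   # goes through `local`
circuit.send("a1", "b0", b"boundary data")   # falls back to `remote`
```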
Parallel CORBA objects
The concept of a (distributed) parallel object appears to be a key technology for programming (distributed) numerical simulations. It joins the well-known object-oriented model with a parallel execution model. Data distributed across a parallel object can thus be sent and/or received almost like a regular piece of data, while taking advantage of (possibly) multiple communication flows between the parallel sender and receiver. The Paris Project-Team has been working on this topic for several years. PaCO was the first attempt to extend CORBA with parallelism. PaCO++ is a second attempt that supersedes PaCO on several points. It targets a portable extension to CORBA, so that it can be added to any implementation of CORBA. It advocates that the parallelism of an object is mainly an implementation issue: it should not be visible to users except on some special occasions. Hence, the OMG IDL is no longer modified.
In 2002, the development of PaCO++ was started to validate the portable parallel CORBA object concept. A first implementation was completed and validated with an EADS application that manipulated block-cyclic distributed matrices of complex numbers. It was composed of an IDL-to-IDL compiler and a (C++) runtime part. Both tools only required a compliant CORBA implementation. Moreover, the PaCO++ runtime handled threads, intra-parallel-object communications and data distributions through abstract interfaces.
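For readers unfamiliar with block-cyclic distributions, the following sketch shows the usual one-dimensional index mapping: blocks of `block` consecutive elements are dealt to processes in round-robin order. For a matrix, the same mapping is commonly applied independently to rows and columns. The function names are hypothetical, and the report does not say which exact convention the EADS application or PaCO++ used, so this is the textbook mapping, given as an assumption.

```python
def owner_and_local_index(i, block, nprocs):
    """Map global index i to (owning process, index in that process's array)."""
    blk = i // block                      # which block the element falls in
    owner = blk % nprocs                  # blocks are dealt round-robin
    local = (blk // nprocs) * block + i % block
    return owner, local


def layout(n, block, nprocs):
    """List, for each process, the global indices it owns, in local order."""
    parts = [[] for _ in range(nprocs)]
    for i in range(n):
        owner, _ = owner_and_local_index(i, block, nprocs)
        parts[owner].append(i)
    return parts


# 10 elements, blocks of 2, 3 processes.
parts = layout(10, block=2, nprocs=3)
print(parts)  # [[0, 1, 6, 7], [2, 3, 8, 9], [4, 5]]
```

Sending such data between two parallel objects with different distributions amounts to computing, for each element, the source owner and the destination owner, which is where multiple communication flows arise.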
Continuing the development effort, we have worked on stabilizing the PaCO++ code so that it can be delivered to our partners in the HydroGrid ACI GRID project and to the ReMaP/GRAAL research team (LIP, Lyon, France). We plan to release a public version during the winter of 2003–2004.
The distributed-data abstraction of PaCO++ was quite primitive. Thanks to the support of the Inria ARC RedGrid, we are working on refining the role of the PaCO++ runtime. An important motivation was to be able to integrate a communication scheduling functionality that takes into account the capabilities of the underlying networks. We succeeded in integrating the scheduling library developed by the Algorille research team (Nancy, France) into an experimental prototype. Its communication performance does not decrease when the number of senders increases, unlike other prototypes, which face network congestion when the number of senders becomes too large with respect to the WAN bandwidth.
Another issue with parallel objects that we have started to study concerns the management of exceptions. The problem is to define the semantics of an exception raised during the invocation of a parallel operation. We have identified various scenarios (a single exception, several identical exceptions, several different exceptions, etc.) and the semantics that can be associated with them (standard exception mechanism, group of exceptions, priority of exceptions, etc.). This is still ongoing work.
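To make the scenarios concrete, here is one possible way to map them onto the listed semantics. It is purely illustrative: the work is described as ongoing, so this is not the semantics retained for PaCO++. The names `ParallelGroupError` and `resolve` are hypothetical, and "identical" is assumed here to mean "of the same exception type".

```python
class ParallelGroupError(Exception):
    """Hypothetical 'group of exceptions' reported as a single exception."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} exceptions raised by a parallel operation")
        self.errors = list(errors)


def resolve(raised, priority=None):
    """Pick what the caller sees, given the exceptions raised by the nodes.

    raised   -- exceptions collected from the nodes of the parallel object
    priority -- optional list of exception types, most important first
    """
    if not raised:
        return None
    if len({type(e) for e in raised}) == 1:
        # A single exception, or several identical ones: standard mechanism.
        return raised[0]
    for kind in priority or ():
        for error in raised:
            if isinstance(error, kind):
                return error          # priority of exceptions
    return ParallelGroupError(raised)  # group of exceptions


mixed = [ValueError("bad block"), TimeoutError("node 3 silent")]
by_priority = resolve(mixed, priority=[TimeoutError])
as_group = resolve(mixed)
```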
Future work can be divided into three parts. First, we will continue the development of PaCO++ by integrating support for communication scheduling as well as a more open model for distributed data. Second, the work on parallel exceptions will be completed and implemented in PaCO++. Last, we would like to show that the concept of a parallel object can be applied to technologies other than CORBA, for example Web Services or Peer-to-Peer.
Parallel CORBA components
Software component technology represents the next attempt to deal with the complexity of software programming. The CORBA Component Model (CCM) is the OMG standard for component-based distributed programming. Like CORBA objects, CORBA components suffer from limitations with respect to parallelism. Since our goal is to study the concept of a parallel component, CCM appears to be a reasonable technological choice, as it specifies the whole life cycle of a component, including its packaging and its deployment.
We have proposed to define a parallel component as a collection of identical sequential components that execute all or some parts of their services in parallel. This definition allows us to apply the experience acquired with PaCO++. The OMG IDL3, which is the abstract view of the component, does not need to be modified. The parallelism, specified in an auxiliary file, can be attached to the component implementation definition language (CIDL). Hence, it should be possible to implement parallel components as an extension of CCM implementations.
To evaluate the pertinence of our parallel component definition, we have implemented two prototypes of parallel CORBA components based on two preliminary CCM implementations: OpenCCM, a Java implementation, and MicoCCM, a C++ implementation. For both prototypes, the definition of a parallel component proved pertinent, as no particular difficulty was encountered. Moreover, both prototypes perform as expected. The latency measurements do not show any significant overhead with respect to the latency of the plain CCM implementation. The bandwidth was correctly aggregated: it grows from 9.8 MB/s for the C++ implementation (resp. 8.3 MB/s for the Java implementation) for a one-node to one-node parallel component configuration to 78.4 MB/s (resp. 66.4 MB/s) for an 8-node to 8-node parallel component configuration. These numbers have been obtained using a Fast-Ethernet network for CORBA communications.
When using PadicoTM to route CORBA communications through a Myrinet network, the bandwidth for the C++ version scales from 43 MB/s (one-node to one-node) to 280 MB/s (8-node to 8-node). These numbers are better than those obtained on a Fast-Ethernet network. However, they are not very good with respect to the performance of the Myrinet network. As we have shown with the PadicoTM experiments, the problem lies in the data copies generated by Mico, which limit the bandwidth. High-performance parallel components, like high-performance parallel objects, require a high-performance CORBA implementation. OmniORB is such an implementation for CORBA objects. An equivalent implementation for CORBA components is still missing.
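The aggregation claims can be checked with a short calculation on the reported figures. The sketch below compares each 8-node to 8-node measurement with the ideal value of eight times the one-node to one-node bandwidth. The function name `scaling_efficiency` is ours, not a metric defined in the report.

```python
def scaling_efficiency(one_node_mb_s, n_nodes, measured_mb_s):
    """Measured aggregate bandwidth as a fraction of n_nodes * one-node bandwidth."""
    return measured_mb_s / (n_nodes * one_node_mb_s)


# (one-node bandwidth in MB/s, node count, measured aggregate in MB/s),
# all taken from the figures reported in the text.
measurements = {
    "Fast-Ethernet, C++": (9.8, 8, 78.4),
    "Fast-Ethernet, Java": (8.3, 8, 66.4),
    "Myrinet via PadicoTM, C++": (43.0, 8, 280.0),
}

efficiency = {name: scaling_efficiency(*m) for name, m in measurements.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.0%} of ideal aggregation")
```

On Fast-Ethernet the reported aggregate equals eight times the one-node figure for both implementations, whereas on Myrinet the aggregate of 280 MB/s is about 81% of the ideal 344 MB/s.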
In the future, we will focus on the relationship between parallel components and the CCM containers. It seems that parallel versions of existing containers should be defined in order to add parallelism support to container operations.
Another direction concerns the adaptability of parallel components to their environment. An example of adaptation is a modification in the number of components that belong to a parallel component. Adaptability appears as a new type of service brought by parallel containers.
Last, we aim to develop a fully operational prototype called GridCCM, which will extend CCM with parallel components. It will be based on PaCO++ and on tools such as a compiler for IDL3 transformation and new code generators.
Parallel component deployment for computational grids
The deployment of parallel component-based applications is a critical issue in the utilization of computational Grids. It consists in selecting a number of nodes and launching the application on them. We have started to work on an accurate description of the resources. Previous work succeeds in properly describing the compute nodes (CPU speed, memory size, operating system, etc.), but generally fails to describe the network topology and its characteristics in a simple, synthetic and complete way. We have proposed a description model for grid networks. This model provides a synthetic view of the network topology. In particular, it is able to describe non-hierarchical topologies. It is also simple, notably thanks to the possibility for a network group to inherit properties from its parent network groups. However, this simplicity does not hinder the description of complex network topologies (asymmetric links, firewalls, non-IP networks, non-hierarchical topologies). Finally, our description model aims to be complete by specifying the necessary information about the software available to access particular network technologies. The proposed model has been successfully integrated into the MDS2 of Globus. This mainly consisted in defining around 40 LDAP schema entries that describe the nature of networks, lists of open or closed ports, average latencies and bandwidths, network software information, etc.
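The property inheritance between network groups can be sketched as a lookup that falls back to parent groups. This is an assumption-laden illustration, not the LDAP schema integrated into MDS2: the `NetworkGroup` class, the property names and all the values are made up, and the "first parent wins" rule is only one possible way to resolve a property described by several parents.

```python
class NetworkGroup:
    """Hypothetical network group with properties inherited from its parents."""

    def __init__(self, name, parents=(), **properties):
        self.name = name
        self.parents = list(parents)   # several parents allow non-hierarchical topologies
        self.properties = dict(properties)

    def lookup(self, key):
        """Return the group's own value, else the first value found in a parent."""
        if key in self.properties:
            return self.properties[key]
        for parent in self.parents:
            try:
                return parent.lookup(key)
            except KeyError:
                continue
        raise KeyError(f"{key!r} is not described for {self.name}")


# Made-up values, for illustration only.
wan = NetworkGroup("site-wan", latency_ms=20.0, bandwidth_mb_s=1.0, closed_ports=(23,))
lan = NetworkGroup("cluster-lan", parents=[wan], latency_ms=0.1, bandwidth_mb_s=12.5)
san = NetworkGroup("cluster-san", parents=[lan], ip=False, latency_ms=0.01)
gateway = NetworkGroup("gateway", parents=[san, wan])  # belongs to two groups

print(san.lookup("closed_ports"))    # inherited from site-wan through cluster-lan
print(gateway.lookup("latency_ms"))  # taken from the first parent that defines it
```

Only the properties that differ from the parent need to be written for each group, which is what keeps such a description short.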
We foresee two complementary directions for future work. On the one hand, we have to specify the interactions between a Grid middleware such as OGSA and the CORBA component model. On the other hand, we have to integrate our parallel component model into planner tools such as Sekitei.
In the area of wireless computing where resources are a key issue, many techniques of dynamic adaptation have been developed: from the observation of the environment, codes can adapt their behavior to fit the resource constraints. An efficient way to allow an application to evolve according to its environment is to provide mechanisms that permit dynamic self-adaptation by changing the behavior depending on the currently available resources. Since Grid architectures are also known to be highly dynamic, using resources efficiently on such architectures is a challenging problem too. Software must be able to dynamically react to the changes of the underlying execution environment.
In order to help developers to create reactive software for the Grid, we are investigating a model for the adaptation of parallel components.
We have combined a dynamic adaptation framework with parallelism and distribution, allowing its use for Grid programming. Our prototype is built using the ACEEL adaptation engine, which was designed for wireless and mobile environments. Our tool takes into account the parallelism that can reside in applications. We have defined a parallel self-adaptable component as a component composed of several processes working together, which is able to change its behavior according to changes in the environment. The structure of such a component includes an adaptation policy, a set of available implementations, called behaviors, and a set of reaction steps. Reaction steps are the means by which the component adapts itself. A reaction step can be, for example, the replacement of the active behavior, the tuning of some parameters, or the redistribution of arrays. The platform we have built mainly provides two kinds of objects: the decider and the coordinators. The decider is the object that makes the decisions: it decides when (events to watch) and how (reactions to execute) the component should adapt itself according to the adaptation policy. The coordinators execute the directives given by the decider: they serve as intermediaries between the code of the component and the platform. Their role is to synchronize the adaptation mechanism with the functional code and to coordinate the execution of the reactions.
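The division of roles between the decider and the coordinators can be sketched as follows. This is a simplified, single-process illustration and not the ACEEL-based platform: the classes `Component`, `Decider` and `Coordinator`, the `adaptation_point` method, the event name and the behaviors are all hypothetical. One coordinator per process is assumed.

```python
class Component:
    """One process of a parallel component, with several available behaviors."""

    def __init__(self, behaviors, active):
        self.behaviors = behaviors      # behavior name -> implementation
        self.active = active

    def compute(self, data):
        return self.behaviors[self.active](data)


def replace_behavior(name):
    """Build a reaction step that swaps the active behavior."""
    def reaction(component):
        component.active = name
    return reaction


class Decider:
    """Decides when (watched event) and how (reactions) to adapt."""

    def __init__(self, policy):
        self.policy = policy            # list of (event, condition, reaction)

    def decide(self, event, value):
        return [reaction for watched, condition, reaction in self.policy
                if watched == event and condition(value)]


class Coordinator:
    """Applies the decider's directives at points chosen by the functional code."""

    def __init__(self, component):
        self.component = component
        self.pending = []

    def submit(self, reactions):
        self.pending.extend(reactions)

    def adaptation_point(self):
        # Called by the functional code where adapting is safe.
        for reaction in self.pending:
            reaction(self.component)
        self.pending.clear()


behaviors = {"full": lambda d: sum(d), "sampled": lambda d: 2 * sum(d[::2])}
processes = [Component(behaviors, "full") for _ in range(2)]
coordinators = [Coordinator(p) for p in processes]
decider = Decider([("free_memory_mb", lambda mb: mb < 100, replace_behavior("sampled"))])

for coordinator in coordinators:
    coordinator.submit(decider.decide("free_memory_mb", 64))
before = [p.active for p in processes]   # reactions are queued, not yet applied
for coordinator in coordinators:
    coordinator.adaptation_point()
after = [p.active for p in processes]
```

Deferring the reactions until `adaptation_point` is what lets the functional code control when adaptation happens; in a real parallel setting the coordinators would also have to agree on a common point across processes, which this sketch does not model.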
We plan to define more formally the properties that the component is required to satisfy to adapt itself. This includes the properties of global states where an adaptation can occur and the constraints on behavior replacement. Studying the relationship between fault tolerance systems that use checkpointing and adaptation in the context of Grid computing is an important perspective too. Finding shared properties between checkpoints and adaptation points would be of great help in establishing properties and constraints on adaptation point placement.
Our long term goal is to build a generic platform to develop parallel adaptable components for the Grid. This platform would include both the toolbox to build parallel self-adaptable components and their runtime environment. Such a platform should ease the building of efficient applications for Grid architectures.