Section: Scientific Foundations
Network services for highly demanding applications
Participants: Pascale Vicat-Blanc Primet, Olivier Glück, Laurent Lefèvre, Jean-Patrick Gelas, Paulo Gonçalves, Lucas Nussbaum, Patrick Loiseau, Olivier Mornard, Sébastien Soudan, Ludovic Hablot, Romaric Guillier, Manoj Dahal, Aurélien Cedeyn, Oana Goga, Armel Soro.
The initial purpose of computational grids was to aggregate a large collection of shared resources (computing, communication, storage, information) into an efficient, very high performance computing environment for data-intensive or computing-intensive applications. In general, however, the underlying communication infrastructure of these large-scale distributed environments is a complex interconnection of multiple IP domains whose performance characteristics are not controlled. Consequently, the grid network cloud exhibits extreme heterogeneity in performance and reliability, which considerably affects overall application performance.
The performance problem of the grid network cloud can be studied from different but complementary viewpoints:
Measuring and monitoring the end-to-end performance helps to characterize the links and the network behavior. Network cost functions and forecasts, based on such measurement information, allow the upper abstraction level to build optimization and adaptation algorithms.
Making optimal use of the services that the network infrastructure provides for specific grid flows is important.
Modeling, managing and controlling the grid network as a first-class resource of the global environment: transfer scheduling, data movement balancing, bandwidth reservation, dynamic provisioning...
Creating enhanced and programmable transport protocols adapted to heterogeneous data transfers within the grid may offer a scalable and flexible approach for performance control and optimization.
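As an illustration of the first viewpoint, the following minimal Python sketch shows how end-to-end throughput measurements could feed a forecast, and how that forecast could in turn drive a simple network cost function usable by an upper layer. It is not the method used by the team: the class name `LinkForecast`, the exponentially weighted moving average, the smoothing factor and the cost model (one round-trip time plus size divided by forecast rate) are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkForecast:
    """EWMA forecast of end-to-end throughput for one path (hypothetical helper)."""

    alpha: float = 0.25  # smoothing factor; an arbitrary illustrative choice
    estimate_mbps: Optional[float] = None

    def update(self, measured_mbps: float) -> float:
        """Fold one throughput measurement (Mbit/s) into the forecast."""
        if self.estimate_mbps is None:
            # First sample: nothing to smooth against yet.
            self.estimate_mbps = measured_mbps
        else:
            self.estimate_mbps = (
                self.alpha * measured_mbps
                + (1 - self.alpha) * self.estimate_mbps
            )
        return self.estimate_mbps

    def transfer_cost_s(self, size_mbytes: float, rtt_s: float) -> float:
        """Assumed cost model: one RTT of latency plus size / forecast rate."""
        if self.estimate_mbps is None:
            raise ValueError("no measurement available yet")
        return rtt_s + (size_mbytes * 8) / self.estimate_mbps


# Usage: two measurements, then the estimated time to move 75 MB.
forecast = LinkForecast(alpha=0.5)
for sample in (800.0, 400.0):
    forecast.update(sample)
cost = forecast.transfer_cost_s(size_mbytes=75.0, rtt_s=0.1)
print(f"forecast: {forecast.estimate_mbps:.0f} Mbit/s, cost: {cost:.2f} s")
```

A scheduler comparing several candidate paths could rank them by this cost. A realistic cost function would also have to account for loss, competing traffic and protocol behavior, which this sketch ignores.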
In a grid environment, two key points in the communication layers must be taken into consideration in order to execute high-performance applications efficiently: the heterogeneity of the high-speed interconnects composing the grid, and the Wide Area Network used for inter-site communications. We explore new mechanisms to improve the performance of an application when it executes on the grid. In particular, we study how an MPI application can benefit from several high-speed networks at the same time during a single execution, which requires finding a way to communicate efficiently between these heterogeneous interconnects. We also explore how to maintain good execution performance when long-distance communications are necessary because the application is launched on multiple sites of the grid.
An efficient MPI implementation for the grid is one of our research topics in this axis, with the aim of improving communications in grid environments. The MPI standard is often used by parallel applications for their communication needs. Most MPI implementations are designed for homogeneous clusters, but implementations for grids have to take into account both the heterogeneity of the high-speed interconnects composing the grid and the Wide Area Network used for inter-site communications in order to maintain a high level of performance. Existing MPI implementations do not consider these two constraints together, which raises the question of MPI efficiency in grids. Our goal is to significantly improve the execution performance of MPI applications on the grid.
Finally, the resource pooling and sharing paradigm proposed by the Grid remains a very promising and powerful concept that we apply to network resource sharing at many levels. To explore new approaches as well as difficult problems, we design and deploy special shared network resources at the edge of Grid5000 sites. The goal is to develop "proof of concept" experiments for exploring, among others, traffic awareness, the buffer sizing problem, buffer and filtering "in route" approaches, router virtualization, multipath routing, and router-assisted transport protocols and communication libraries (MPI5000).
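To give a sense of scale for the buffer sizing problem mentioned above, the short Python sketch below computes two well-known reference points from the literature: the classic rule of thumb of one bandwidth-delay product per bottleneck link, and the refinement that divides it by the square root of the number of flows when many desynchronized long-lived TCP flows share the link. Neither is presented here as the approach studied in these experiments; the function names and the example figures (10 Gbit/s, 100 ms, 10,000 flows) are illustrative assumptions.

```python
import math


def bdp_bytes(capacity_bps: float, rtt_s: float) -> float:
    """Rule-of-thumb buffer: one bandwidth-delay product, in bytes."""
    return capacity_bps * rtt_s / 8


def small_buffer_bytes(capacity_bps: float, rtt_s: float, n_flows: int) -> float:
    """BDP / sqrt(n): refinement for n desynchronized long-lived TCP flows."""
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return bdp_bytes(capacity_bps, rtt_s) / math.sqrt(n_flows)


# Hypothetical long-distance path: 10 Gbit/s capacity, 100 ms round-trip time.
full = bdp_bytes(10e9, 0.100)
shared = small_buffer_bytes(10e9, 0.100, n_flows=10_000)
print(f"one BDP: {full / 1e6:.1f} MB, with 10,000 flows: {shared / 1e6:.2f} MB")
```

The gap between the two values is one reason the question remains open in grid settings, where a link may carry a handful of very large flows rather than thousands of small ones, so the many-flows assumption may not hold.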