Team ScAlApplix

Section: New Results

Keywords: parallel algorithms, heterogeneous platforms, master-slave tasking, scheduling, divisible tasks, communication/computation overlap, collective communications.

Parallel algorithms for heterogeneous platforms

Participants: Olivier Beaumont, Miroslaw Korzeniowski.

As already mentioned in Section 3.4, makespan minimization turns out to be very difficult, even with homogeneous processors and links. Our objective is to build efficient scheduling algorithms for more realistic platform models. In our work, we usually adopt the so-called "one-port with overlap" model, where a processor can simultaneously send one message, receive one message and process one task, and where contention on communication links is taken into account. This requires a fine knowledge of the platform topology, but some tools (such as ENV and AlNEM) have recently been designed to build such platform models.

To circumvent the difficulty of makespan minimization, we lower the ambition of the scheduling objective. Instead of aiming at the absolute minimization of the execution time, why not consider asymptotic optimality? After all, the number of tasks to be executed on the computing platform is expected to be very large: otherwise, why deploy the corresponding application on computational grids? This approach was pioneered by Bertsimas and Gamarnik. The key simplification of steady-state scheduling is to concentrate on steady-state operations, and the scheduling problem is relaxed in several ways: initialization and clean-up phases are neglected; the initial integer formulation is replaced by a continuous (rational) formulation; and the precise ordering and allocation of tasks and messages are not required, at least in a first step. The main idea is to characterize the activity of each resource during each time unit: which (rational) fraction of time it spends computing, and which fraction it spends sending to or receiving from each of its neighbors. These activity variables are gathered into a linear program, which includes conservation laws that characterize the global behavior of the system.
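A minimal sketch of such a linear program, for master-slave tasking on an arbitrary platform graph G = (V, E) whose master P_0 initially holds all tasks, is given below. The notation is assumed for illustration and is not taken from this report: w_i is the time for P_i to process one task, c_{ij} the time to send one task over link (i, j), \alpha_i the fraction of each time unit during which P_i computes, and s_{ij} the fraction during which P_i sends to P_j. Because of the overlap assumption, computing, sending and receiving are constrained separately; the last constraint is the conservation law stating that every task received by a non-master processor is either processed locally or forwarded.

```latex
\begin{align*}
\text{maximize}\quad & \rho = \sum_{i} \frac{\alpha_i}{w_i} \\
\text{subject to}\quad
  & 0 \le \alpha_i \le 1 && \forall i \\
  & s_{ij} \ge 0 && \forall (i,j) \in E \\
  & \sum_{j:(i,j)\in E} s_{ij} \le 1 && \forall i \quad \text{(one outgoing message at a time)} \\
  & \sum_{j:(j,i)\in E} s_{ji} \le 1 && \forall i \quad \text{(one incoming message at a time)} \\
  & \sum_{j:(j,i)\in E} \frac{s_{ji}}{c_{ji}}
    = \frac{\alpha_i}{w_i} + \sum_{j:(i,j)\in E} \frac{s_{ij}}{c_{ij}}
    && \forall i \neq 0 \quad \text{(conservation law)}
\end{align*}
```

In this kind of formulation, the optimal value \rho bounds the number of tasks per time unit that any schedule can sustain in the long run; the rational values of \alpha_i and s_{ij} still have to be turned into an actual periodic schedule, which is where the precise ordering of tasks and messages is finally decided.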

This approach has been applied successfully to many scheduling problems. We first considered very simple application models, such as master-slave tasking, where a single processor initially holds all the data; the makespan minimization counterpart of this problem has also been studied. Generalizations in which some parallelism can be extracted within tasks have been considered, and the general case has been proven NP-hard. The case of divisible tasks (perfectly parallel tasks that can be divided arbitrarily) has been addressed in [44] when return messages must be taken into account. More recently, we studied the case where several applications must be scheduled simultaneously on the same platform [43].
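For the simplest case, master-slave tasking on a star platform (a master directly linked to each worker), the steady-state linear program can be solved greedily. Under the one-port model the master's outgoing port is the shared resource, so serving workers in order of increasing communication cost, each receiving as many tasks as it can process, until the port is saturated, is optimal (a fractional-knapsack argument, often called bandwidth-centric allocation). The sketch below is an illustration under stated assumptions, not the code used in this work: the function name star_steady_state and the (c, w) worker description are hypothetical, with c the time for the master to send one task to a worker and w the time for that worker to process it. Exact rational arithmetic (fractions.Fraction) matches the rational formulation.

```python
from fractions import Fraction


def star_steady_state(w0, workers):
    """Steady-state throughput of master-slave tasking on a star (sketch).

    w0      -- time for the master to process one task
    workers -- list of (c, w) pairs (hypothetical model): c = time for the
               master to send one task to the worker, w = time for the
               worker to process one task
    Returns (throughput, rates): rates[i] = tasks per time unit on processor i,
    where 0 is the master and workers are numbered from 1.
    """
    # With overlap, the master computes at full speed while it sends.
    rates = {0: Fraction(1) / Fraction(w0)}
    port = Fraction(1)  # fraction of the master's outgoing port still free

    # Bandwidth-centric order: cheapest links first (ties keep input order).
    order = sorted(range(1, len(workers) + 1), key=lambda i: workers[i - 1][0])
    for i in order:
        c, w = (Fraction(x) for x in workers[i - 1])
        best = 1 / w  # a worker cannot process more than 1/w tasks per unit
        if best * c <= port:
            rates[i] = best       # worker fully fed
            port -= best * c
        else:
            rates[i] = port / c   # worker fed with what is left of the port
            port = Fraction(0)
    return sum(rates.values()), rates


if __name__ == "__main__":
    total, rates = star_steady_state(3, [(1, 2), (2, 2), (1, 4)])
    print(total)  # 29/24
```

In the example, workers 1 and 3 (cost 1) are fully fed, and worker 2 (cost 2) only receives 1/8 task per time unit from the remaining quarter of the master's port.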

We have also applied steady-state techniques to collective communication schemes such as scatters, broadcasts, parallel prefix and multicasts. We have derived polynomial algorithms for broadcasts and scatters, under both the bidirectional and the unidirectional one-port models [3].

From the computational complexity point of view, considering steady-state throughput maximization instead of makespan minimization is both realistic and efficient for large-scale heterogeneous platforms. Nevertheless, as already noted, besides being heterogeneous, large-scale distributed platforms exhibit some degree of dynamicity. On grid-like platforms, we can assume that the topology does not change during the execution of an application, but the performance of communication and processing resources may be affected by external load. On peer-to-peer platforms, the topology itself may change during the execution.

These characteristics call for dramatic changes in the algorithms used to schedule both applications and communications on such platforms. In particular, it is not realistic to assume that the topology and the actual performance of all resources are gathered at a single central point. This requires the design of decentralized algorithms that achieve good throughput, where each node makes its decisions according to its own current state and the states of its immediate neighbors. We already considered this constraint in [42], and our aim is to generalize this framework to all the scheduling problems we have considered so far.
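One way to picture a decentralized computation in this spirit (an illustrative sketch, not necessarily the algorithm of [42]) is master-slave tasking on a tree-shaped platform under the one-port with overlap model. Each node only needs the cost of the link to each child and the equivalent processing rate that each child reports; from these it computes the equivalent rate of its own subtree, which it reports to its parent. Locally, the rule is bandwidth-centric: children are served in order of increasing link cost until the node's outgoing port is saturated. The Node class, its fields and the example platforms are hypothetical: w is the time for a node to process one task and c the time to receive one task over the link from its parent.

```python
from fractions import Fraction


class Node:
    """A processor in a tree-shaped platform (hypothetical model).

    w -- time for this node to process one task
    c -- time to receive one task over the link from its parent (> 0 for
         every non-root node; unused for the root)
    """

    def __init__(self, w, c=0, children=()):
        self.w = Fraction(w)
        self.c = Fraction(c)
        self.children = list(children)

    def report(self):
        """Equivalent tasks-per-time-unit rate of the subtree rooted here.

        Uses only local data (own w, cost of the links to the children)
        and the rates reported by the children themselves.
        """
        rate = 1 / self.w   # own computation overlaps with communication
        port = Fraction(1)  # fraction of the outgoing port still free
        # Each child "offers" its link cost and its reported subtree rate.
        offers = sorted(((ch.c, ch.report()) for ch in self.children),
                        key=lambda offer: offer[0])
        for c, child_rate in offers:
            if port == 0:
                break
            sent = min(child_rate, port / c)  # tasks per time unit to child
            rate += sent
            port -= sent * c
        return rate


if __name__ == "__main__":
    # A two-level tree: the first child forwards part of its tasks.
    root = Node(3, children=[
        Node(2, c=1, children=[Node(1, c=2)]),
        Node(4, c=1),
    ])
    print(root.report())  # 4/3
```

In the example, the first child reports a rate of 1 task per time unit (half processed locally, half forwarded), which is enough to saturate the root's outgoing port, so the second child receives nothing. Because each node only aggregates values received from its neighbors, the same computation could be carried out by message exchanges along the tree rather than at a central point.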

To achieve this goal, we have recently concentrated on solutions proposed by the P2P community. Indeed, the tremendous success of peer-to-peer (P2P) file-sharing applications has led to the design of a large number of dedicated protocols that run in a fully distributed environment. These protocols rely on local decisions, and the P2P services (publication, search, node insertion, etc.) are supported by a (virtual) overlay network connecting the peers over the Internet. To some extent, current P2P protocols are stable and fault-tolerant, as witnessed by their wide and intensive usage. Nevertheless, P2P protocols were initially designed for file-sharing applications, and were later studied in the context of general-purpose distributed applications. Yet, such protocols have not been optimized for scientific applications, nor are they adapted to sophisticated database applications. In particular, the types of requests they accept are too limited for general-purpose applications (such as applications made of independent tasks sharing files, which appear for instance in Monte Carlo simulations). In [65], we consider the extension of these protocols to range queries (instead of exact searches). Moreover, most protocols do not take resource performance (especially bandwidth) into account. Miroslaw Korzeniowski was recently hired as an INRIA postdoctoral researcher and works on broadcast protocols that take network performance into account. This evolution is also at the heart of the project proposal Cepage ( ) that should be presented to the INRIA Futurs Project Committee in 2007.