## Section: Scientific Foundations

### Scheduling Strategies and Algorithm Design for Heterogeneous Platforms

Participants : Guillaume Aupy, Anne Benoit, Marin Bougeret, Alexandru Dobrila, Fanny Dufossé, Amina Guermouche, Mathias Jacquelin, Loris Marchal, Jean-Marc Nicod, Laurent Philippe, Paul Renaud-Goud, Clément Rezvoy, Yves Robert, Mark Stillwell, Bora Uçar, Frédéric Vivien, Dounia Zaidouni.

Scheduling sets of computational tasks on distributed platforms is a key issue, and a difficult problem. Although a large number of scheduling techniques and heuristics have been presented in the literature, most of them target only homogeneous resources. However, future computing systems, such as the computational Grid, are most likely to be widely distributed and strongly heterogeneous. Therefore, we consider the impact of heterogeneity on the design and analysis of scheduling techniques: how can these techniques be enhanced to efficiently address heterogeneous distributed platforms?

The traditional objective of scheduling algorithms is the following: given a task graph and a set of computing resources, or processors, map the tasks onto the processors and order the execution of the tasks so that: (i) the task precedence constraints are satisfied; (ii) the resource constraints are satisfied; and (iii) a minimum schedule length is achieved. Task graph scheduling is usually studied using the so-called macro-dataflow model, which is widely used in the scheduling literature: see the survey papers [81], [90], [93], [94] and the references therein. This model was introduced for homogeneous processors, and has been (straightforwardly) extended to heterogeneous computing resources. In a word, there is a limited number of computing resources, or processors, to execute the tasks. Communication delays are taken into account as follows: let task $T$ be a predecessor of task $T'$ in the task graph. If both tasks are assigned to the same processor, no communication overhead is incurred, and the execution of $T'$ can start immediately at the end of the execution of $T$. On the contrary, if $T$ and $T'$ are assigned to two different processors $P_i$ and $P_j$, a communication delay is incurred. More precisely, if $P_i$ completes the execution of $T$ at time-step $t$, then $P_j$ cannot start the execution of $T'$ before time-step $t+\mathrm{comm}(T,T',P_i,P_j)$, where $\mathrm{comm}(T,T',P_i,P_j)$ is the communication delay, which depends upon both tasks $T$ and $T'$, and both processors $P_i$ and $P_j$. Because memory accesses are typically several orders of magnitude cheaper than inter-processor communications, it is sensible to neglect them when $T$ and $T'$ are assigned to the same processor.
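The communication rule above can be sketched in a few lines of code. The sketch below is purely illustrative (the task names, finish times, and the uniform `comm` function are invented for the example); it computes the earliest time a task may start on a given processor under the macro-dataflow model, paying the delay only for cross-processor precedence edges.

```python
# Illustrative sketch of the macro-dataflow communication rule: a successor
# placed on a different processor than its predecessor pays comm(T, T', Pi, Pj),
# while a same-processor dependency incurs no overhead.
# All task names, times, and the comm function are hypothetical.

def earliest_start(task, proc, preds, finish, placement, comm):
    """Earliest time `task` may start on `proc`, given its predecessors' finish times."""
    ready = 0.0
    for pred in preds.get(task, []):
        t = finish[pred]
        if placement[pred] != proc:          # cross-processor edge: pay the delay
            t += comm(pred, task, placement[pred], proc)
        ready = max(ready, t)                # every precedence constraint must hold
    return ready

# Toy instance: T precedes T'; T finishes at time 5 on P0.
preds = {"T'": ["T"]}
finish = {"T": 5.0}
placement = {"T": "P0"}
comm = lambda t, u, pi, pj: 2.0              # uniform delay, for illustration only

assert earliest_start("T'", "P0", preds, finish, placement, comm) == 5.0  # same processor
assert earliest_start("T'", "P1", preds, finish, placement, comm) == 7.0  # t + comm
```

List-scheduling heuristics for this model (e.g., HEFT-like strategies) evaluate exactly this quantity for each candidate processor before placing a task.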

The major flaw of the macro-dataflow model is that communication resources are not limited in this model. Firstly, a processor can send (or receive) any number of messages in parallel, hence an unlimited number of communication ports is assumed (this explains the name macro-dataflow for the model). Secondly, the number of messages that can simultaneously circulate between processors is not bounded, hence an unlimited number of communications can simultaneously occur on a given link. In other words, the communication network is assumed to be contention-free, which of course is not realistic as soon as the number of processors exceeds a few units.
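The cost of this contention-free assumption is easy to quantify on a toy case. In the hypothetical sketch below (message durations invented for the example), a processor sending several messages finishes at the time of its longest message under the unlimited-port assumption, but at the sum of all durations under a one-port model where communications are serialized.

```python
# Contrast between the contention-free macro-dataflow assumption and a
# one-port model in which each processor performs at most one communication
# at a time. The message durations are hypothetical.

def completion_contention_free(durations):
    # Unlimited ports: all outgoing messages proceed in parallel.
    return max(durations)

def completion_one_port(durations):
    # Single port: outgoing messages are serialized on the sender.
    return sum(durations)

msgs = [3.0, 1.0, 2.0]
assert completion_contention_free(msgs) == 3.0  # longest message
assert completion_one_port(msgs) == 6.0         # sum of all messages
```

The gap between the two values grows with the number of simultaneous messages, which is why the contention-free assumption becomes unrealistic beyond a handful of processors.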

The general scheduling problem is far more complex than the traditional objective of the macro-dataflow model. Indeed, the nature of the scheduling problem depends on the type of tasks to be scheduled, on the platform architecture, and on the aim of the scheduling policy:

- The tasks may be independent (e.g., jobs submitted by different users to the same system, or occurrences of the same program run on independent inputs), or they may be dependent (e.g., the different phases of the same computation, forming a task graph).
- The platform may or may not have a hierarchical architecture (clusters of clusters vs. a single cluster), and it may or may not be dedicated.
- Resources may be added to or disappear from the platform at any time, or the platform may have a stable composition.
- The processing units may or may not have the same characteristics (e.g., computational power, amount of memory, multi-port vs. single-port communication support).
- The communication links may or may not have the same characteristics (e.g., bandwidth, latency, routing policy).
- The aim of the scheduling policy may be to minimize the overall execution time (makespan minimization), to maximize the throughput of processed tasks, etc.
- Finally, the set of all tasks to be scheduled may be known from the beginning, or new tasks may arrive throughout the execution of the system (on-line scheduling).

In the Graal project, we investigate scheduling problems that are of practical interest in the context of large-scale distributed platforms. We assess the impact of the heterogeneity and volatility of the resources on scheduling strategies.