PDF e-Pub

## Section: Application Domains

### Resource Allocation for High Performance and Cloud Computing

In the context of numerical simulations on high performance machines, optimizing data locality and resource usage is very important for faster execution times and lower energy consumption. This optimization can be seen as a special case of scheduling problem on parallel resource, with several challenges. First, instances are typically large: a large matrix factorization (with $50×50$ blocks) involves about $30·{10}^{3}$ tasks. Then, HPC platforms consist of heterogeneous and unrelated resources, what is known to make scheduling problems hard to approximate. Finally, due to co-scheduling effects and shared communication resources, it is not realistic to accurately model the exact duration of tasks. All these observations make it impossible to rely on static optimal solutions, and HPC applications have gone from simple generic static allocations to runtime dynamic scheduling strategies that make their decisions based on the current state of the platform (the location of input data), the expected transfer and running times for the tasks, and some affinity and priority information that have possibly been computed offline. In this context, we are strongly involved in the design of scheduling strategies for the StarPU runtime, with two goals: proving that it is possible to design approximation algorithms whose complexity is extremely small (typically sub-linear in the number of ready tasks), and show that they can be used in practice with good performance results. We are pursuing collaborations both with teams developing the StarPU system (Storm) by designing algorithms for the generic scheduling problems  [37], and with teams developing linear algebra algorithms over the runtime (Hiepacs), by proposing specialized algorithms for specific cases. For example, in the case of linear algebra applications on heterogeneous platforms, we have considered the combinatorial optimization problem associated to matrix multiplication, that is amenable to partitioning the unit square into zones of prescribed areas while minimizing the overall size of the boundaries. We have improved the best known approximation ratio to 1.15 in  [36] and we have shown that the resulting distribution schemes can indeed be used to design efficient implementations using StarPU in  [42].