Team Alchemy

Section: New Results

Program optimizations

Practical Approach

Participants : Grigori Fursin, Albert Cohen, Cédric Bastoul, Louis-Noël Pouchet, Walid Benabderrahmane.

Here are the most recent key scientific achievements.

Collective Tuning Center

Participants : Grigori Fursin, Olivier Temam.

We created an open, community-driven, wiki-based collaborative portal that brings together academia, industry and end users to develop intelligent collective tuning technology that automates and simplifies the design and optimization of compilers, programs and architectures. This technology minimizes repetitive, time-consuming tasks and human intervention using collective optimization, run-time adaptation, and statistical and machine learning techniques. It can already help end users and researchers automatically improve execution time, code size, power consumption, reliability and other important characteristics of available computing systems (ranging from supercomputers to embedded systems), and should eventually enable the development of emerging intelligent, self-tuning adaptive computing systems. The Collective Optimization Database is intended to improve the quality of academic research by avoiding costly duplicate experiments and providing reproducible results.

Transitive Closure of Union of Affine Relations

Participants : Denis Barthou, Anna Beletska, Albert Cohen, Konrad Trifunovic.

We studied a method to compute the transitive closure of a union of affine relations on integer tuples. Within Presburger arithmetic, complete algorithms to compute the transitive closure exist for convex polyhedra only. In the presence of non-convex relations, little exists beyond special cases and incomplete heuristics. We introduce novel necessary and sufficient conditions defining a class of relations for which an exact computation is possible. These conditions can be relaxed to define larger classes where conservative approximations and/or more complex closed forms can be obtained. Our method is immediately applicable to a wide range of symbolic computation problems. It is illustrated on representative examples and compared with state-of-the-art approaches.
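To make the notion concrete, here is a minimal sketch that computes the transitive closure of a union of two translation relations by brute-force enumeration. It is not the symbolic method studied here: the relations `i -> i + 2` and `i -> i + 3`, the bound `N` and the function name `transitive_closure` are assumptions chosen for illustration, and restricting to a finite domain sidesteps exactly the difficulty that a symbolic computation has to address.

```python
def transitive_closure(pairs):
    """Return the transitive closure R+ of a finite relation given as (x, y) pairs."""
    closure = set(pairs)
    while True:
        # Index the current closure by source element.
        successors = {}
        for x, y in closure:
            successors.setdefault(x, set()).add(y)
        # Compose the closure with itself: (x, y) and (y, z) give (x, z).
        new_pairs = {(x, z) for x, y in closure for z in successors.get(y, ())}
        if new_pairs <= closure:
            return closure  # fixpoint reached
        closure |= new_pairs


N = 12  # hypothetical bound that keeps the domain finite
domain = range(N + 1)

# Two affine (translation) relations on integers, restricted to 0..N.
r1 = {(i, i + 2) for i in domain if i + 2 <= N}
r2 = {(i, i + 3) for i in domain if i + 3 <= N}

closure = transitive_closure(r1 | r2)
reachable_from_0 = sorted(y for x, y in closure if x == 0)
```

On this bounded example, the enumerated closure coincides with the closed form `{(x, y) : 0 <= x, y <= N, y - x >= 2}`, since every distance of at least 2 is a sum of steps of 2 and 3. A symbolic method aims to produce such a closed form directly, without enumeration and without a fixed bound.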

Optimizing code through iterative specialization

Participants : Minhaj Khan, Henri-Pierre Charles, Denis Barthou.

Code specialization can significantly improve the performance of an application. It works by exposing the values of selected parameters in the source code, which enables compilers to generate better optimized code. Although most efficient source code implementations contain specialized code to benefit from these optimizations, the real impact of specialization varies with the value of the specializing parameter.

We have studied in [116] an iterative approach for code specialization. Starting from some specialized code, we search for a better version by re-specializing the code and then applying a low-level code analysis. The specialized versions that fulfill the required criteria are then transformed to generate another equivalent version of the original specialized code. The approach, tested on the Itanium2 architecture using the gcc and icc compilers, shows significant performance improvement on different benchmarks.
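As an illustration of the general idea only, the following sketch specializes a generic routine for one known parameter value. The function names (`power_generic`, `specialize_power`, `power`) and the choice of the exponent as specializing parameter are hypothetical, and generating Python source at run time stands in for what a compiler does when a parameter value is exposed in the source code.

```python
def power_generic(x, n):
    """Generic version: the exponent n is only known at run time."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Generate a version of power_generic with the exponent fixed to n.

    The loop is fully unrolled in the generated source, the kind of
    simplification that becomes possible once the parameter value is known.
    Returns the specialized function together with its source text.
    """
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # build the specialized function from its source
    return namespace[f"power_{n}"], source


power_3, power_3_source = specialize_power(3)


def power(x, n):
    """Dispatch to the specialized version when its guard holds."""
    if n == 3:
        return power_3(x)
    return power_generic(x, n)
```

The guard in `power` keeps the generic version as a fallback, so the specialized code is used only for the parameter value it was generated for. Whether the specialized version pays off depends on that value, which is why the benefit has to be evaluated case by case.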

Simulation of the Lattice QCD and Technological Trends in Computation

Participants : Mouad Bahi, Denis Barthou, Cédric Bastoul, Walid Benabderrahmane, Christine Eisenbeis, Julien Jaeger, Louis-Noël Pouchet.

This is a joint ANR project “PetaQCD” with Lal (Orsay), Irisa Rennes (Caps/Alf), IRFU (CEA Saclay), LPT (Orsay), Caps Entreprise (Rennes), Kerlabs (Rennes), LPSC (Grenoble).

Simulation of Lattice QCD is a challenging computational problem. Current technological trends show multiple divergent models of computation: homogeneous multicore architectures and on-chip or off-chip accelerators now coexist with the traditional architectural models.

Given this technological abundance, assessing the performance tradeoffs of computing nodes based on these technologies is of crucial importance to many scientific computing applications.

In this study [114], we assess the efficiency and performance to be expected for the Lattice QCD problem on representative architectures, and we project the expected improvements of these architectures and their impact on Lattice QCD performance. We additionally try to pinpoint the factors that limit performance on these architectures. This work is part of the ANR PARA and ANR QCDNEXT projects (both 2005-2008) and has led to the ANR PetaQCD project (2009-2011) [33].

Loop Optimization using Adaptive Compilation and Kernel Decomposition

Participants : J. Jaeger, P. Oliveira, S. Louise, D. Barthou.

We study a new hierarchical compilation approach for the generation of high-performance applications, relying on the use of state-of-the-art compilers. This approach is not application dependent and does not require any assembly hand-coding. It relies on the decomposition of the loop nests of the hottest functions in the application into simpler kernels, typically 1D to 2D loops, which are much simpler to optimize. We successfully applied this approach to dense linear algebra in 2005, reaching the performance of vendor libraries. The advantage of the generated kernels is that their performance no longer depends on the input data, but only on its location in the memory hierarchy. Using a performance model for the memory hierarchy, it is possible to find the best composition of kernels to use.

For larger applications, the code is no longer regular and, in particular, data accesses are irregular (use of indirections). Working with the applications of the ANR PARA project (MPEG4, QCD, oil simulation and BLAST), we study how to adapt the previous approach to these cases. When control is irregular (involving different execution paths), we study the worst-case execution time (WCET), in particular in the context of embedded applications for MPSOC architectures. This is the subject of an on-going collaboration with CEA/Lastre.
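The decomposition idea can be sketched on a matrix product, whose three-deep loop nest is rewritten as compositions of 1D kernels. This is a minimal illustration under assumptions: the kernel names (`dot_kernel`, `axpy_kernel`) and the two compositions are hypothetical, plain Python lists stand in for arrays, and no performance model is included.

```python
def dot_kernel(row, col):
    """1D kernel: inner product of two equally sized vectors."""
    total = 0.0
    for a, b in zip(row, col):
        total += a * b
    return total


def axpy_kernel(alpha, x, y):
    """1D kernel: y <- y + alpha * x, updated in place."""
    for i in range(len(y)):
        y[i] += alpha * x[i]


def matmul_with_dot(A, B):
    """Composition 1: each C[i][j] is one call to the dot kernel."""
    n, m, p = len(A), len(B), len(B[0])
    columns = [[B[k][j] for k in range(m)] for j in range(p)]
    return [[dot_kernel(A[i], columns[j]) for j in range(p)] for i in range(n)]


def matmul_with_axpy(A, B):
    """Composition 2: each row of C accumulates scaled rows of B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            axpy_kernel(A[i][k], B[k], C[i])
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_dot = matmul_with_dot(A, B)
C_axpy = matmul_with_axpy(A, B)
```

Both compositions compute the same product but traverse memory differently: the dot-based version reads columns of `B`, while the axpy-based version streams over rows of `B` and `C`. Choosing between such compositions is where a model of the memory hierarchy would come in.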

Dataflow Analysis for Irregular Programs and its applications

Participants : M. Belaoucha, S. Touati, D. Barthou.

Instance-wise dataflow analysis identifies the exact execution instance of the statement that defines a value read at some other point during a program execution. This analysis generates more precise information than traditional dependence analyses and can therefore validate more optimizing transformations. An implementation of this analysis, as a standalone library, has been carried out by M. Belaoucha (funded by the Teraops and PARMA contracts), and its integration in gcc/Graphite is in progress.
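The kind of information this analysis produces can be illustrated by a small sketch that records, for each read, which statement instance wrote the value being read. This is a dynamic trace over one fixed problem size, not the static analysis described here; the loop, the statement label `S1` and the function name `trace_sources` are assumptions made for the example.

```python
def trace_sources(n):
    """Run `for i in 1..n: S1: a[i] = a[i-1] + 1` and record, for each read
    of a[i-1], the statement instance that produced the value being read.

    Returns a dict mapping each reading instance ("S1", i) to its source
    instance, or to None when the value comes from the initial array contents.
    """
    a = [0] * (n + 1)
    last_writer = {}  # array index -> (statement, iteration) of the last write
    sources = {}      # reading instance -> producing instance or None
    for i in range(1, n + 1):
        sources[("S1", i)] = last_writer.get(i - 1)  # read of a[i-1]
        a[i] = a[i - 1] + 1                          # statement S1
        last_writer[i] = ("S1", i)                   # write of a[i]
    return sources


sources = trace_sources(5)
```

For this loop, the value read by instance `("S1", i)` comes from instance `("S1", i - 1)` when `i >= 2`, and from the initial contents of `a` when `i == 1`. A static instance-wise analysis would derive this mapping symbolically for any `n`, without running the program.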

