Section: New Results

Binary parallelization

Our basic decompilation mechanism lets us extract loop nests from binary code. In favorable cases, the loops are fully described, and every memory access they contain has an associated linear combination of base register values and loop counters. Such a representation is enough to apply parallelization techniques based on the polytope model, which are currently the best-known way to derive an efficient parallel equivalent of the original loop nest. Our goal in this project is to exploit this similarity of representations to build a parallelizing binary-to-binary compiler. Since we are able to locate and describe loop nests inside binary programs, the basic workflow of our compiler is to first extract a suitable representation of the complete loop nest, then, in a second phase, apply parallelizing transformations to the model, and, in the third and last phase, regenerate binary code for the parallel version of the loop nest. The rest of the program remains unchanged.

The first phase, called the raising phase, relies on our decompilation techniques. Given a binary program, the loops are located and brought (or “raised”) into a data structure containing linear memory access functions and loop bounds. These functions may reference loop-invariant registers, whose definitions are also precisely located thanks to the SSA form. The first step is to remove unnecessary instructions: since all memory accesses are expressed as functions, the parts of the program that participated in actually computing these addresses become useless. We employ a basic slicing technique to select the instructions that must be preserved. Starting from instructions that write either to memory or to a register that is live on exit from the loop, our algorithm follows all possible paths through the definitions required to compute the written values. Redundant instructions (i.e., those not visited during the slicing) are removed; the remaining instructions are all necessary for the transformed program to have the same effect as the original program. These instructions, together with the loop nest structure, are translated into C-like code. Various other transformations are then applied to “clean up” the extracted program and make it usable by the next phase.

The second phase, called the parallelization phase, takes as input our C-like loop nest, whose affine accesses all target a single array representing memory. It performs a complete parallelization analysis, including dependence analysis and scheduling. We have delegated this part to an external, state-of-the-art tool not developed by our team [25], which provides the basic infrastructure. From our point of view, however, its use is a temporary solution: it will be complemented by various other tools in the future. The result of this phase is a new program augmented with compiler directives (typically, OpenMP directives).

The third and last phase, called the lowering phase, consists in producing executable code again. As we have seen, most of the work happens during the first phase (because parallelization is performed by a “real” compiler); conceptually, however, it remains relevant to treat lowering as a distinct phase. It ensures that every group of instructions with an observable effect is actually present in the result. This involves translating the low-level instructions extracted from the binary into C code that is guaranteed to be propagated almost directly into the final program, as well as creating a new executable that includes the modified code and ensuring its correct integration with the code that is copied verbatim. This phase is mostly technical, even though it has led us to develop several techniques that may be reused in other contexts.

This project is especially interesting because it provides a new view on parallelization (very few attempts have been made at general-purpose, binary-to-binary compilers). Its main contribution is to make parallelization a service of the execution environment, instead of a feature of the compiler. Any user of the operating system can benefit from our parallelizer, whatever compiler they used to produce the original sequential programs, and whatever library they used, even for programs that use components written in different languages and/or compiled with different compilers. Deferring the parallelization to the execution environment opens up several new research directions, which we plan to explore as soon as possible. It also allows the parallelization to take into account the target architecture.

