## Section: New Results

### Register Allocation and SSA Form Properties

Participants: Florent Bouchez, Alain Darte, Christophe Guillon [STMicroelectronics], Fabrice Rastello.

The work presented in this section is joint work with members of the CEC team at STMicroelectronics. It is the natural extension of the work described in Section 6.2. It deals with the following problems related to register allocation: spilling, coalescing, and splitting. Register allocation consists in assigning abstract variables to physical registers. Spilling chooses which variables to store in memory when there are too few registers to hold all of them. Coalescing merges two variables into one when possible, hence reducing the number of variables, and splitting does the converse. Understanding the interactions between these three techniques is the heart of our work.

**Register allocation** Register allocation is one of the most studied problems in compilation.
It has been considered NP-complete since Chaitin et al., in
1981, modeled the problem of assigning temporary variables to k
machine registers as the problem of coloring, with k colors, the
interference graph associated with the variables. The fact that the
interference graph can be arbitrary proves the NP-completeness of this
formulation. However, this original proof does not really show where
the complexity of register allocation comes from. Recently, the
re-discovery that interference graphs of SSA programs can be colored
in polynomial time raised the question: Can we use SSA to
do register allocation in polynomial time, without contradicting
Chaitin et al.'s NP-completeness result?
To address such a question and, more
generally, the complexity of register allocation, we have revisited
Chaitin et al.'s proof to identify the interactions between spilling,
coalescing/splitting, critical edges, and coloring.
Our study shows that the NP-completeness of register allocation is
*not* due to the coloring phase, as a misinterpretation of the
reduction of Chaitin et al. from graph k-coloring might suggest.
If live-range splitting is taken into account, deciding
whether k registers are enough or whether some spilling is necessary is not as
hard as one might think. The NP-completeness of register allocation is
due to three factors: the presence of critical edges, the
optimization of spilling costs (if k registers are not enough), and
the optimization of coalescing costs, i.e., deciding which live ranges should be fused while
keeping the graph k-colorable. These results have been presented at
WDDD [7] and LCPC [8].
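
To make the polynomial-time coloring claim concrete, here is a minimal sketch in Python. It relies on the known fact that interference graphs of SSA programs are chordal, and that a chordal graph is colored optimally by a greedy pass along a maximum cardinality search order. The function names, the graph encoding (a dictionary of neighbor sets), and the example graph are assumptions made for illustration; this is not the implementation used in our compiler.

```python
def mcs_order(graph):
    """Maximum cardinality search on a graph given as {vertex: set_of_neighbors}.

    Returns the vertices in visit order. For a chordal graph, the reverse of
    this order is a perfect elimination order.
    """
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        # Pick the vertex with the most visited neighbors (ties broken by name).
        v = max(unvisited, key=lambda u: (weight[u], str(u)))
        order.append(v)
        unvisited.remove(v)
        for w in graph[v]:
            if w in unvisited:
                weight[w] += 1
    return order


def greedy_color(graph, order):
    """Give each vertex, in the given order, the smallest color unused by its neighbors."""
    color = {}
    for v in order:
        used = {color[w] for w in graph[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical chordal interference graph: two triangles sharing edge b-c, plus e.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
registers = greedy_color(interference, mcs_order(interference))
print(max(registers.values()) + 1, "registers used")
```

On a chordal graph, the already-colored neighbors of each vertex form a clique, so the number of colors never exceeds the size of the largest clique. On an arbitrary graph the same code still runs but gives no optimality guarantee.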

**Critical edges** A critical edge is an edge of the control flow
graph whose predecessor block has more than one successor and whose
successor block has more than one predecessor. Forbidding live-range
splitting on critical edges makes the coloring problem NP-complete.
Similarly, the presence of critical edges makes the translation out of
SSA form difficult (see Section 6.2), and most existing
approaches are not able to handle code with critical edges. However,
some real codes (supported by some compilers) do contain critical
edges that the compiler is not able to split (abnormal edges). Also,
splitting an edge has a cost, and it might be preferable not to split
an edge of high execution frequency. We have introduced a new
technique called parallel-copy motion. Basically, the copies related
to a split can be moved backward or forward along the edge. A parallel copy
moved backward into the predecessor block must be compensated by a
reverse parallel copy on the other outgoing edges of this block. The
process has to be iterated and may, of course, loop and lead to an
inconsistency. Determining whether this inconsistency is avoidable
is precisely where the hardness of coloring comes from. Our
experiments have shown that, in practice, for the benchmark suites
available from our collaboration with STMicroelectronics, this problem can be
solved in polynomial time and critical edges can be efficiently
handled this way. We are currently writing a research report
presenting these results.
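
The definition of a critical edge given above can be checked mechanically. The following Python sketch lists the critical edges of a control flow graph encoded as a dictionary mapping each block to its list of successors; the encoding, the function name, and the block names are assumptions made for this example only.

```python
def critical_edges(successors):
    """Return the critical edges of a CFG given as {block: [successor blocks]}.

    An edge (b, s) is critical when b has more than one successor
    and s has more than one predecessor.
    """
    predecessors = {}
    for block, succs in successors.items():
        predecessors.setdefault(block, [])
        for s in succs:
            predecessors.setdefault(s, []).append(block)
    return [
        (block, s)
        for block, succs in successors.items()
        for s in succs
        if len(succs) > 1 and len(predecessors[s]) > 1
    ]


# Hypothetical "if without else": entry branches to 'then' or directly to 'join'.
cfg = {"entry": ["then", "join"], "then": ["join"], "join": []}
print(critical_edges(cfg))
```

In this example only the edge from `entry` to `join` is critical: a copy placed on it can be moved neither to the end of `entry` (it would also execute on the path through `then`) nor to the start of `join` (it would also execute when coming from `then`).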

**Spilling** Minimizing the number of spilled variables is a
highly studied problem in compiler design; it is an NP-complete
problem in general. An important consequence of our study
(see [8] and the previous paragraph on register
allocation) is that the goal of the spilling phase is simply to
lower the register pressure (the number of variables alive at a given
program point) at every program point, and the optimization problem
corresponds to minimizing the spilling cost. The question raised by
this important remark was: is it easier to solve the spilling problem
under SSA? This would lead to register allocation heuristics that
transform a program to SSA, spill variables, allocate the remaining ones
with exactly the available number of registers, and then try to translate out of SSA.
Spilling can be considered at different granularity levels. The
coarsest, called "spill everywhere", considers the live
range of each variable as a whole: a spilled variable results in a
store after each definition point and a load before each use point.
The finer granularity, called "load-store optimization",
optimizes each load and store separately. This more precise formulation,
also known as "paging with write back", is
NP-complete [36] even for a basic block in SSA
form. However, the coarser view (spill everywhere) is much simpler and
can be solved in polynomial time [27] for an SSA basic
block. What about more general graph structures? We have studied in
detail the spill everywhere formulation under SSA form. We have
considered different instances of the problem depending on the machine
model and on the relative values of register pressure
and number of registers. We found that most instances are NP-complete.
The complexity almost always comes from the number of registers and
the number of arguments per instruction (for RISC machines), not
from the register pressure. We are currently writing a report on
these results.
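
As an illustration of the two quantities discussed above, the following Python sketch computes the register pressure at every program point of a single basic block, and the cost of spilling one variable everywhere (one store per definition, one load per use). The instruction encoding as `(defs, uses)` pairs, the unit costs, and the function names are assumptions made for this example; the sketch does not solve the optimization problem itself.

```python
def register_pressure(block, live_out):
    """Number of live variables at each program point of a basic block.

    `block` is a list of (defs, uses) pairs, one per instruction, and
    `live_out` is the set of variables live at the end of the block.
    Entry i of the result is the pressure just before instruction i;
    the last entry is the pressure at the end of the block.
    """
    live = set(live_out)
    pressures = [len(live)]
    for defs, uses in reversed(block):
        live -= set(defs)   # a definition ends the live range (walking backward)
        live |= set(uses)   # a use makes the variable live before the instruction
        pressures.append(len(live))
    pressures.reverse()
    return pressures


def spill_everywhere_cost(block, var, store_cost=1, load_cost=1):
    """Cost of spilling `var` everywhere: a store per definition, a load per use."""
    stores = sum(1 for defs, _ in block if var in defs)
    loads = sum(1 for _, uses in block if var in uses)
    return stores * store_cost + loads * load_cost


# Hypothetical block: a = ...; b = ...; c = a + b; d = c + a; return d
block = [(["a"], []), (["b"], []), (["c"], ["a", "b"]), (["d"], ["c", "a"]), ([], ["d"])]
k = 1  # assumed number of available registers
pressures = register_pressure(block, live_out=set())
print("spilling needed:", max(pressures) > k)
print("cost of spilling a everywhere:", spill_everywhere_cost(block, "a"))
```

If the maximum pressure exceeds the number of registers, some variables must be spilled until the pressure is low enough at every point; choosing them so as to minimize the total cost is the optimization problem studied here.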

**Coalescing** Finally, the last question concerns the
coalescing problem: how can we minimize, or at least reduce, the number
of generated copies? We have distinguished several optimizations that
occur in most coalescing heuristics: a) aggressive coalescing removes
as many moves as possible, regardless of the colorability of the
resulting interference graph; b) conservative coalescing removes as
many moves as possible while keeping the colorability of the graph; c)
incremental conservative coalescing removes one particular move while
keeping the colorability of the graph; d) optimistic coalescing
coalesces moves aggressively, then gives up as few moves as
possible so that the graph becomes colorable. We have almost
completely classified the complexity of these problems,
also taking into account the structure of the interference graph: arbitrary,
chordal, or k-colorable in a greedy fashion. These results will be
presented at CGO'07 [9]. As for experiments, we already
have promising results with heuristics that outperform previously
proposed techniques.
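
To illustrate incremental conservative coalescing on greedy-k-colorable graphs, here is a minimal Python sketch. It tests greedy-k-colorability with the classical simplification scheme (repeatedly remove a vertex of degree less than k), then checks whether merging two move-related variables preserves that property. This brute-force check, the function names, and the example graph are assumptions chosen for illustration; they are not the heuristics evaluated in our experiments.

```python
def is_greedy_k_colorable(graph, k):
    """True if repeatedly removing a vertex of degree < k empties the graph."""
    g = {v: set(ns) for v, ns in graph.items()}
    worklist = [v for v in g if len(g[v]) < k]
    while worklist:
        v = worklist.pop()
        if v not in g:          # already removed (the worklist may hold duplicates)
            continue
        for w in g.pop(v):
            g[w].discard(v)
            if len(g[w]) < k:
                worklist.append(w)
    return not g


def merge(graph, a, b):
    """Return a copy of the interference graph where b is merged into a."""
    if b in graph[a]:
        raise ValueError("interfering variables cannot be coalesced")
    merged = {v: set(ns) for v, ns in graph.items() if v != b}
    merged[a] = (graph[a] | graph[b]) - {a, b}
    for w in graph[b]:
        merged[w].discard(b)
        merged[w].add(a)
    return merged


def can_coalesce_conservatively(graph, a, b, k):
    """Incremental conservative test: is the graph still greedy-k-colorable after merging?"""
    return is_greedy_k_colorable(merge(graph, a, b), k)


# Hypothetical interference graph: the path a - x - y - b, with a move between a and b.
graph = {"a": {"x"}, "x": {"a", "y"}, "y": {"x", "b"}, "b": {"y"}}
print(can_coalesce_conservatively(graph, "a", "b", k=2))
print(can_coalesce_conservatively(graph, "a", "b", k=3))
```

With two registers, coalescing `a` and `b` turns the path into a triangle, so the move must be kept; with three registers the merged graph remains greedy-3-colorable and the move can be removed.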

These developments are part of two contracts (see Section 7.1 and Section 7.2) with the CEC team at STMicroelectronics and form the theoretical foundations of our implementations in the STMicroelectronics compiler.