Section: New Results
Formal verification of compilers
The CompCert verified compiler for the C language
Participants: Xavier Leroy, Sandrine Blazy [EPI Celtique], Alexandre Pilkiewicz.
In the context of our work on compiler verification (see section 3.3.1), since 2005 we have been developing and formally verifying a moderately-optimizing compiler for a large subset of the C programming language, generating assembly code for the PowerPC, ARM, and x86 architectures [4]. This compiler comprises a back-end part, translating the Cminor intermediate language to assembly code and reusable for source languages other than C [3], and a front-end translating the CompCert C subset of C to Cminor. The compiler is mostly written within the specification language of the Coq proof assistant, from which Coq's extraction facility generates executable Caml code. The compiler comes with a 50000-line, machine-checked Coq proof of semantic preservation establishing that the generated assembly code executes exactly as prescribed by the semantics of the source C program.
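The overall shape of this semantic preservation theorem can be sketched as a behaviour refinement statement (an informal rendering; the names compile, OK and Beh are illustrative, and the actual Coq statement is more detailed, in particular regarding source programs with undefined behaviour):

```latex
\forall S,\ \forall C,\quad
  \mathrm{compile}(S) = \mathrm{OK}(C)
  \;\Longrightarrow\;
  \mathrm{Beh}(C) \subseteq \mathrm{Beh}(S)
```

where $\mathrm{Beh}(P)$ denotes the set of observable behaviours of program $P$: traces of input/output events, together with termination or divergence.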
This year, we improved the CompCert C compiler in several ways:
-
The input language to the proved part of the compiler was extended to support side-effects within expressions. The formal semantics for this language is non-deterministic, as it accounts for the partially unspecified evaluation order of C expressions. The transformations that pull side effects out of expressions and materialize implicit casts, formerly performed by untrusted Caml code, are now fully proved in Coq.
-
A port targeting Intel/AMD x86 processors was added to the two existing ports for PowerPC and ARM. The new port generates 32-bit x86 code, using SSE2 extensions for floating-point arithmetic. CompCert's compilation strategy is not a very good match for the x86 architecture; consequently, the performance of the generated code is not as good as for the PowerPC port, but remains usable (about 75% of the performance of gcc -O1 on x86, compared with more than 90% on PowerPC).
-
The operational semantics for accesses to volatile-qualified variables was revised to capture their intended behavior more precisely. In particular, reads from and writes to a volatile global variable are treated like input and output system calls, respectively, bypassing the memory model entirely.
-
The performance of the generated code was improved, in particular via a better treatment of spilled temporaries during register allocation.
-
Compilation times were reduced thanks to several algorithmic improvements in the optimization passes based on dataflow analysis.
Three versions of the CompCert development were publicly released: version 1.7 in March, version 1.7.1 in April, and version 1.8 in September.
Several of these improvements were prompted by the results of an experimental study of CompCert's usability conducted at Airbus by Ricardo Bedin França under the supervision of Denis Favre-Felix, Marc Pantel and Jean Souyris. Preliminary results are reported in an article to be presented at the 2011 workshop on Predictability and Performance in Embedded Systems [15].
Verified compilation of C++
Participants: Tahina Ramananandro, Gabriel Dos Reis [Texas A&M University], Xavier Leroy.
Object layout and management, including dynamic allocation, field resolution, method dispatch and type casts, is a critical part of the compilation and runtime systems of object-oriented languages such as Java or C++. Formally verifying this part requires relating an abstract formalization of object operations, at the level of the source language semantics, with a concrete representation of objects in the memory model provided by the target low-level language. Because this work relies heavily on pointer arithmetic, the proofs call for specific methods.
This year, under Xavier Leroy's supervision and with precious C++ advice from Gabriel Dos Reis, Tahina Ramananandro tackled the issue of formally verifying object layout and management in multiple-inheritance languages, especially the C++ flavour featuring both non-virtual and virtual inheritance (allowing repeated and shared base class subobjects), as well as structure array fields. This is a step towards building a formally verified compiler from an object-oriented subset of C++ to RTL (a CFG-style intermediate language of the CompCert back-end). The formalization consists in proving, in Coq, the correctness of a family of object layout algorithms (including a popular algorithm inspired by the Common Vendor ABI for Itanium, which has since been reused and adapted by GNU GCC) with respect to the formal operational semantics of an object-oriented subset of C++ featuring static and dynamic casts as well as field accesses. These results were accepted for publication at the forthcoming POPL 2011 symposium [24].
Tahina Ramananandro has since been reusing and extending this work to formalize object construction and destruction, especially their impact on the behaviour of dynamic operations such as virtual function dispatch, and the consequences of such changes for a realistic compiler implementation based on virtual tables (again inspired by the aforementioned Common Vendor ABI for Itanium).
Optimizations in the CompCert C compiler
Participant: Alexandre Pilkiewicz.
Alexandre Pilkiewicz experimented with the implementation of optimizations in the CompCert C compiler. The first experiment was a fully proved implementation of a mixed data-flow analyzer/code transformer in the style of Lerner, Grove and Chambers [43] and of Hoopl [51]. Although this enabled a combined implementation of constant propagation and common sub-expression elimination, it was too limited to be of real interest, since it only allows replacing one instruction by exactly one other instruction and cannot work at the level of basic blocks.
Another experiment was an implementation of the global value numbering algorithm of Gulwani and Necula [40]; its correctness proof is not finished yet.
Finally, Alexandre Pilkiewicz implemented a function inlining pass over the Cminor intermediate language of CompCert. Its correctness proof is a work in progress.
Validation of polyhedral optimizations
Participants: Alexandre Pilkiewicz, François Pottier.
Numerical codes make heavy use of nested loops. These loop nests can be optimized for data locality (reducing the number of cache misses) or for automatic parallelization. One way to do so is to represent the loop nest as a multidimensional polyhedron, the program being just one particular scheduling of a walk over the polyhedron. The optimization process can then be summarized as finding a better schedule that respects the initial constraints (for example, one cannot interchange the assignments t[0] = 1; and t[0] = 42;).
Finding such a valid schedule relies on subtle heuristics and on optimized C libraries for polyhedron manipulation, which are therefore prone to error. Alexandre Pilkiewicz, under François Pottier's supervision, is developing and proving correct in Coq a validator for such optimizations. The idea is to check a posteriori, at each run of the optimizer, that the produced program is equivalent to the original one. This work in progress is carried out in collaboration with Nicolas Magaud, Julien Narboux and Eric Violard of the INRIA Camus team.