
Section: New Results

Dynamic analysis of the memory behavior of executable programs

Program skeletonization consists in transforming a given program into another program whose only task is to produce some data that one wants to observe about the original program. We apply this technique to memory tracing: to obtain the complete list of memory addresses that a given program accesses, our algorithm builds a new program that outputs exactly that list. This new program, called the skeleton, is not equivalent to the original program: it is restricted to what the user is interested in (in our case, memory addresses). The rest of the program, e.g., the computation of results, is simply ignored and does not appear in the skeleton. The main motivation for skeletonization is practical: instrumenting each and every memory access in a given program raises two main difficulties. First, the instrumented program grows in size, which makes it slower. Second, since memory accesses are extremely frequent (roughly one in every three executed instructions on average), the instrumentation usually causes massive slowdowns.

Because we want to reproduce the list of memory addresses for a given execution, the skeleton's execution must somehow depend on the input data. Program skeletonization is designed to clearly separate these two aspects: the skeleton is derived from the original program only, but it needs an input trace to reproduce a given execution. This input trace is not necessarily composed exclusively of relevant input data. It may also contain intermediate computations that have been found too complex for the skeleton to reproduce, and these intermediate computations may or may not use input data. The only guarantees provided are: 1) the skeleton is completely independent of input data, and 2) given its input trace, the skeleton will faithfully reproduce the stream of memory access addresses.

Building the skeleton is a direct application of our decompilation process (described earlier). Starting from a binary program, the decompiler provides 1) a linear combination of loop counters and base registers for each memory access, 2) a simple comparison of a linear form for every branch condition that can be parsed, and 3) a set of base register definitions that are used in the various expressions. From these, our algorithm generates one basic block per basic block of the original program, in which it places input statements where a base register is defined, output statements where an address is computed, and a branch if the original basic block ends with a conditional branch (the condition governing this branch is either computed or input). Loop counters are also defined and incremented in the skeleton, but this does not require any input data. Our system actually generates a C program, giving the C compiler an additional opportunity to optimize the skeleton code.
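To make the shape of a skeleton concrete, here is a minimal hand-written sketch in C. It is not output of our system: the original loop, the 8-byte element size, and the names `next_input` and `run_skeleton` are all assumptions chosen for illustration. The input trace and the address stream are passed as arrays so that the example stays self-contained, where a real skeleton would read from a pipe or file and write its output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical original loop (an assumption, not taken from our system):
 *     for (i = 0; i < n; i++) sum += a[i];     // a[] holds 8-byte elements
 * Assumed decompiler output: address = base + 8*i, branch condition i < n,
 * where base and n are base registers defined before the loop. */

/* Input statement: read the next value from the input trace. */
static uint64_t next_input(const uint64_t *trace, size_t len, size_t *pos)
{
    assert(*pos < len);   /* the trace must hold a value for every input */
    return trace[(*pos)++];
}

/* Runs the skeleton and returns the number of addresses written to out. */
size_t run_skeleton(const uint64_t *trace, size_t len,
                    uint64_t *out, size_t cap)
{
    size_t pos = 0, n_out = 0;

    /* Block B0: base registers are defined here, so the skeleton inputs them. */
    uint64_t base = next_input(trace, len, &pos);
    uint64_t n    = next_input(trace, len, &pos);

    /* Block B1: loop body. The counter i is maintained by the skeleton itself
     * and needs no input; the branch condition i < n is computed, not input. */
    for (uint64_t i = 0; i < n; i++) {
        assert(n_out < cap);
        out[n_out++] = base + 8 * i;   /* output statement: one address */
    }
    return n_out;
}
```

Note that the summation itself has disappeared: only the base register inputs, the loop counter, and the address computation remain. Under these assumptions, a trace of two values is enough to regenerate all n addresses.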

Producing the skeleton's input trace is the responsibility of the original program, which must be instrumented to output the values of the base registers. This is a regular instrumentation that captures register values as well as the outcomes of branches whose condition the skeleton cannot compute. Every captured value is written out, either to a pipe at the other end of which the skeleton is running, or to a file if the trace has to be saved for later reuse (possibly multiple times).
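The producer side can be sketched in the same spirit. Our system instruments binaries, so the source-level C code below is only an analogy: the loop, the raw 64-bit trace format, and the names `emit_value` and `sum_array_instrumented` are assumptions made for illustration. The point it tries to show is that values are emitted only where base registers are defined, while the memory accesses inside the loop stay uninstrumented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical instrumentation helper: append one 64-bit value to the trace.
 * Returns 0 on success, -1 on a write error. */
static int emit_value(FILE *trace, uint64_t v)
{
    return fwrite(&v, sizeof v, 1, trace) == 1 ? 0 : -1;
}

/* Instrumented version of a hypothetical loop computing  sum += a[i].
 * The trace stream may be a pipe feeding the skeleton or a regular file. */
double sum_array_instrumented(const double *a, uint64_t n, FILE *trace)
{
    assert(trace != NULL);

    /* Base register definitions: the only instrumentation points. */
    (void)emit_value(trace, (uint64_t)(uintptr_t)a);
    (void)emit_value(trace, n);

    double sum = 0.0;
    for (uint64_t i = 0; i < n; i++)
        sum += a[i];              /* memory access: not instrumented */
    return sum;
}
```

Passing `stdout` as the trace stream and connecting it to the skeleton's standard input with a shell pipe gives the online mode; passing a stream opened with `fopen` gives the saved-trace mode.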

The overall strategy is interesting in several respects. First, the skeleton is independent of the input data, which means it is reusable across executions. Second, the input trace produced by instrumenting the original program is usually much smaller than the full memory trace, which makes the original program run with less overhead. This is true statically (there are fewer instrumentation points than for memory instrumentation) and dynamically (since many loops are completely based on loop-invariant registers, a large part of the execution does not run instrumentation code at all). Finally, running the skeleton is completely independent of the original execution context, which is no longer needed since every relevant aspect of it has been captured in the trace.

A paper describing our approach and system has been accepted for presentation at the 2011 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2011) [14], to be held in April.

