Section: New Results
Compilation and Synthesis for Reconfigurable Platform
Participants : Steven Derrien, Emmanuel Casseau, Daniel Ménard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.
DURASE: Generic Environment for Design and Utilization of Reconfigurable Application-Specific Processors Extensions
Participants : Christophe Wolinski, François Charot, Erwan Raffin, Kevin Martin, Antoine Floch.
This year we focused on the architecture model of an ASIP processor with extended instruction sets. Extended instructions implement identified and selected computational patterns and can be executed sequentially or in parallel with the ASIP core processor instructions. This provides ways to trade execution time against hardware cost.
Our generic simplified architecture is depicted in Figure 4. It is composed of one functionally reconfigurable cell implementing a set of computational patterns (selected by the DURASE system) directly connected to the processor data-path. The selected patterns are merged by our merging procedure before synthesis. The cell also contains registers for the case where the generated patterns have more than two inputs and one output (the case of the NIOS II). The number of registers and the structure of interconnections are application dependent.
The DURASE system enables automatic synthesis of application-specific processor extensions that speed up application execution. The system also carries out the corresponding source code transformations to match the newly synthesized extensions. Finally, the synthesized extensions are tightly connected to a target processor and used through newly created instructions (see Figure 4 for an example with the NIOS II processor and its extension). The design flow adopted in the DURASE system is presented in Figure 5. The inputs to the DURASE system are an application code written in C, a target processor instruction set and an architecture model. The outputs are a processor extension and the application-specific instructions for accessing this extension. The processor extension is built using a merged pattern implementing all the selected computational patterns. Our system also generates the transformed application source code, including the application-specific instructions.
Our design process involves the identification of computational patterns and the selection of specific patterns that speed up application execution. Pattern identification and selection are executed in two consecutive steps. In the first step, we explore typical computational patterns and identify the most useful ones for a given application. Our method identifies, directly from an application graph, all computational patterns that satisfy the architectural and technological constraints imposed by target processors and FPGA devices. The considered constraints include the number of inputs and outputs, the number of operators, and the delay of the pattern critical path. The identified patterns can therefore be well tailored to target processors. In the second step, the identified computational patterns are used in mapping and scheduling, where a subset of patterns is selected for implementation.
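The identification step can be illustrated with a minimal sketch. It is not the DURASE algorithm, which relies on graph matching and constraint programming: it enumerates connected subgraphs of a toy data-flow graph by brute force and keeps those meeting the four constraints named above. The graph, the operator delays, and all function names are hypothetical, and further conditions a real tool would check (for example convexity of the pattern) are omitted.

```python
from itertools import combinations

# Hypothetical data-flow graph: node -> (operator, delay, predecessors).
# Names that are not keys of the dict (a, b, c, d, e) are external inputs.
DFG = {
    "m1": ("mul", 2, ["a", "b"]),
    "m2": ("mul", 2, ["c", "d"]),
    "s1": ("add", 1, ["m1", "m2"]),
    "s2": ("add", 1, ["s1", "e"]),
}

def pattern_io(nodes, dfg):
    """Return the input and output sets of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    inputs = {p for n in nodes for p in dfg[n][2] if p not in nodes}
    outputs = set()
    for n in nodes:
        consumers = [c for c, (_, _, preds) in dfg.items() if n in preds]
        # A node is an output if it is a graph result or feeds a node outside.
        if not consumers or any(c not in nodes for c in consumers):
            outputs.add(n)
    return inputs, outputs

def critical_path(nodes, dfg):
    """Longest accumulated operator delay along any path inside the pattern."""
    nodes = set(nodes)
    memo = {}
    def arrival(n):
        if n not in memo:
            inner = (arrival(p) for p in dfg[n][2] if p in nodes)
            memo[n] = dfg[n][1] + max(inner, default=0)
        return memo[n]
    return max(arrival(n) for n in nodes)

def is_connected(nodes, dfg):
    """Check weak connectivity of the induced subgraph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        neighbours = [p for p in dfg[n][2] if p in nodes]
        neighbours += [c for c in nodes if n in dfg[c][2]]
        for m in neighbours:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes

def identify_patterns(dfg, max_in, max_out, max_ops, max_delay):
    """Enumerate connected subgraphs that satisfy all four constraints."""
    found = []
    for k in range(1, max_ops + 1):
        for nodes in combinations(sorted(dfg), k):
            if not is_connected(nodes, dfg):
                continue
            ins, outs = pattern_io(nodes, dfg)
            if (len(ins) <= max_in and len(outs) <= max_out
                    and critical_path(nodes, dfg) <= max_delay):
                found.append(nodes)
    return found

patterns = identify_patterns(DFG, max_in=4, max_out=1, max_ops=3, max_delay=3)
```

Exhaustive enumeration is exponential in the graph size and is only usable on toy examples; it is shown here solely to make the constraints concrete.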
The DURASE system uses advanced techniques, such as algorithms for graph matching and graph merging, together with constraint programming methods.
Run-time reconfigurable architecture modeling
Participants : Christophe Wolinski, François Charot, Emmanuel Casseau, Daniel Ménard, Antoine Floch, Erwan Raffin, Steven Derrien.
We have continued to work on modeling the run-time partially reconfigurable architecture in order to optimize the execution time of the application. The architecture was defined in the ROMA project. It is parametric, and is composed of memories, a restricted number of communication switches and run-time reconfigurable cells at the functional level. This year a new compilation flow has been defined, including a meta-model of a generic architecture. The current design flow supports accumulative operators and ensures the execution of data-flow applications. A loop kernel can be mapped onto the architecture in such a way that the execution time is minimized.
In the context of the RecMotifs project, we have proposed a specific design flow integrating STMicroelectronics' compiler and our development platform. In the future, this flow will enable the generation of application-specific extensions to STMicroelectronics' processors and the compilation of applications for these new architectures. In the first step, a meta-model of the CDFG (Control Data Flow Graph) of ST's compiler was defined. Using model-to-model transformations, the graphs produced by the compiler are transformed into HCDG graphs recognized by our environment. We have used the Kermeta tools (http://www.kermeta.org) for this purpose. Next, we have started to work on the architecture model of the ST processor and its extensions. We have also begun work on a new CP (Constraint Programming) model of the scheduler, well adapted to the parallel architecture of the entire system composed of the processor, the multiple extensions and the external memory. This model will be used in the future for efficient application compilation.
Architecture-Driven Synthesis of Reconfigurable Cells
Participants : Christophe Wolinski, François Charot, Erwan Raffin.
In the context of the DURASE system we have also focused on merging computational patterns to form a corresponding optimized reconfigurable cell. Existing methods cannot control critical paths and the placement of multiplexers during merging. This leads to area-optimized architectures that often do not satisfy timing constraints. Timing constraints are, however, very important when the clock frequency of an ASIP processor needs to be optimized. Our original approach, based on constraint programming, opens a new perspective and enables area optimization of a cell while respecting design constraints. For instance, our approach makes it possible to minimize the area of a merged cell without increasing its critical path. Experiments carried out on the MediaBench test suite indicate a 50% average reduction in cell area without increasing the critical path.
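The interaction between operator sharing, multiplexers and the critical path can be sketched on a toy example. The code below is not the constraint-programming model referred to above: it replaces it with an exhaustive search over operator pairings, and every number in it (operator areas and delays, multiplexer cost, the two patterns `P1` and `P2`) is a hypothetical value chosen for illustration. It assumes that each shared operator costs one multiplexer, which adds `MUX_DELAY` to every path through that operator, and it ignores interconnect sharing and the possibility of false combinational loops.

```python
from itertools import combinations

# Hypothetical technology figures (arbitrary units).
AREA = {"add": 4, "sub": 4, "mul": 20}
DELAY = {"add": 2, "sub": 2, "mul": 4}
MUX_AREA, MUX_DELAY = 1, 1

# Hypothetical patterns: node -> (operator, predecessors inside the pattern).
P1 = {"a1": ("add", []), "m1": ("mul", []),
      "m2": ("mul", ["m1"]), "o1": ("add", ["a1", "m2"])}
P2 = {"a2": ("add", []), "m3": ("mul", []), "o2": ("sub", ["a2", "m3"])}

def critical_path(pattern, shared):
    """Longest path delay; shared operators pay an extra multiplexer delay."""
    memo = {}
    def arrival(n):
        if n not in memo:
            op, preds = pattern[n]
            own = DELAY[op] + (MUX_DELAY if n in shared else 0)
            memo[n] = own + max((arrival(p) for p in preds), default=0)
        return memo[n]
    return max(arrival(n) for n in pattern)

def merge(p1, p2, max_delay):
    """Return (area, critical_path, pairing) of the cheapest feasible merge."""
    # Only operators of the same type may be shared.
    pairs = [(u, v) for u in p1 for v in p2 if p1[u][0] == p2[v][0]]
    base_area = sum(AREA[op] for op, _ in list(p1.values()) + list(p2.values()))
    best = None
    for k in range(len(pairs) + 1):
        for chosen in combinations(pairs, k):
            us = [u for u, _ in chosen]
            vs = [v for _, v in chosen]
            # Each operator may take part in at most one pairing.
            if len(set(us)) < k or len(set(vs)) < k:
                continue
            cp = max(critical_path(p1, set(us)), critical_path(p2, set(vs)))
            if cp > max_delay:
                continue  # timing constraint violated
            area = base_area - sum(AREA[p1[u][0]] for u in us) + MUX_AREA * k
            if best is None or (area, cp) < best[:2]:
                best = (area, cp, chosen)
    return best

tight = merge(P1, P2, max_delay=10)   # bound equal to the unmerged critical path
loose = merge(P1, P2, max_delay=12)   # timing allowed to degrade
```

In this toy setting, the tight bound only allows sharing the adder that lies off the critical path, whereas the loose bound also allows sharing a multiplier, which saves more area but lengthens the critical path. This is the trade-off that an area-only merging method cannot control.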
Hierarchical Methodology for Floating-Point to Fixed-Point Conversion
Participants : Daniel Ménard, Karthick Parashar, Olivier Sentieys, Romuald Rocher.
The problem of converting floating-point algorithms to implementation-friendly fixed-point formats is often solved as an optimization problem in which precision is traded against implementation cost. The complexity of the problem is known to grow exponentially with the number of optimization variables. We propose a divide-and-conquer technique to cope with the growing size of the problem. A hierarchical approach has been proposed to perform wordlength optimization of a complete system made up of several subsystems. At the system level, the fixed-point behavior of each subsystem is modeled by a single noise source located at the subsystem output. The aim is to find the noise power level of each noise source so as to minimize the implementation cost while maintaining the overall performance. The application performance is evaluated through an original approach mixing simulation and analytical approaches, detailed in Section 22.214.171.124. The analytical technique accelerates the simulation of some parts of the system. At the subsystem level, analytical models are used for evaluating the implementation cost and the computation accuracy. Compared to existing approaches, our method reduces the optimization time and supports complex systems by combining the advantages of the simulation and analytical approaches.
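The system-level step can be sketched as a noise budgeting problem. The sketch below rests on several assumptions that are not taken from the work described above: each subsystem's noise source is represented by an equivalent wordlength with the classical rounding-noise power q²/12 for a step q = 2^(-w), the sources are uncorrelated and reach the system output through a known power gain, the implementation cost is linear in the wordlength, and a simple greedy descent stands in for the actual optimization procedure. All names and parameters are hypothetical.

```python
def noise_power(wordlength):
    """Noise power q^2/12 of a rounding quantizer with step q = 2**-wordlength."""
    q = 2.0 ** -wordlength
    return q * q / 12.0

def total_noise(wordlengths, gains):
    """Output noise power, assuming uncorrelated sources and linear paths."""
    return sum(g * noise_power(w) for w, g in zip(wordlengths, gains))

def greedy_budget(gains, unit_costs, max_noise, w_max=24, w_min=2):
    """Shrink wordlengths one bit at a time while the noise constraint holds.

    At each step, among the subsystems that can lose one bit without
    violating the constraint, the one with the largest cost per bit is
    chosen. This is a heuristic and does not guarantee the optimum.
    """
    w = [w_max] * len(gains)
    if total_noise(w, gains) > max_noise:
        raise ValueError("noise constraint cannot be met even at w_max")
    while True:
        best = None
        for i in range(len(w)):
            if w[i] <= w_min:
                continue
            trial = w.copy()
            trial[i] -= 1
            if total_noise(trial, gains) <= max_noise:
                if best is None or unit_costs[i] > unit_costs[best]:
                    best = i
        if best is None:
            return w
        w[best] -= 1

# Two hypothetical subsystems: the second is cheaper per bit but its noise
# is amplified four times more on the way to the system output.
gains = [1.0, 4.0]
unit_costs = [3.0, 1.0]
wordlengths = greedy_budget(gains, unit_costs, max_noise=1e-6)
```

The returned wordlengths fix the noise power allotted to each subsystem; in a hierarchical flow, each subsystem would then be optimized internally against its own allotment.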