Section: New Results
Keywords : architecture synthesis, reconfigurable system, communication, scheduler, synthesis, system-on-chip, flexible compilation, architecture modeling, ASIP design, fixed-point arithmetic.
Modeling, synthesis and compilation for reconfigurable platforms
Synthesis and compilation techniques
Derivation of efficient architectures for regular arrays
Participants : Steven Derrien, François Charot, Alain Darte [CompSys INRIA Rhône-Alpes], Tanguy Risset [CompSys INRIA Rhône-Alpes], Anne-Marie Chana, Patrice Quinton, Christophe Wolinski.
Our research aims at developing methods and tools to synthesize parallel architectures for data-intensive applications expressed using the Alpha applicative language. These methods are implemented in the MMAlpha software.
The Alpha language allows systems to be modelled using structured descriptions: some components can be represented separately and later instantiated as elementary blocks in a larger application. In many applications, these blocks operate at different clock rates; this is the case, for example, in the WCDMA (Wideband Code Division Multiple Access) air interface. We have been able to represent multi-rate systems in Alpha by adding special components that model up- and down-samplers, and we have extended the structured scheduler of MMAlpha to determine the rates of all elementary blocks as well as the detailed schedule of each block. This activity, started during the thesis of Madeleine Nyamsi in 2005, is being pursued within a research co-operation with Laval University in Québec City and UQTR (Québec) on the modelling of MIMO communication schemes.
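The rate computation performed by the extended scheduler can be illustrated, under simplifying assumptions, by propagating relative rates through a graph of blocks connected by up- and down-samplers, in the spirit of synchronous data-flow balance equations. The sketch below is not MMAlpha's structured scheduler: the edge encoding `(src, dst, factor)`, meaning rate(dst) = rate(src) * factor, and the block names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from typing import Dict, List, Tuple, Union

Edge = Tuple[str, str, Union[int, Fraction]]  # (src, dst, factor): rate(dst) = rate(src) * factor


def propagate_rates(edges: List[Edge], reference: str) -> Dict[str, Fraction]:
    """Return the rate of every block reachable from `reference` (whose rate is 1).

    An up-sampler by L is an edge with factor L, a down-sampler by M an edge
    with factor Fraction(1, M). Raises ValueError if two paths disagree.
    """
    adjacency: Dict[str, List[Tuple[str, Fraction]]] = {}
    for src, dst, factor in edges:
        f = Fraction(factor)
        adjacency.setdefault(src, []).append((dst, f))
        adjacency.setdefault(dst, []).append((src, 1 / f))  # edges can be traversed both ways

    rates = {reference: Fraction(1)}
    queue = deque([reference])
    while queue:
        block = queue.popleft()
        for neighbour, factor in adjacency.get(block, []):
            rate = rates[block] * factor
            if neighbour not in rates:
                rates[neighbour] = rate
                queue.append(neighbour)
            elif rates[neighbour] != rate:
                raise ValueError(f"inconsistent rates for block {neighbour!r}")
    return rates


# Hypothetical chain: a filter at the reference rate, a down-sampler by 4, then an up-sampler by 2.
edges = [("rx_filter", "despreader", 1),
         ("despreader", "decoder", Fraction(1, 4)),
         ("decoder", "output_stage", 2)]
rates = propagate_rates(edges, "rx_filter")
print(rates["decoder"], rates["output_stage"])  # 1/4 1/2
```

Using exact fractions rather than floats keeps the consistency check reliable when several paths reach the same block.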
Another research avenue is to relate the polyhedral model (the theoretical model underlying MMAlpha) to other important models for architecture specification and synthesis. This work is done in cooperation with the Espresso project-team of IRISA. The context is the design of integrated circuits for multimedia applications using jointly the data-flow model, the polyhedral model, and high-level synthesis methods. The objective is to take advantage of these models to optimize systems that combine control aspects and intensive computations. The study relies on three modeling platforms: Polychrony with Signal, MMAlpha with Alpha, and Gaut for high-level synthesis.
Automatically generating efficient interfaces between the accelerators produced by regular array synthesis and the rest of the SoC into which they are inserted is often the most tedious and error-prone part of a design, and it strongly influences the actual performance benefit of hardware acceleration. To address this problem, we formulated it as a classical resource-constrained scheduling problem and, thanks to recent optimization techniques [38], we were able to define conditions for obtaining a conflict-free input/output schedule for multi-dimensional processor arrays (e.g., 2D grids) [37]. Since the schedule is static, it enables further optimizations such as grouping successive data into packets to operate in burst mode. A comparison (targeting FPGA technology) between our static schedule and run-time congestion resolution showed significant savings in hardware area, while preserving the design clock period. We are currently extending our hardware interface model to exploit this static I/O schedule for data prefetching and buffered writes, combined with a custom scratch-pad memory.
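A static I/O schedule makes two checks easy to express. The sketch below, under simplifying assumptions, tests whether a set of timed transfers is conflict-free on a given number of I/O ports, and groups transfers issued on consecutive cycles to consecutive addresses into bursts. The affine schedule t(i, j) = a*i + b*j for boundary processors, the address layout and all function names are hypothetical illustrations, not the conditions derived in [37].

```python
from collections import Counter
from typing import Dict, List, Tuple

Transfer = Tuple[int, int]  # (cycle, memory address)


def is_conflict_free(transfers: List[Transfer], n_ports: int = 1) -> bool:
    """True if no cycle issues more transfers than there are I/O ports."""
    per_cycle = Counter(cycle for cycle, _ in transfers)
    return all(count <= n_ports for count in per_cycle.values())


def group_bursts(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Chain each transfer at (cycle + 1, address + 1) onto the burst it continues."""
    bursts: List[List[Transfer]] = []
    open_ends: Dict[Transfer, int] = {}  # next expected transfer -> burst index
    for cycle, address in sorted(transfers):
        index = open_ends.pop((cycle, address), None)
        if index is None:  # does not continue any burst: start a new one
            index = len(bursts)
            bursts.append([])
        bursts[index].append((cycle, address))
        open_ends[(cycle + 1, address + 1)] = index
    return bursts


def top_row_reads(n_cols: int, b: int, base: int) -> List[Transfer]:
    """Hypothetical: PE (0, j) reads address base + j at cycle t(0, j) = b * j."""
    return [(b * j, base + j) for j in range(n_cols)]


def left_column_reads(n_rows: int, a: int, base: int) -> List[Transfer]:
    """Hypothetical: PE (i, 0), i >= 1, reads address base + i at cycle t(i, 0) = a * i."""
    return [(a * i, base + i) for i in range(1, n_rows)]


top = top_row_reads(4, b=1, base=0)
left = left_column_reads(4, a=1, base=100)
print(is_conflict_free(top), len(group_bursts(top)))  # True 1
print(is_conflict_free(top + left), is_conflict_free(top + left, n_ports=2))  # False True
```

With a single port the two boundaries collide from cycle 1 onwards; a second port (or a schedule that shifts one boundary) removes the conflicts, and each stream then forms a single burst.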
Automatic synthesis of optimized reconfigurable systems
Participant : Christophe Wolinski.
This year we have continued investigating the problem of optimized Fabric synthesis (this work extends research on automatic synthesis of optimized reconfigurable systems undertaken at Los Alamos National Laboratory, USA).
In this context we have studied how to bind the application data flow graph to the run-time reconfigurable heterogeneous Fabric cells in order to increase the performance of the entire system.
The binding and scheduling problems were formulated and solved using constraint programming. This approach yields solutions that are optimal in execution time while minimizing the number of run-time reconfigurations. An automatic tool implementing this approach was developed.
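The optimization objective can be illustrated on a toy instance. The sketch below replaces the constraint-programming solver by exhaustive enumeration of all bindings, which is only feasible for a handful of tasks; a CP solver explores the same search space with propagation and pruning. The task model (tasks listed in topological order, one duration per cell with `None` for unsupported cells, a fixed reconfiguration time whenever a cell switches operation type, initial configurations loaded for free) is a hypothetical simplification. Solutions are ranked lexicographically: execution time first, then number of reconfigurations.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# (operation type, predecessor indices, duration on each cell or None if unsupported)
Task = Tuple[str, List[int], List[Optional[int]]]


def evaluate(tasks: List[Task], binding: Sequence[int], reconf_time: int) -> Tuple[int, int]:
    """Return (makespan, reconfigurations); tasks run in list (topological) order on each cell."""
    n_cells = len(tasks[0][2])
    finish = [0] * len(tasks)
    cell_free = [0] * n_cells
    cell_op: List[Optional[str]] = [None] * n_cells  # None: not yet configured
    reconfigurations = 0
    for t, (op, preds, durations) in enumerate(tasks):
        cell = binding[t]
        ready = cell_free[cell]
        if cell_op[cell] is not None and cell_op[cell] != op:
            ready += reconf_time  # the cell must be reconfigured before running task t
            reconfigurations += 1
        start = max([ready] + [finish[p] for p in preds])
        finish[t] = start + durations[cell]
        cell_free[cell] = finish[t]
        cell_op[cell] = op
    return max(finish), reconfigurations


def best_binding(tasks: List[Task], reconf_time: int):
    """Exhaustive search standing in for a constraint-programming solver."""
    choices = [[c for c, d in enumerate(durations) if d is not None]
               for _, _, durations in tasks]
    best = None
    for binding in product(*choices):
        score = evaluate(tasks, binding, reconf_time)
        if best is None or score < best[0]:  # lexicographic: makespan, then reconfigurations
            best = (score, binding)
    return best


# Hypothetical instance: two multiplications feed an addition; cell 0 multiplies faster.
tasks: List[Task] = [("mul", [], [2, 3]), ("mul", [], [2, 3]), ("add", [0, 1], [1, 1])]
print(best_binding(tasks, reconf_time=2))  # ((5, 0), (0, 0, 1))
```

Here placing the addition on the unused cell avoids reconfiguring cell 0, which reaches the same makespan as any binding with a reconfiguration but with none.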
Specialized microcontroller synthesis on FPGA
Participant : Ludovic L'Hours.
This research aims at developing techniques to synthesize specialized microcontrollers on FPGA. The targeted applications, described in a high-level language such as C, are dominated by control (peripheral drivers, packet processing, etc.); an example is the operating system embedded in the RDisk machine [46]. The main goal is to obtain small circuits with reasonable performance. Traditional architecture synthesis approaches generally aim at maximizing performance by analyzing and parallelizing the data flow; they are not suited to applications with complex control flow, whose intrinsically sequential nature naturally calls for software compilation techniques.
We designed a microcontroller synthesis technique based on the extraction of a specialized instruction set from a given application. The design of this instruction set is mainly guided by application profiling information, but also by estimators such as pattern complexity or the number of bus accesses. The microcontroller is then derived from this instruction set using VHDL templates: the target architecture is currently a RISC processor, but other kinds of architectures, such as VLIW, could be considered. Compared to fixed instruction-set microprocessors, we reduced both the code size and the processor size by a factor of two, for the same range of performance [53].
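Profile-guided selection of instruction patterns can be sketched as follows, under strong simplifying assumptions: candidate patterns are short sequences of adjacent opcodes inside basic blocks, their gain is estimated as the dynamic execution count times the number of instructions saved by fusing them, and a pattern is rejected when it contains more memory operations than a bus-access budget allows. Opcode names, the `MEM_OPS` set and the gain formula are hypothetical; overlapping patterns are counted independently, a conflict that a real selector would have to resolve.

```python
from collections import Counter
from typing import Dict, List, Tuple

MEM_OPS = {"ld", "st"}  # hypothetical opcodes that access the bus


def rank_patterns(blocks: Dict[str, List[str]], exec_counts: Dict[str, int],
                  max_len: int = 3, max_mem: int = 1,
                  top_k: int = 3) -> List[Tuple[Tuple[str, ...], int]]:
    """Rank opcode sequences by estimated dynamic instructions saved if fused."""
    gain: Counter = Counter()
    for name, ops in blocks.items():
        for n in range(2, max_len + 1):  # pattern length
            for i in range(len(ops) - n + 1):
                pattern = tuple(ops[i:i + n])
                if sum(op in MEM_OPS for op in pattern) > max_mem:
                    continue  # exceeds the bus-access budget of one instruction
                gain[pattern] += exec_counts[name] * (n - 1)  # n instructions become 1
    return gain.most_common(top_k)


# Hypothetical profile: the loop body runs 100 times, the initialisation once.
blocks = {"loop": ["ld", "add", "shl", "st"], "init": ["mov", "add", "shl"]}
exec_counts = {"loop": 100, "init": 1}
for pattern, g in rank_patterns(blocks, exec_counts):
    print(pattern, g)
```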
All these algorithms were integrated into a generic compilation platform called Gecos. This platform, which is under constant development, is freely available ( http://gecos.gforge.inria.fr ). Investigations are currently under way to use this methodology to generate processors for very constrained devices, such as sensor network nodes.
Architecture description language
Participants : François Charot, Julien Lallet, Sébastien Pillement, Olivier Sentieys.
Our research aims at developing methods to model programmable processors through their instruction sets, and tools to derive software development environments from these processor models. A processor description in ARMOR is a grammar in which each derivation is a possible behavior of the instruction set. ARMOR thus describes the behavior of the instruction set, including its semantics, timing information and resource usage, as well as the opportunities for instruction-level parallelism.
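The idea that each derivation of a grammar is one possible instruction-set behavior can be illustrated with a toy grammar. The notation below is hypothetical and is not ARMOR syntax: it only enumerates derivations and attaches a resource set to some terminals, so that two derived instructions with disjoint resources can be considered for parallel issue. Semantics, timing and register dependences, which ARMOR also captures, are ignored here.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical grammar: non-terminal -> list of alternatives (sequences of symbols).
GRAMMAR: Dict[str, List[List[str]]] = {
    "Instr": [["Alu", "Reg", "Reg"], ["Load", "Reg", "Addr"]],
    "Alu": [["add"], ["sub"]],
    "Load": [["load"]],
    "Reg": [["r0"], ["r1"]],
    "Addr": [["@0"], ["@1"]],
}
# Hypothetical resource usage of terminals (other terminals use no resource).
RESOURCES: Dict[str, FrozenSet[str]] = {
    "add": frozenset({"ALU"}), "sub": frozenset({"ALU"}), "load": frozenset({"LSU"}),
}


def derivations(symbol: str) -> List[Tuple[str, ...]]:
    """All terminal strings derivable from `symbol` (grammar assumed non-recursive)."""
    if symbol not in GRAMMAR:
        return [(symbol,)]  # terminal symbol
    result = []
    for alternative in GRAMMAR[symbol]:
        for parts in product(*(derivations(s) for s in alternative)):
            result.append(tuple(token for part in parts for token in part))
    return result


def resources(instr: Tuple[str, ...]) -> FrozenSet[str]:
    """Union of the resources used by the terminals of one derived instruction."""
    return frozenset().union(*(RESOURCES.get(tok, frozenset()) for tok in instr))


def can_issue_together(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """Resource-disjoint instructions may run in parallel (register conflicts ignored)."""
    return not (resources(a) & resources(b))


behaviors = derivations("Instr")
print(len(behaviors))  # 12
print(can_issue_together(("add", "r0", "r1"), ("load", "r0", "@1")))  # True
```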
Our next objective is to extend this work to the modelling of reconfigurable and specialized SoC architectures, with the goal of exploiting such models in retargetable compilation flows adapted to reconfigurable architectures. A new study on the definition of a platform-based reconfigurable architecture has started. The main goal is to define a generic architecture (based on the DART paradigm) supporting different application domains. The study began with the definition of a very flexible network, able to connect any kind of dynamically reconfigurable heterogeneous resource and to enable communication between them. The architecture is defined with a high-level architecture description language based on the MAML language developed at the University of Erlangen-Nuremberg. We developed a tool that analyses the description and produces a synthesisable VHDL model. The generated network is associated with a flexible reconfiguration process that does not depend on the type or the number of hardware resources. Thanks to local memories and a separate configuration path, reconfiguration is performed in a single clock cycle. Experiments are currently being conducted on the implementation of preemptive resources.
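One common way to obtain single-cycle reconfiguration from local memories and a separate configuration path is a shadow (double-buffered) configuration register, sketched behaviourally below. This is an assumption about the mechanism, not a description of the actual DART-based design; the class, its configuration keys and its operations are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical operations a cell can be configured to perform.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


class ReconfigurableCell:
    """Hypothetical cell with an active configuration and a shadow copy."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.active = dict(config)   # drives the datapath
        self.shadow = dict(config)   # written through the configuration path
        self.swap_requested = False

    def load_config_word(self, key: str, value: str) -> None:
        # Separate configuration path: the active configuration is untouched,
        # so the cell keeps computing while the next configuration is loaded.
        self.shadow[key] = value

    def request_swap(self) -> None:
        self.swap_requested = True

    def compute(self, a: int, b: int) -> int:
        return OPS[self.active["op"]](a, b)

    def clock(self) -> None:
        # One clock edge: the whole preloaded configuration becomes active at once.
        if self.swap_requested:
            self.active = dict(self.shadow)
            self.swap_requested = False


cell = ReconfigurableCell({"op": "add"})
cell.load_config_word("op", "mul")   # may take many cycles in hardware
print(cell.compute(3, 4))            # 7: still the old configuration
cell.request_swap()
cell.clock()
print(cell.compute(3, 4))            # 12: switched on a single edge
```

The cost of loading a configuration is thus hidden behind computation, and only the swap itself is on the critical path.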
Floating-point to fixed point transformation
Floating-point to fixed-point conversion methodology for FPGA
Participants : Daniel Menard, Nicolas Hervé, Daniel Chillet, Romuald Rocher, Olivier Sentieys.
A new methodology is proposed to implement floating-point applications on an FPGA using fixed-point arithmetic. The user specifies the application's timing and accuracy constraints, the latter expressed as the minimum output Signal-to-Quantization-Noise Ratio (SQNR). The methodology then converts the application into fixed-point. The aim is to determine the fixed-point specification that minimizes the architecture cost while providing sufficient computation accuracy, as expressed by the accuracy constraint.
The fixed-point conversion process must determine a word-length and a binary-point position for every data item. It consists of three main steps. The first step evaluates the dynamic range of the data. These results are used in the second step to determine the binary-point locations. The third step fixes the data word-lengths so that the architecture cost is minimized and the accuracy constraint is satisfied. Accuracy is evaluated with an analytical method, which dramatically reduces optimization time compared to simulation-based methods. To generate an optimized architecture, operator word-length optimization and the synthesis process are coupled [17]: an iterative process alternates between high-level synthesis and operator word-length optimization to improve both of these interdependent processes. This coupling reduces the number of operators in the architecture: operators with smaller word-lengths have lower latency, so each operator can execute more operations within the timing constraint. Compared to classical implementations based on a uniform word-length, our approach reduces architecture cost by 20 % to 40 % [18], [33].
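The three steps can be sketched on a toy model, under stated assumptions: dynamic ranges are given as maximum absolute values, each quantized signal contributes rounding noise of power q^2/12 with q = 2^-f (f fractional bits), scaled by a known gain to the output, and the architecture cost is a weighted sum of word-lengths. The greedy bit-removal loop is a generic heuristic chosen for illustration, not the team's optimization algorithm, and the numbers are hypothetical.

```python
import math
from typing import List, Tuple


def integer_bits(max_abs: float) -> int:
    """Steps 1-2: bits left of the binary point (sign included) so that |x| <= max_abs fits.

    A signed field with m integer bits covers [-2**(m-1), 2**(m-1)); clamped to at least 1.
    """
    return max(1, math.floor(math.log2(max_abs)) + 2)


def noise_power(frac_bits: List[int], gains: List[float]) -> float:
    """Output noise power: each rounding source has power q**2 / 12 with q = 2**-f."""
    return sum(g * 2.0 ** (-2 * f) / 12.0 for f, g in zip(frac_bits, gains))


def sqnr_db(signal_power: float, frac_bits: List[int], gains: List[float]) -> float:
    return 10.0 * math.log10(signal_power / noise_power(frac_bits, gains))


def optimize_word_lengths(max_abs: List[float], gains: List[float], costs: List[float],
                          signal_power: float, sqnr_min: float,
                          f_max: int = 32) -> Tuple[List[int], List[int]]:
    """Step 3: greedily remove fractional bits while the SQNR constraint still holds."""
    frac = [f_max] * len(gains)
    while True:
        best = None
        for i in range(len(frac)):
            if frac[i] == 0:
                continue
            trial = frac[:]
            trial[i] -= 1
            if sqnr_db(signal_power, trial, gains) >= sqnr_min:
                # Prefer the largest cost saving per bit, then the lowest resulting noise.
                key = (-costs[i], noise_power(trial, gains))
                if best is None or key < best[0]:
                    best = (key, i)
        if best is None:
            break  # no single bit can be removed without violating the constraint
        frac[best[1]] -= 1
    word_lengths = [integer_bits(m) + f for m, f in zip(max_abs, frac)]
    return word_lengths, frac


# Hypothetical two-signal system with a 60 dB output SQNR constraint.
wl, frac = optimize_word_lengths(max_abs=[1.0, 4.0], gains=[1.0, 0.25], costs=[1.0, 2.0],
                                 signal_power=0.5, sqnr_min=60.0)
print(wl, frac)
```

Because the noise model is analytical, each trial costs a few arithmetic operations instead of a fixed-point simulation, which is what makes such an iterative search affordable.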
Fixed-point accuracy evaluation
Participants : Daniel Menard, Romuald Rocher, Pascal Scalart, Olivier Sentieys.
An important part of the floating-point to fixed-point conversion process is the evaluation of fixed-point accuracy, measured through the Signal-to-Quantization-Noise Ratio (SQNR). We have proposed a general analytical method, valid for all quantization laws (truncation and rounding) and for all systems composed of arithmetic operations. The technique is based on a matrix model that simplifies the expression for transform algorithms such as the FFT or DCT. For recursive systems, the method unrolls the recurrence. We determined the complexity of this approach and, to reduce it, developed a linear prediction model that accelerates recurrence unrolling by approximating the recurrence terms included in the analytical expression of the output quantization noise. The model has been evaluated in terms of accuracy and computing time on applications such as the Least Mean Square (LMS) and Affine Projection Algorithm (APA) adaptive filters. It yields accurate noise power estimates, and execution times measured with Matlab show that the linear prediction approach dramatically reduces the time needed to compute the noise power expression.
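The kind of closed-form reasoning involved can be illustrated on the simplest recursive case, using a textbook model rather than the team's matrix or linear prediction models. A quantization noise source with mean mu and variance sigma^2 passing through a linear time-invariant system with impulse response h yields an output noise power sigma^2 * sum(h[k]^2) + (mu * sum(h[k]))^2. Rounding is modelled as zero-mean noise with variance q^2/12, truncation as noise with mean -q/2 and the same variance. For the hypothetical first-order recursion y[n] = a*y[n-1] + x[n], h[k] = a^k, and unrolling the recurrence over enough terms approaches the closed forms sum(h^2) = 1/(1 - a^2) and sum(h) = 1/(1 - a).

```python
from typing import List, Tuple


def quantization_noise_moments(q: float, mode: str) -> Tuple[float, float]:
    """(mean, variance) of the classical additive quantization noise model."""
    if mode == "rounding":
        return 0.0, q * q / 12.0
    if mode == "truncation":
        return -q / 2.0, q * q / 12.0
    raise ValueError(f"unknown quantization mode: {mode}")


def output_noise_power(h: List[float], q: float, mode: str) -> float:
    """Noise power at the output of an LTI system with impulse response h."""
    mean, var = quantization_noise_moments(q, mode)
    return var * sum(x * x for x in h) + (mean * sum(h)) ** 2


def unrolled_first_order(a: float, n_terms: int) -> List[float]:
    """Impulse response of y[n] = a*y[n-1] + x[n], truncated after n_terms terms."""
    return [a ** k for k in range(n_terms)]


q, a = 2.0 ** -8, 0.5
h = unrolled_first_order(a, 200)
print(output_noise_power(h, q, "rounding"), q * q / 12.0 / (1 - a * a))
print(output_noise_power(h, q, "truncation"))
```

The number of unrolled terms needed grows as |a| approaches 1, which is the cost that approximations of the recurrence terms aim to reduce.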
The analytical expression of the output noise is used in the floating-point to fixed-point conversion process to optimize data word-lengths under the accuracy constraint (minimum SQNR value). Although computing this expression has an initial cost, each subsequent evaluation is cheap, so our approach becomes faster than fixed-point simulation-based approaches after only a few optimization iterations. These results show the value of our methodology for reducing fixed-point system development time. They are described in detail in the Ph.D. thesis of Romuald Rocher [14].
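The break-even point can be made explicit with a simple cost model, given as a hedged illustration with hypothetical symbols: let T_0 be the one-time cost of building the analytical noise expression, t_a the cost of one evaluation of it, and t_s the cost of one fixed-point simulation, with t_s > t_a. After n optimization iterations:

```latex
T_{\mathrm{an}}(n) = T_0 + n\,t_a, \qquad
T_{\mathrm{sim}}(n) = n\,t_s, \qquad
T_{\mathrm{an}}(n) < T_{\mathrm{sim}}(n) \iff n > \frac{T_0}{t_s - t_a}.
```

When t_a is much smaller than t_s, the threshold is roughly T_0 / t_s: the analytical approach pays off once the optimizer would have run about as many simulations as it takes to build the expression.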
A method has been proposed to define the accuracy constraint (minimum SQNR value) from the application performance constraints. The minimum SQNR is obtained through floating-point simulation: the error due to fixed-point conversion is modelled by a single noise source located at the system output, and the power of this noise source is increased as long as the application performance is not degraded. The noise model has been defined and validated through several experiments. Our approach for determining the accuracy constraint has been tested and validated on two applications: an MP3 coder and a WCDMA receiver.
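The procedure can be sketched with a toy application, under hypothetical assumptions: the "application" takes sign decisions on +1/-1 symbols, its performance metric is the fraction of correct decisions, and the single output noise source is Gaussian. The noise amplitude is increased (here by bisection, which relies on performance degrading monotonically as the same noise realisation is scaled up) until the degradation exceeds a tolerance; the corresponding SQNR is the minimum acceptable value.

```python
import math
import random
from typing import List


def fraction_correct(signal: List[float], noise: List[float], scale: float) -> float:
    """Toy performance metric: sign decisions taken on signal + scale * noise."""
    correct = sum((s + scale * n >= 0) == (s >= 0) for s, n in zip(signal, noise))
    return correct / len(signal)


def min_sqnr_db(signal: List[float], noise: List[float], max_degradation: float,
                hi: float = 10.0, iterations: int = 50) -> float:
    """Largest acceptable noise amplitude by bisection, converted to an SQNR in dB."""
    reference = fraction_correct(signal, noise, 0.0)  # noiseless performance
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if reference - fraction_correct(signal, noise, mid) <= max_degradation:
            lo = mid  # still acceptable: try more noise
        else:
            hi = mid
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = lo * lo * sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


rng = random.Random(0)
signal = [rng.choice((-1.0, 1.0)) for _ in range(10_000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(min_sqnr_db(signal, noise, max_degradation=0.01))
```

For this toy case the per-symbol error probability is Q(1/sigma), so a 1 % tolerance corresponds to sigma of about 0.43, i.e. an SQNR of about 7.3 dB, which the bisection should approximately recover.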
Specialized SoC architecture modeling
System modelling for dynamically reconfigurable architectures
Participants : Imène Benkermi, Didier Demigny, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.
SoC platforms including dynamically reconfigurable units aim at supporting complex multimedia applications under a real-time operating system. They consist of different execution modules, i.e. general-purpose processor(s) and specialized or reconfigurable accelerators, including the DART dynamically reconfigurable architecture. Their heterogeneity led us to specify them at three levels of description: software, middleware and hardware. Each level corresponds to a task configuration on the platform. Hence, the operating system must provide specific services required by reconfigurability, namely task communication and migration between the three levels of description.
The software model describes the different tasks of the application supported by the architecture platform. A UML-based model has been used. In addition to task characteristics, real-time constraints and links between tasks, this model allows the designer to specify the different forms a task can take, depending on the target on which it will eventually execute. This work has been done in collaboration with the ETIS (Cergy-Pontoise) and LESTER (Lorient) laboratories, and it will continue within the OverSoC ANR project.
Another important issue for dynamically reconfigurable SoC platforms is the scheduling and binding of tasks on a heterogeneous architecture, possibly at run time. We have proposed an approximate on-line scheduling algorithm based on Artificial Neural Networks (ANN), more precisely on the Hopfield model. This scheduler distributes the task set over different computing units while meeting real-time constraints and taking the units' heterogeneity into account, i.e. a task may have different execution times depending on the unit on which it executes. To take all the constraints into account, we introduced a new design rule for building the neural network. This design rule has been formulated mathematically and a simulation tool has been developed. We have shown that correct schedules can be obtained within a small number of iterations. We will next study a hardware implementation of the neural network on a reconfigurable structure to obtain efficient and reactive on-line scheduling.
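The principle of an energy-minimizing neural scheduler can be illustrated with a much-simplified sketch that does not reproduce the design rule mentioned above. Each task is a winner-take-all group of binary neurons (one per computing unit), which enforces that every task is bound to exactly one unit. The network energy is the sum over units of the squared load, where a task's load depends on the unit it runs on (heterogeneity). Asynchronous updates let each group switch to the unit that lowers the energy, so the energy decreases monotonically and the network settles in a local minimum. Task costs, the deadline and all names are hypothetical, and tasks are treated as independent.

```python
from typing import List, Tuple


def unit_loads(assign: List[int], cost: List[List[float]], n_units: int) -> List[float]:
    """Total execution time bound to each unit (cost[t][u]: time of task t on unit u)."""
    loads = [0.0] * n_units
    for task, unit in enumerate(assign):
        loads[unit] += cost[task][unit]
    return loads


def energy(assign: List[int], cost: List[List[float]], n_units: int) -> float:
    """Quadratic network energy: sum of squared unit loads (favours balanced loads)."""
    return sum(load * load for load in unit_loads(assign, cost, n_units))


def settle(cost: List[List[float]], max_sweeps: int = 100) -> Tuple[List[int], int]:
    """Asynchronous winner-take-all updates until no group changes (local energy minimum)."""
    n_tasks, n_units = len(cost), len(cost[0])
    # Initial state: every task on the unit where it runs fastest.
    assign = [min(range(n_units), key=lambda u, t=t: cost[t][u]) for t in range(n_tasks)]
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for t in range(n_tasks):
            current = assign[t]
            best_unit, best_energy = current, energy(assign, cost, n_units)
            for u in range(n_units):
                assign[t] = u
                e = energy(assign, cost, n_units)
                if e < best_energy:
                    best_unit, best_energy = u, e
            assign[t] = best_unit
            changed |= best_unit != current
        if not changed:
            return assign, sweep
    return assign, max_sweeps


# Hypothetical: 4 identical tasks taking 2 time units on unit 0 and 4 on unit 1; deadline 7.
cost = [[2.0, 4.0] for _ in range(4)]
assign, sweeps = settle(cost)
span = max(unit_loads(assign, cost, 2))
print(assign, span, span <= 7.0)  # [1, 0, 0, 0] 6.0 True
```

Real-time constraints are only checked after convergence here; folding them into the energy function is precisely what a dedicated design rule has to address.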
SoC modeling and prototyping on FPGA-based systems
Participant : François Charot.
R2D2 participates in the SocLib initiative (http://soclib.lip6.fr ), whose goal is to build an open platform for modeling and simulation of multiprocessor systems-on-chip. The core of the platform is a library of simulation models for virtual components (IP cores), with a guaranteed path to silicon.
Thanks to the increasing capacity of FPGA components, it is now possible to integrate a significant number of processors in a single programmable FPGA component. While any processor core can be synthesized in such a component, FPGA manufacturers also design their own processor architectures. For example, Altera proposes the Nios processor core, available in three versions (economy, standard, fast). It is a configurable processor core (pipeline depth, cache size, custom instructions, etc.).
As part of our participation in this initiative, we are studying the feasibility of prototyping SocLib platforms on FPGA components. A cycle-accurate and bit-accurate (CABA) model of the Altera Nios processor core has been designed. In the near future, it will be integrated into the SocLib platform to be developed within an ANR RNTL project. The goal of this work is to establish a link between a SocLib simulation platform and its prototyping on an FPGA system.
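What cycle accuracy and bit accuracy mean can be illustrated with a toy behavioural model, unrelated to the actual Nios model: a two-stage pipelined multiplier whose state changes only on clock edges, whose output depends only on its registers, and whose result is wrapped to 32 bits exactly as a 32-bit datapath would wrap it. SocLib models themselves are written in SystemC; this Python sketch, including its class and method names, is hypothetical and only shows the modelling style.

```python
from typing import List, Optional, Tuple

MASK32 = (1 << 32) - 1


def to_signed32(x: int) -> int:
    """Keep the low 32 bits and reinterpret them as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


class PipelinedMultiplier:
    """Stage 1 registers the operands, stage 2 registers the 32-bit product."""

    def __init__(self) -> None:
        self.stage1: Optional[Tuple[int, int]] = None  # None models a pipeline bubble
        self.stage2: Optional[int] = None

    def output(self) -> Optional[int]:
        # Moore-style output: depends only on the current register contents.
        return self.stage2

    def clock(self, operands: Optional[Tuple[int, int]]) -> None:
        # Compute every next-state value from the current state, then update all
        # registers together, as they all change on the same clock edge.
        if self.stage1 is None:
            next_stage2 = None
        else:
            next_stage2 = to_signed32(self.stage1[0] * self.stage1[1])
        self.stage1, self.stage2 = operands, next_stage2


mul = PipelinedMultiplier()
trace: List[Optional[int]] = []
for operands in [(3, 4), (-2, 5), (1 << 16, 1 << 16), None, None]:
    trace.append(mul.output())  # value visible during this cycle
    mul.clock(operands)         # rising edge at the end of the cycle
print(trace)  # [None, None, 12, -10, 0]
```

The last product (2^32) wraps to 0, which a model using unbounded integers would miss; this is the kind of discrepancy bit accuracy rules out.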