## Section: New Results

### Dynamically and Heterogeneous Reconfigurable Platforms

#### New Reconfigurable Architectures

##### Power models of reconfigurable architectures

Participants : Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is now considered as an interesting solution to reduce the area of the global system and to support high performances. But the key challenge in the context of embedded systems is currently the power budget of the system, and the designer needs some early estimations of the power consumption of its system. Power estimation for reconfigurable systems is a difficult problem because several parameters need to be taken into account to define an accurate model.

Hardware implementation of an algorithm provides different choices to the designer compared to software implementation. It is possible to vary the parallelism level or loop unrolling index, which has a direct impact on area and execution time and therefore on power and energy consumption. First we evaluated delay, area, power and energy impacts of loop transformations using High Level Synthesis tools. We have made several power measurements on a real FPGA platform and for different task implementations in order to build a model of energy consumption versus execution time. Work is in progress to also characterize energy consumption of tasks through extracting the number of elementary operators used in the hardware implemented task.

Furthermore, we also consider the opportunity of the dynamic reconfiguration, which makes possible to partially reconfigure a specific part of the circuit while the rest of the system is running. This opportunity has two main effects on power consumption. First, thanks to the area sharing ability, the global size of the device can be reduced and the static (leakage) power consumption can thus be reduced. Secondly, it is possible to delete the configuration of a part of the device which reduces the dynamic power consumption when a task is no longer used. Although the cost of the reconfiguration is still important, in some cases this technique can be interesting to reduce the power of the system. To evaluate the potential gain of the dynamic reconfiguration, we have made some measurements on a Virtex 5 board. We have defined a first model of the power consumption of the reconfiguration. This model shows that the power consumption not only depends on the bitstream file size but also on the content of the reconfiguration region [41] , [42] .

These experiments allow us to define energy and delay models that will be used by the operating system including a power management strategy to decide on-line which task instances must be executed to efficiently manage the available power using dynamic partial reconfiguration [82] .

##### High-level modeling of reconfigurable architectures

Participants : Robin Bonamy, Daniel Chillet, Sébastien Pillement.

To help System-on-Chip designers to explore the large design space, high-level methodologies and tools are more and more often required. The exploration phase is particularly difficult when the system must satisfy a large number of constraints, like performance, real time and power consumption. If the classical multiprocessor system-on-chips can be modeled without any difficulty, dynamically reconfigurable embedded accelerators are not correctly covered by the usual modeling languages.

In this context, we have extended the AADL (Architecture Analysis and Design Language) language to include the reconfiguration aspect included in nowadays' MPSoC [19] , [40] . This work is part of a more general project, Open-People, which proposes complete methodology for power and energy consumption analysis. The proposal is based on AADL property extensions which are applied on component models. A three-level model has been defined for every targeted FPGA. The first level defines a generic FPGA which allows to model every possible FPGA. The second level allows the specialization of the FPGA for a specific family. Finally, the third level provides the support to describe the deployment of an application on a specific FPGA circuit.

To complete these levels of description, we started the development of techniques for constraint verifications. These developments are based on the OCL language, which allows to extract characteristics on the AADL model, compute mathematical expressions and finally verify mathematical constraints. These verifications have been developed for power and energy consumption, they include static and dynamic power estimation and soon the power consumption during the dynamic reconfiguration process.

##### Reconfiguration controller

Participants : Manh Pham, Daniel Chillet, Sébastien Pillement.

Dynamically reconfigurable architectures, which can offer high performance, are increasingly used in different domains. Unfortunately, lots of applications cannot benefit from this new paradigm due to large timing overhead. Even for partial reconfiguration, modifying a small region of an FPGA takes few ms. To cope with this problem we have developed an ultra-fast power-aware reconfiguration controller (UPaRC) to boost the reconfiguration throughput up to 1.433 GB/s. UPaRC can not only enhance the system performance, but also auto-adapt to various performance and consumption conditions. This could enlarge the range of supported applications and can optimize power-timing trade-off of reconfiguration phase for each selected application during run-time. The energy-efficiency of UPaRC over state-of-the-art reconfiguration controllers is up to 45 times more efficient. This work has been accepted for publication in DATE'2012 [56] .

#### Management of Dynamically Reconfigurable Systems

##### Spatio-Temporal Scheduling based on Artificial Neural Networks

Participants : Antoine Eiche, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

Management of task execution on dynamically reconfigurable accelerators is known to be a difficult problem due to the large number of possibilities of task instantiations. The problem to solve can be defined as the ”spatio-temporal task scheduling”. The problem becomes even more difficult to solve when the solutions must be produced during the execution of the application, i.e. on-line. In this context, new algorithms must be defined and, to solve this problem, we propose to define a neural network based on the Hopfield model.

We are therefore able to address heterogeneous multiprocessor systems and to manage the reconfigurable resources embedded within MPSoC [23] . Our latest works on this topic focused on two different issues. First we demonstrated that neural network structures used for task scheduling can continue to produce valid solutions even if one or several neurons are in fault [70] . This characteristic is very important for present and future technologies for which the fabrication process variability can lead to increase the number of defaults in the circuit. The second focus concerned the optimization of the neural network convergence by using parallel evaluation of neurons. We have shown how to define several neuron packets (from the neural network) that can be evaluated in parallel without modifying the convergence property [44] , [76] .

##### Flexible Communication OS Service

Participants : Daniel Chillet, Sébastien Pillement, Ludovic Devaux.

In a multiprocessor system, to gain the advantages of parallelism, efficient communication and memory management are highly required. Recent developments in the partial and dynamic reconfigurable computing domain demand better ways to manage simultaneous task execution. But, the requirements are slightly different from the traditional software based systems. In this context, Operating System (OS) services like scheduling, placement, inter-task communication have been developed to make such platforms flexible and self-sufficient. For task communications within flexible architectures, we defined a specific network-on-chip adapted to dynamically and partially reconfigurable resources included into modern SoC. The characterization of the DRAFT network was completed and its integration inside reconfigurable systems on chip was realized [14] . We then focused on the run-time communication service [50] and dynamic memory management [49] in reconfigurable System-on-Chips (RSoCs). We first developed a hardware communication block and the communication schemes supported by this new OS service. The originality relies on the implementation of this services directly inside the FPGA. We then demonstrated the requirements and advantages of having a local memory task or a dynamically configurable memory task, in order to improve effectiveness and efficiency of the proposed schemes.

#### Fault-Tolerant Reconfigurable Architectures

Participants : Sébastien Pillement, Manh Pham, Stanislaw Piestrak.

In terms of complex systems implementation, reconfigurable FPGA circuits are now part of the mainstream thanks to their flexibility, performances and high number of integrated resources. FPGAs enter new fields of applications such as aeronautics, military, automotive or confined control thanks to their ability to be remotely updated. However, these fields of applications correspond to harsh environments (cosmic radiation, ionizing, electromagnetic noise) and with high fault-tolerance requirements. We then propose a complete framework to design reconfigurable architecture supporting fault-tolerance mitigation scheme [58] . The proposed framework allows simulation, validation of mitigation operations, but also to size architecture resources. The physical implementation of the fault-tolerant reconfigurable platform permits to validate the proposed model and the effectiveness of the framework. This implementation shows the potential of dynamically reconfigurable architectures for supporting fault-tolerance in embedded systems. We also worked on new approach in order to include dependability in the DRAFT coarse-grained reconfigurable architecture [37] .

#### Low-Power Architectures

##### Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes

Participants : Olivier Sentieys, Steven Derrien, Vivek D. Tovinakere, Philippe Quémerais, Romain Fontaine.

This research work aims at developing ultra low-power SoC for wireless sensor nodes, as an alternative to existing approaches based low-power micro-controllers. The proposed approach reduces the power consumption by using a combination of hardware specialization and power gating techniques. In particular, we use the fact that typical WSN applications are generally modeled as a set of small to medium grain tasks that are implemented on low power microcontroller using light weight thread-like OS constructs.

Rather than implementing these tasks in software, we instead propose to map each of these tasks to their own specialized hardware structures that we call a hardware micro-task. Such hardware task consists of a minimalistic (and customized) data-path controlled by a finite state machine (FSM). By customizing each of these hardware implementations to their corresponding task, we expect to significantly reduce the dynamic power dissipated by the whole system. Besides, to circumvent the increase in static power caused by the possibly numerous hardware tasks implemented in the chip, we also propose to combine our approach with power gating, so as to supply power to a hardware task only when it needs to be executed [28] .

As a prof of concept, a chip has been designed and fabricated in a 65nm CMOS from STMicroelectonics using the CMP facilities. The area is about 1mm${}^{2}$ in a QFN52 package. The circuit is a controller part of a wireless sensor network node. It embeds an OpenMSP microcontroller core with SRAM memories for data and programs and some dedicated hardware tasks to control an external radio transceiver such as the TI CC2420 commonly used in the industry.

To reduce energy consumption, low power design techniques such as power gating were used. Two power domains are implemented: one is dedicated to microcontroller and memories, while the goal of the second is to measure the efficiency of our hardware micro-task concept.

The input-output ring around the core is divided into three parts: two parts are digital I/O pads corresponding to a power domain and the third contains analog pads to control the power gating for monitoring. Our goal is to analyze the power benefits of our approach and to compare it with classical microprocessor architectures.

Figure 4. Layout and Die of Express-Chip Test Chip
##### Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters

Participants : Olivier Sentieys, Vivek D. Tovinakere.

Run-time power gating for aggressive leakage reduction has brought into focus the cost of mode transition overheads due to frequent switching between sleep and active modes of circuit operation. In order to design circuits for effective power gating, logic circuits must be characterized for overheads they present during mode transitions. We have proposed a method to determine steady-state virtual-supply voltage in active mode and hence present a model for virtual-supply voltage in terms of basic circuit parameters. Further, we derived expressions for estimation of two mode transition overheads: wakeup time and wakeup energy for a power-gated logic cluster using the proposed model. Experimental results of application of the model to ISCAS85 benchmark circuits show that wakeup time may be estimated within an average error of $16.3%$ across $22×$ variation in sleep transistor sizes and $13×$ variation in circuit sizes with significant speedup in computation time compared to SPICE level circuit simulations [30] , [63] .

#### Arithmetic Operators for Cryptography

Participants : Arnaud Tisserand, Thomas Chabrier, Danuta Pamula.

##### ECC Processor with Protections Against SCA

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in ${𝔽}_{{2}^{m}}$ and ${𝔽}_{p}$ finite fields and 160–600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied. The use of some number systems, especially very redundant ones, allows to change the way some computations are performed and then their effects on side channel traces.

We propose in [83] hardware implementation of the double base number system (DBNS) random recoding of secret keys. This recoding is performed on-the-fly during the elliptic curve cryptograpy (ECC) scalar multiplication $\left[k\right]P$. This leads random behavior of the point operations at the side channel level.

We started a collaboration with the University of Sfax in Tunisia on the use of ECC processor for secure communications in low-cost wireless applications. A first FPGA implementation is under development and we expect to submit our first results in 2012.

##### Arithmetic Operators for High-Performance Cryptography

In [32] , we published an extended version of the work started in 2010 on fast algorithms and implementations of ${𝔽}_{{2}^{m}}$ finite field multiplication units in FPGA. The proposed and compared methods are based on separated multiplication and reduction steps and analyzed various area and time dependency/efficiency/complexity tradeoffs.

With Mark Hamilton, PhD student in the Code and Crypto group from the University College Cork (UCC), we worked on fast algorithms and implementations of ${𝔽}_{p}$ finite field multipliers for some specific values of $p$. The corresponding results have been published in [46] .

#### SoC Modeling and Prototyping on FPGA-based Systems

Participants : François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

SoCLib and MutekH are two software development projects to which we contribute. SoCLib (http://www.soclib.fr ) is an open platform for modeling and simulation of multiprocessors system-on-chip (MP-SoC). MutekH (http://www.mutekh.org ) is a free and portable operating system for embedded platforms, ranging from micro-controller to multiprocessor systems. The use of the configurable and extensible simulation model of the Altera NIOSII processor of the SoCLib library and of the MutekH operating system allows us to easily deploy software applications such as codes from MediaBench, MiBench and Cryptographic Library benchmark sets or multithreaded applications on monoprocessor and multiprocessor simulation platforms. These platforms are used on the one hand for the validation of the processor extensions automatically generated by our compilation tools and on the other hand for the measurement of the speedup achieved using these new extensions.