Team Cairn

Section: New Results

Dynamically and Heterogeneous Reconfigurable Platforms

New Reconfigurable Architectures

High-level modeling of reconfigurable architectures

Participants : Robin Bonamy, Daniel Chillet, Sébastien Pillement.

The growing complexity of applications and System-on-Chip architectures confronts the designer of embedded systems with a very large design space. Exploring this space to reach an efficient solution becomes very difficult, especially when the design must satisfy a large number of constraints. The problem becomes even harder when the system includes a reconfigurable area to support the flexibility of the application. To help designers explore the design space, it is increasingly important to provide methods and tools for early estimation of the system's characteristics (performance, power). Several methods and tools have been developed for that purpose, but none of them models the reconfiguration of Systems-on-Chip. In this context, we developed a high-level model of reconfigurable circuits such as FPGAs. This model is based on AADL (Architecture Analysis and Design Language). This work is part of a more general project, Open-PEOPLE, whose main goal is to define a complete exploration flow based on AADL allowing efficient power and energy consumption analysis. The model will be used to define exploration strategies that provide earlier estimations of performance, area, energy and power consumption.

Power models of reconfigurable architectures

Participants : Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is now considered an interesting solution to reduce the area of the global system and to support high performance. However, the key challenge in the context of embedded systems is currently the power budget, and the designer needs an early estimation of the power consumption of the system. Power estimation for reconfigurable systems is a difficult problem because several parameters need to be taken into account to define an accurate model. Our first work on this subject consists in evaluating the delay, area, power and energy impacts of loop transformations. We have made several power measurements on a real FPGA platform for different task implementations. These experiments allow us to define an energy and delay model which will be used by the operating system to decide on-line which task instances must be executed to efficiently manage the available power.

Furthermore, we also consider the opportunities that dynamic reconfiguration offers for reducing energy consumption. Indeed, using dynamic reconfiguration, it is now possible to partially reconfigure a specific part of the circuit while the rest of the system is running. The cost of the reconfiguration is still significant, but in some cases this technique can reduce the power of the system. To evaluate the potential gain of dynamic reconfiguration, we have made measurements on a Virtex 5 board. We have defined a first model of the power consumption of the reconfiguration. This model shows that the power consumption mainly depends on the bitstream file size. This model will also be included in the power management strategy of the operating system for the same goal, i.e. ensuring an efficient management of the available power.
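As a rough illustration of how a size-dependent reconfiguration cost could enter an on-line power management decision, the following sketch assumes that the configuration port has a fixed throughput and that the circuit draws a constant average power while reconfiguring. The function names, parameters and the decision rule are hypothetical and are not the model derived from the Virtex 5 measurements.

```python
def reconfiguration_energy_joules(bitstream_bytes, port_bytes_per_s, avg_power_w):
    """Energy of one partial reconfiguration.

    Assumption (not from the report): the reconfiguration time is the
    bitstream size divided by a fixed port throughput, and the average
    power during reconfiguration is constant.
    """
    if bitstream_bytes < 0:
        raise ValueError("bitstream_bytes must be non-negative")
    if port_bytes_per_s <= 0:
        raise ValueError("port_bytes_per_s must be positive")
    duration_s = bitstream_bytes / port_bytes_per_s
    return avg_power_w * duration_s


def hardware_instance_saves_energy(energy_sw_j, energy_hw_j, reconf_energy_j):
    """Hypothetical decision rule: loading the hardware instance of a task
    pays off only if its execution energy plus the reconfiguration energy
    stays below the energy of the software instance."""
    return energy_hw_j + reconf_energy_j < energy_sw_j
```

Under these assumptions the reconfiguration energy grows linearly with the bitstream size, so a smaller reconfigurable region lowers the break-even point of the hardware instance.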

Flexible Arithmetic Operator Design

Participants : Emmanuel Casseau, Daniel Ménard, Shafqat Khan.

Our aim is to propose new arithmetic operators which are flexible in terms of both computation and data size. Targeted applications are multimedia processing. Such processing handles low-precision data (typically, pixels are coded using 8, 10, 12 or 16 bits). To optimize their implementation, architectures must offer operators which support different data word-lengths. Operator efficiency can be increased using a subword parallelism (SWP) scheme. A single SWP instruction performs the same operation on multiple sets of subwords in parallel using SWP operators. In existing SWP-capable processors, the choices for subword data sizes are usually 8, 16, 32 bits, etc. These sizes are chosen because the SWP operator is less complex to design when every subword size is a multiple of the smallest one. However, multimedia applications require operators which support multimedia-oriented subword sizes (8, 10, 12 and 16). Multimedia operations are based on basic operators (add, absolute value, multiply), but more complex operations are also required to increase both speed and efficiency. For instance, the $\sum |a-b|$ operation is required in the calculation of the SAD, and the $\sum (a \times b)$ operation is required for the multiplication-accumulation used in the DCT algorithm. To overcome the overheads of reconfiguration, such as the complexity of the interconnection network and the reconfiguration time, we designed a flexible pipelined multimedia operator which provides reconfigurability inside the operator using a configurable datapath. The operator can be configured to perform most multimedia operations on different data sizes without any reconfiguration time. This operator will be used as one computing unit inside a reconfigurable processor tailored for multimedia applications. The operator has also been designed using redundant data representation [27] for high-speed processing.
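The behavior expected from an SWP operator computing $\sum |a-b|$ can be sketched in software as follows. This is only a functional reference model with hypothetical helper names: it packs subwords of a chosen width (for example 8, 10, 12 or 16 bits) into one integer and sums the absolute differences lane by lane. It says nothing about the pipelined datapath of the actual operator.

```python
def pack_subwords(values, width):
    """Pack unsigned subwords of `width` bits into one integer, lane 0 first."""
    mask = (1 << width) - 1
    word = 0
    for lane, value in enumerate(values):
        if not 0 <= value <= mask:
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << (lane * width)
    return word


def unpack_subwords(word, width, count):
    """Extract `count` unsigned subwords of `width` bits from `word`."""
    mask = (1 << width) - 1
    return [(word >> (lane * width)) & mask for lane in range(count)]


def swp_sad(word_a, word_b, width, count):
    """Sum of absolute differences over all subword lanes of two packed words."""
    lanes_a = unpack_subwords(word_a, width, count)
    lanes_b = unpack_subwords(word_b, width, count)
    return sum(abs(a - b) for a, b in zip(lanes_a, lanes_b))
```

For instance, four 10-bit pixels fit in a 40-bit word, and `swp_sad(pack_subwords([10, 200, 1023, 0], 10), pack_subwords([20, 100, 0, 5], 10), 10, 4)` adds the four lane differences 10, 100, 1023 and 5.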

Adaptive and Multi-mode Devices

Participants : Emmanuel Casseau, Antoine Floch, Erwan Raffin, Daniel Ménard, Shafqat Khan, François Charot, Christophe Wolinski.

More and more devices need to continuously adapt to changing environments, that is to say, devices will have to be flexible enough to implement different algorithms at different times. Such mode switches require more than just software-based changes: the application-specific hardware components must also be adapted. To address this requirement, we investigate two approaches. The first one is the design of a reconfigurable processor able to adapt its computing structure to a dedicated domain: video and image processing applications. The processor is built around a pipeline of coarse-grain reconfigurable operators exhibiting a good trade-off between performance and power consumption. Contrary to what has been done in previous reconfigurable processors, flexibility is not obtained through a flexible interconnect network but through configurable domain-dedicated units. This work is done in the context of the ROMA ANR project. We particularly investigate reconfigurable operator design [27] and the compilation framework [69] . The second approach is the synthesis of multi-mode architectures which do not lead to any reconfiguration time penalty. Such architectures implement all required operators according to the pre-defined set of computations to be performed. In order to optimize area, these operators are shared between the set of algorithms, and some control logic steers the data to operators depending on the particular algorithm to be executed at a specific time. The approach is based on high-level synthesis [29] . Syntheses can be constrained for performance or area, and both ASIC and FPGA technologies can be targeted. Application domains are typically channel encoding, cryptography and multimedia. This work is done through a collaboration with IMS Lab. (B. Le Gal).

Arithmetic Operators for Cryptography

Participants : Arnaud Tisserand, Thomas Chabrier, Danuta Pamula, Stanislaw Piestrak, Andrianina Andriamanga.

ECC Processor with Protections Against SCA

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in $\mathbb{F}_{2^m}$ and $\mathbb{F}_p$ finite fields and 160–600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied. The use of some number systems, especially very redundant ones, makes it possible to change the way some computations are performed and hence their effects on side channel traces. In [36] we propose the use of the double base number system (DBNS) to randomly recode secret key digits on-the-fly during the main ECC operation: the scalar multiplication [k]P . The proposed method, implemented on FPGAs, leads to a totally random behavior of the point operations at the side channel level, with a speed equivalent to the best standard unprotected methods.
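To make the double base number system concrete, the sketch below writes a positive scalar k as a sum of terms of the form 2^a 3^b using the textbook greedy strategy (repeatedly subtract the largest such term not exceeding the remainder). It is a deterministic, unsigned illustration with hypothetical function names. It is not the random on-the-fly recoding proposed in [36], and it performs no curve arithmetic.

```python
def largest_2_3_term(n):
    """Return (value, a, b) with value = 2**a * 3**b the largest such term <= n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    best = (1, 0, 0)
    pow3, b = 1, 0
    while pow3 <= n:
        # Largest power of two that still fits once 3**b is fixed.
        a = (n // pow3).bit_length() - 1
        value = pow3 << a
        if value > best[0]:
            best = (value, a, b)
        pow3 *= 3
        b += 1
    return best


def dbns_greedy(k):
    """Greedy unsigned DBNS expansion: list of (a, b) with k = sum(2**a * 3**b)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    terms = []
    while k > 0:
        value, a, b = largest_2_3_term(k)
        terms.append((a, b))
        k -= value
    return terms
```

The redundancy of the representation is what a recoding scheme can exploit: the same scalar admits many different expansions, and each one leads to a different sequence of doublings, triplings and additions.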

A long talk on Arithmetic Level Countermeasures for ECC Coprocessor [76] was presented at the Claude Shannon Institute Workshop on Coding and Cryptography in Cork, Ireland, May 2010.

Arithmetic Operators for High-Performance Cryptography

We worked on fast algorithms and implementations of $\mathbb{F}_{2^m}$ finite field multiplication units in FPGA. We focused on methods based on separated multiplication and reduction steps and analyzed various area and time dependency/efficiency/complexity tradeoffs. The corresponding results have been presented in [55] . A journal version of this work has been accepted for future publication in a national Polish journal "Measurement Automation and Monitoring".
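The idea of separating multiplication from reduction can be shown with a small software model, given here as a sketch rather than as the hardware algorithms of [55]. Field elements are encoded as integers whose bits are polynomial coefficients over GF(2). The first step is a carry-less (polynomial) product, and the second step reduces that product modulo an irreducible polynomial. The function names are illustrative.

```python
def carryless_multiply(a, b):
    """Polynomial product over GF(2): shift-and-XOR, no reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_polynomial(value, modulus):
    """Reduce `value` modulo `modulus`, both encoded as GF(2) polynomials."""
    degree = modulus.bit_length() - 1
    while value.bit_length() > degree:
        # Align the modulus with the leading term of value and cancel it.
        shift = value.bit_length() - 1 - degree
        value ^= modulus << shift
    return value


def gf2m_multiply(a, b, modulus):
    """Field multiplication as two separated steps: multiply, then reduce."""
    return reduce_polynomial(carryless_multiply(a, b), modulus)
```

With the degree-8 polynomial x^8 + x^4 + x^3 + x + 1 (encoded as `0x11B`), `gf2m_multiply(0x57, 0x83, 0x11B)` returns `0xC1`. Keeping the two steps apart lets each one be optimized or pipelined independently, at the cost of an intermediate product of roughly twice the operand width.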

Mark Hamilton, PhD student in the Code and Crypto group from the University College Cork (UCC), spent five months at CAIRN-Lannion to work on fast algorithms and implementations of $\mathbb{F}_p$ finite field multipliers for some specific values of p . A common publication is under preparation.

ECC Protections Against Fault Injection Attacks

During the Master internship of Andrianina Andriamanga, we worked on the use of residue code ( mod 2p-1 for some values of p ) detection methods to protect ECC operations against some fault injection attacks. The corresponding results will be submitted to a conference in the beginning of 2011.

Hardware Implementation of Code-Based Cryptography

A new collaboration with the CASED (Center for Advanced Security Research Darmstadt) laboratory in Germany is starting on efficient hardware implementations of new cryptographic methods based on code theory. This type of cryptographic method is robust against mathematical attacks using quantum computers.

Optimization of Advanced Arithmetic Operators

Participant : Arnaud Tisserand.

A software library in SystemC was developed for the optimization and validation of fixed-point hardware arithmetic operators. The corresponding results have been presented in [72] . We use an interface to the gappa software developed by G. Melquiond to tightly bound rounding errors and verify that those bounds are below some given threshold. A SystemC description of arithmetic operations is analyzed by gappa to certify the operator accuracy. We also provide various optimization methods to reduce the size of fixed-point operators under maximal-error constraints. This avoids overestimating maximal rounding errors, as standard methods do. The library can be used to perform architecture exploration with certified accuracy.

In the collaboration with the VLSI CAD laboratory from the University of Massachusetts (UMASS), started in 2009, we continue the integration of arithmetic methods for bounding rounding errors and optimizing some basic arithmetic operators in the TDS system developed at UMASS. A common publication is under preparation.

Management of Dynamically Reconfigurable Systems

Participants : Antoine Eiche, Daniel Chillet, Sébastien Pillement, Ludovic Devaux, Olivier Sentieys.

To support the dynamic behavior of new embedded applications, heterogeneous execution resources are often included in modern SoC or MPSoC (Multi-Processor System-on-Chip) systems. The management of these resources is classically supported by an operating system (OS) that includes several specific services. One newly required service concerns task scheduling and placement within the reconfigurable resources. The classical temporal scheduling problem is extended with a spatial dimension in order to manage the physical area available in the reconfigurable resource. The second impacted service is task communication management. On-line task placement makes the interconnection support difficult to predict, so a flexible and dynamic interconnect medium must be defined.

Spatio-Temporal Scheduling based on Artificial Neural Networks

Participants : Antoine Eiche, Daniel Chillet, Sébastien Pillement, Olivier Sentieys.

When the dynamic and partial reconfiguration paradigm is included in a System-on-Chip platform, specific management services must be developed to support the parallel and/or sequential instantiation of multiple hardware tasks within the same piece of silicon. One of the main problems consists in defining the placement and the scheduling of the different tasks within the reconfigurable part of the system. This problem is generally called spatio-temporal scheduling.

Building on our experience with neural networks for temporal task scheduling, we now address the problem of task placement within a reconfigurable resource. This work considers a heterogeneous reconfigurable area where several instances of a task are defined. Our solution is based on a neural network structure specifically designed to optimize task placement. The main optimization objective is to reduce task rejection. Our placement policy has been compared to other proposals and provides better results under identical assumptions [45] . We have also continued our work on the hardware implementation of our neural network. The temporal scheduling is now completely defined, and we plan to develop the hardware implementation of the spatial scheduling in future work.

Flexible Communication Infrastructure

Participants : Daniel Chillet, Sébastien Pillement, Ludovic Devaux.

For task communications within flexible architectures, we defined a specific interconnection architecture adapted to the dynamically and partially reconfigurable resources included in modern SoCs. The characterization of the DRAFT network was completed and its integration inside reconfigurable systems-on-chip was realized [35] . In the framework of the FosFor project, a bridge was designed to connect DRAFT to an AHB bus, enabling communications with off-the-shelf processors like Leon3. An interconnection service, which will be part of the FosFor operating system, was also specified [43] . This service manages the communications and offers new communication strategies made possible by dynamic reconfiguration, such as the creation of dynamic memory spaces by instantiating temporary memory tasks in unused logic areas. Considering the possibilities offered by dynamic reconfiguration, a new network was designed and characterized. R2NoC has the particularity of using reconfigurable routers containing only communication links [44] . However, limitations imposed by industrial products prevent an efficient use of this network because of very long reconfiguration times. That is why the Ocean network is currently being designed.

Fault-Tolerant Reconfigurable Systems

Participants : Stanislaw Piestrak, Sébastien Pillement, Manh Pham, Olivier Sentieys.

The use of reconfigurable hardware in critical applications like transportation and transaction systems is increasing rapidly. Undetected errors caused e.g. by radiation may result in fatal silent data corruption and unreproducible system crashes. Since it is virtually impossible to build devices which are free from faults, it is essential to embed some sort of fault-tolerance in such devices, which will enable them to work correctly even in the presence of faults. Over the past decade, a lot of research has been done to develop fault-tolerant reconfigurable systems at various granularity levels, although most of it has dealt with the lowest level, such as that offered by FPGAs. In [49] , we have considered the possibility of implementing low-cost hardware techniques which would make it possible to tolerate temporary faults in the data-paths of coarse-grained reconfigurable architectures. Our goal was to use less hardware overhead than the commonly used duplication or triplication methods. The proposed technique relies on concurrent error detection using a residue code modulo 3 and re-execution of the last operation once an error is detected. Simulations performed for the DART architecture developed at IRISA, with all of its data-paths protected using the residue code, confirmed the hardware savings of the proposed approach over duplication. Moreover, we have also studied different strategies for fault recovery after detection.
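The principle of residue checking modulo 3 followed by re-execution can be sketched for an addition as follows. The check compares the residue of the result with the residue predicted from the operands, and the operation is repeated when they differ. The function names, the retry limit and the fault-injection helper are hypothetical and only model a transient single-bit fault in software. They do not describe the DART data-path implementation.

```python
def checked_add(a, b, adder, max_retries=1):
    """Add with concurrent error detection by residue code modulo 3.

    `adder` stands for the (possibly faulty) data-path operation.
    Returns (result, number_of_re_executions).
    """
    expected_residue = ((a % 3) + (b % 3)) % 3
    for attempt in range(max_retries + 1):
        result = adder(a, b)
        if result % 3 == expected_residue:
            return result, attempt
    raise RuntimeError("residue check still failing after re-execution")


def make_transient_faulty_adder(bit):
    """Adder model whose first call flips one result bit (a temporary fault)."""
    state = {"calls": 0}

    def adder(a, b):
        state["calls"] += 1
        result = a + b
        if state["calls"] == 1:
            result ^= 1 << bit  # single-bit upset, gone on the next call
        return result

    return adder
```

A single flipped bit changes the result by plus or minus a power of two, which is never a multiple of 3, so this check detects every single-bit error in the result. Errors whose arithmetic value is a multiple of 3 would escape it.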

The pervasiveness of electronic computers has led the automotive industry to face new security and performance requirements when integrating new applications in the field. Modern reconfigurable logic circuits now meet the requirements of processing performance, flexibility and the industry trend toward lower product cost. We show in this work the importance of new dynamically reconfigurable architectures in the automotive field and more generally in the area of dependability. The use of dynamically reconfigurable computers can reduce the number of computers and the cost of implementation. Unfortunately, these architectures are very sensitive to radiation and therefore to errors. During this year we have enhanced our FT-DYMPSoC system [18] and fully implemented it on a commercial FPGA circuit. In order to cope with dynamic behaviors, we have proposed a NoC-based version of the system [64] . During this implementation, we encountered problems due to the lack of an effective method for estimating the impact of fault mitigation schemes on system performance. We have therefore defined an analytical model that eases the evaluation of the performance/reliability trade-off when including fault-tolerance techniques in the target systems [65] . A duplex system is preferable to a triple redundancy system in terms of required hardware overhead. On the other hand, a duplex system lacks the fault identification and localization, and hence correction, capabilities that are present in a triplication system. We have therefore proposed an improvement of existing duplication-based fault-tolerant schemes using dynamically reconfigurable architectures. For that purpose we have designed a low-overhead softcore processor system based on a lockstep scheme. The lockstep system contains two copies of the processor and detects errors in the dual processor by means of a mismatch indicator.
Our proposal enhances the lockstep scheme by adding a fault identification capability. A proposed configuration engine supervises the system in the background. The fault localization action determines which processor of the duplex pair is affected by the error. The correct output of the fault-free processor is then immediately switched to the final output. This prevents erroneous results from reaching the environment and thus avoids the propagation of potentially catastrophic results. The disruption of operation due to a fault is minimized, which is a major advantage for system safety. Moreover, the configuration engine is generic and can therefore be implemented in diverse types of systems.

Low-Power Architectures

Coding Techniques Improving Reliability and Power Consumption for On-Chip Buses

Participants : Olivier Sentieys, Sébastien Pillement, Stanislaw Piestrak.

Interconnects are now considered as one of the bottlenecks in the design of system-on-chip (SoC) since they introduce delay and power consumption. To deal with this issue, data coding for interconnect power and timing optimization has been introduced.

Several coding techniques have been suggested to reduce both noise and wire power consumption in on-chip interconnections, like bus-invert coding, low-weight coding, and reduction of the voltage swing of the signal on the wire. Unfortunately, the latter involves reduced noise margin which might result in increased error rate. Recently, Berger-invert code has been suggested to protect communication channels against all asymmetric errors and to decrease power consumption. We have not only shown some inaccuracies of the approach proposed [30] , but also suggested a modified encoding scheme and a new design of codec [31] . Implementation results have shown that our approach leads to significant hardware savings and results in reduced error rate and power consumption.
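Among the techniques listed above, bus-invert coding is simple enough to sketch in a few lines. Before each transfer, the encoder counts how many data wires would toggle. If more than half would, it sends the complemented word and raises one extra invert line. The code below is a minimal software model of this classical scheme with illustrative function names. It is not the Berger-invert code or the modified encoding scheme of [31].

```python
def bus_invert_encode(words, width):
    """Encode a word sequence; returns a list of (bus_value, invert_bit).

    The bus is assumed to start at all zeros (an assumption of this sketch).
    """
    mask = (1 << width) - 1
    previous = 0
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ previous).count("1")
        if 2 * toggles > width:
            bus_value, invert = word ^ mask, 1  # send the complement
        else:
            bus_value, invert = word, 0
        encoded.append((bus_value, invert))
        previous = bus_value
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [value ^ mask if invert else value for value, invert in encoded]


def count_transitions(values, initial=0):
    """Total number of wire toggles over a sequence of bus values."""
    total, previous = 0, initial
    for value in values:
        total += bin(value ^ previous).count("1")
        previous = value
    return total
```

With this rule, at most half of the data wires toggle on any transfer, plus possibly the invert line itself. The price is one additional wire and the encoder and decoder logic.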

Ultra Low-Power Architecture for Control-Oriented Applications in Wireless Sensor Nodes

Participants : Steven Derrien, Adeel Pasha, Olivier Sentieys.

This research work aims at developing ultra low-power SoCs for wireless sensor nodes, as an alternative to existing approaches based on low-power micro-controllers such as the Texas Instruments MSP430. The proposed approach reduces the power consumption by using a combination of hardware specialization and power gating techniques. In particular, we use the fact that typical WSN applications are generally modeled as a set of small to medium grain tasks that are implemented on a low-power microcontroller using lightweight thread-like OS constructs.

Rather than implementing these tasks in software, we propose to map each of them to its own specialized hardware structure, which we call a hardware micro-task. Such a hardware task consists of a minimalistic (and customized) data-path controlled by a finite state machine (FSM). By customizing each of these hardware implementations to its corresponding task, we expect to significantly reduce the dynamic power dissipated by the whole system. Besides, to circumvent the increase in static power caused by the possibly numerous hardware tasks implemented in the chip, we also propose to combine our approach with power gating, so as to supply power to a hardware task only when it needs to be executed. The results obtained are very promising and have led to a publication at the IEEE/ACM Design Automation Conference [61] . Our work was also described in the magazine article "Embedded systems power down", which cited our presentation at DAC and the fact that microtasking=low power.

The work done in 2010 mainly consisted in finalizing the system-level design-flow for the synthesis of ultra low-power WSN node controllers. In particular, we completed the design-flow for hardware micro-tasks from a higher-level description in ANSI-C. We have also developed a Domain Specific Language (DSL) that is used to specify the system-level model of a WSN node controller. This system-level model describes the micro-tasks, their interaction through the generated events, their hierarchies and priorities, and their shared resources. Using all this information, our design-flow is able to generate the VHDL description of a hardware System Monitor (SM) that controls the activation and deactivation of the hardware micro-tasks and of the shared resources. To summarize, the whole design-flow comprises two parts: (i) a C to VHDL flow for hardware micro-task synthesis, and (ii) a DSL to VHDL flow for hardware system monitor synthesis.

SoC Modeling and Prototyping on FPGA-based Systems

Participants : François Charot, Kevin Martin, Laurent Perraudeau, Charles Wagner.

Cairn participates in the SoCLib ANR project (see Section 7.8 for more information) whose goal is to build an open platform for modeling and simulation of multiprocessor systems-on-chip (MP-SoC). As part of our participation in this project, we have developed simulation models of the Altera NIOSII processor and of the Altera interconnect (Avalon bus). These models and their associated wrappers now allow NIOSII-based multiprocessor systems to be modeled. (The NiosII processor core is a configurable processor core offered by Altera. It is available in three families: economic, standard and fast. A SoCLib model of the fast version was developed in 2008.)

A multithreaded version of an H264 video decoder has been deployed on a NIOSII-based multiprocessor SoCLib platform thanks to the use of the MutekH operating system developed at the LIP6 laboratory. MutekH is a set of libraries built on top of the Hexo exo-kernel, which defines the Hardware Abstraction Layer, providing both portability and support for heterogeneity. In the framework of this SoCLib project, we have ported Hexo to NIOSII processor-based MPSoC architectures modeled with SoCLib. This NIOSII processor port is integrated into the MutekH distribution.

