Activity Report 2014

Project-Team CAIRN

Energy Efficient Computing Architectures

IN COLLABORATION WITH: Institut de recherche en informatique et systèmes aléatoires (IRISA)
# Table of contents

1. Members
2. Overall Objectives
3. Research Program
   3.1. Panorama
   3.2. Reconfigurable Architecture Design
   3.3. Compilation and Synthesis for Reconfigurable Platforms
   3.4. Interaction between Algorithms and Architectures
4. Application Domains
   4.1. Panorama
   4.2. 4G Wireless Communication Systems
   4.3. Wireless Sensor Networks
   4.4. Multimedia processing
5. New Software and Platforms
   5.1. Panorama
   5.2. Gecos
   5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems
   5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems
   5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions
   5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)
6. New Results
   6.1. Highlights of the Year
   6.2. Reconfigurable Architecture Design
      6.2.1. Dynamic reconfiguration support in FPGA
      6.2.2. Power Models of Reconfigurable Architectures
      6.2.3. Real-time Spatio-Temporal Task Scheduling on 3D Architecture
      6.2.4. Run-time Task Management to Increase Resource Utilisation for Concurrent Critical Tasks in Mixed-Critical Systems
      6.2.5. Arithmetic Operators for Cryptography and Fault-Tolerance
   6.3. Compilation and Synthesis for Reconfigurable Platform
      6.3.1. Numerical Accuracy Analysis and Optimization
      6.3.2. Reconfigurable Processor Extension Generation
      6.3.3. Optimization of Loop Kernels Using Software and Memory Information
      6.3.4. Design Tools for Reconfigurable Video Coding
      6.3.5. A Domain Specific Language for Rapid Prototyping of Software Radio Waveforms
   6.4. Interaction between Algorithms and Architectures
      6.4.1. Cooperative-cum-Constrained Maximum Likelihood Algorithm for UWB-based Localization in Wireless BANs
      6.4.2. MIMO Systems and Cooperative Strategies for Low-Energy Wireless Networks
      6.4.3. Adaptive protocols for Wireless Sensor Networks
      6.4.4. Energy Harvesting and Power Management
      6.4.5. Multimedia Processing
      6.4.6. Non-Intrusive Load Monitoring
7. Bilateral Contracts and Grants with Industry
8. Partnerships and Cooperations
   8.1. Regional Initiatives
   8.2. National Initiatives
      8.2.1. ANR Blanc - PAVOIS (2012-2016)
      8.2.2. ANR INFRA 2011 - FAON (2012-2015)
      8.2.3. Equipex FIT - Future Internet (of Things)
      8.2.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)
      8.2.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)
      8.2.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)
      8.2.7. Labex CominLabs - BoWI (2014-2018)
      8.2.9. Labex CominLabs - RELIASIC (2014-2018)
   8.3. European Initiatives
      8.3.1. FP7 FLEXTILES
      8.3.2. FP7 ALMA
   8.4. International Initiatives
      8.4.1. Inria Associate Teams
      8.4.2. Inria International Partners
         8.4.2.1. Declared Inria International Partners
         8.4.2.2. Informal International Partners
      8.4.3. Participation In other International Programs
   8.5. International Research Visitors
      8.5.1. Visits of International Scientists
      8.5.2. Visits to International Teams
9. Dissemination
   9.1. Promoting Scientific Activities
      9.1.1. Scientific Events Selection
         9.1.1.1. Responsible of Conference Program Committee
         9.1.1.2. Member of Conference Program Committee
      9.1.2. Journal
      9.1.3. Other Scientific Responsibilities
   9.2. Seminars and Invitations
   9.3. Teaching - Supervision - Juries
      9.3.1. Teaching
      9.3.2. Teaching Responsibilities
      9.3.3. Supervision
10. Bibliography


Keywords: Hardware Accelerators, Compiling, Embedded Systems, Energy Consumption, Parallelism, Wireless Sensor Networks, Security, Signal Processing, Reconfigurable Hardware, Computer Arithmetic, System-On-Chip

CAIRN is a joint project with CNRS, University of Rennes 1, and ENS Cachan-Antenne de Bretagne, and is located on two sites: Rennes and Lannion. The team was created on January 1st, 2008.

Creation of the Project-Team: 2009 January 01.

1. Members

Research Scientists
François Charot [Researcher (CR) Inria, Rennes]
Olivier Sentieys [Team Leader, Senior Researcher (DR) Inria, Lannion, HdR]
Arnaud Tisserand [Researcher (CR) CNRS, Lannion, HdR]

Faculty Members
Olivier Berder [Professor, Univ. Rennes I, IUT, Lannion, HdR]
Emmanuel Casseau [Professor, Univ. Rennes I, ENSSAT, Lannion, HdR]
Daniel Chillet [Associate Professor, Univ. Rennes I, ENSSAT, Lannion, HdR]
Antoine Courtay [Associate Professor, Univ. Rennes I, ENSSAT, Lannion]
Steven Derrien [Professor, Univ. Rennes I, ISTIC, Rennes, HdR]
Matthieu Gautier [Associate Professor, Univ. Rennes I, IUT, Lannion]
Cédric Killian [Associate Professor, Univ. Rennes I, IUT, Lannion]
Angeliki Kritikakou [Associate Professor, Univ. Rennes I, ISTIC, Rennes, from Sep 2014]
Patrice Quinton [Professor, Director of ENS Rennes, Rennes, HdR]
Romuald Rocher [Associate Professor, Univ. Rennes I, IUT, Lannion]
Pascal Scalart [Professor, Univ. Rennes I, ENSSAT, Lannion, HdR]
Baptiste Vrigneau [Associate Professor, Univ. Rennes I, IUT, Lannion]
Christophe Wolinski [Professor, Univ. Rennes I, Director of ESIR, Rennes, HdR]

Engineers
Philippe Quémerais [Research Engineer (half time), Univ. Rennes I, ENSSAT, Lannion]
Arnaud Carer [Research Engineer, Univ. Rennes I, Lannion]
Raphaël Bardoux [Inria, Embrace Project, Lannion]
Nicolas Simon [Univ. Rennes I, Defis Project, Lannion]
Antoine Morvan [Univ. Rennes I, until Nov 2014]
Robin Bonamy [Univ. Rennes I, until May 2014]
Thomas Chabrier [CNRS, until Apr 2014]
Quang Hoa Le [Univ. Rennes I, until Dec 2014]

PhD Students
Franck Bucheron [DGA, Rennes]
Quang-Hai Khuat [University grant, Brittany Region/CG22, Lannion]
Ali Hassan El-Moussawi [University grant, FP7 Alma, Rennes]
Christophe Huriaux [MENRT grant, Lannion]
Jérémie Métairie [CNRS grant, ANR Pavois, Lannion]
Viet-Hoa Nguyen [University grant, BoWI project, Lannion]
Zhongwei Zheng [University grant, BoWI project, Lannion]
Gaël Deest [MENRT grant, Rennes, from Oct 2013]
Van-Thiep Nguyen [USTH grant, Lannion, from Oct 2013]
2. Overall Objectives

2.1. Overall Objectives

**Abstract:** The CAIRN project-team researches new architectures, algorithms and design methods for flexible and energy-efficient domain-specific systems-on-chip (SoCs). As the performance and energy-efficiency requirements of SoCs are continuously increasing, they become difficult to fulfill using programmable processor solutions alone. To address this issue, we advocate the use of reconfigurable hardware, i.e., hardware structures whose organization may change before or even during execution. Such reconfigurable SoCs offer high performance at a low energy cost, while preserving a high level of flexibility. The group studies these SoCs from three angles: (i) the invention and design of new reconfigurable platforms, with an emphasis on flexible arithmetic operator design, dynamic reconfiguration management, and low power consumption; (ii) the development of their corresponding design flows (compilation and synthesis tools) to enable their automatic design from high-level specifications; (iii) the interaction between algorithms and architectures, especially for our main application domains (wireless communications, wireless sensor networks and digital security).

The scientific goal of the CAIRN group is to research new hardware architectures of Reconfigurable Systems-on-Chip (RSoCs) along with their associated design flows. RSoCs integrate reconfigurable blocks whose hardware structure may be adjusted before or even during program execution. They originate from the possibilities opened up by Field Programmable Gate Array (FPGA) technology and by reconfigurable processors [83], [93]. Recent evolutions in technology and modern hardware systems confirm that reconfigurable systems are increasingly used in recent applications or embedded into more general systems-on-chip (SoCs) [98]. This architectural model has received a lot of attention in academia over the last decade [88], and is now considered for industrial use. One reason is the rapidly changing standards in communications and information security, which require frequent device modifications. In many cases, software updates are not sufficient to keep devices on the market, while hardware redesigns remain too expensive. The need to continuously adapt the system to changing environments (e.g., cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Finally, with technologies at 65 nm and below, manufacturing variability strongly influences the electrical parameters of transistors, and transient errors caused by particles or radiation will appear more and more often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities.

As chip density increases [110], power efficiency has become “the Grail” of all chip architects, whether they design circuits for portable devices or for high-performance general-purpose processors. Indeed, power (or energy) constraints are now as important as performance constraints. Moreover, this power issue can often only be addressed through the use of a complete application-specific architecture, or by incorporating some application-specific components within a programmable SoC. Designers hence face a very difficult choice between the flexibility and short design time of programmable architectures and the power efficiency of specialized architectures. In this context, reconfigurable architectures are acknowledged to provide the best trade-off between power, performance, cost and flexibility. This efficiency stems from the fact that their hardware structure can be adapted to the application requirements [109], [93].

However, designing reconfigurable systems poses several challenges: first, the definition of the architecture itself along with its dynamic reconfiguration capabilities, and then, its corresponding compilation/synthesis tools. The scientific goal of CAIRN is therefore to leverage the background and past experience of its members to tackle these challenges. We propose to approach energy-efficient reconfigurable architectures from three angles: (i) the invention of new reconfigurable platforms, (ii) the development of their corresponding design and compilation tools, and (iii) the exploration of the interaction between algorithms and architectures.

**Wireless Communication** is our privileged application domain and is built on our experience in 3G. Our research includes the prototyping of (subsets of) such applications on reconfigurable and programmable platforms. For this application domain, the high computational complexity of the Next-Generation (4G) Wireless Communication Systems calls for the design of highly specialized high-performance architectures. In **Wireless Sensor Networks** (WSN), where each wireless node is expected to operate without battery replacement for significant periods of time, energy consumption is the most important constraint. In this context, our research focuses on energy-efficient architectures and wireless cooperative techniques for WSN and wireless transmission in Intelligent Transportation Systems (ITS). Other important fields such as automotive, digital security and multimedia processing are also considered.

Members of the CAIRN team have or have had collaborations with large companies such as STMicroelectronics (Grenoble), Technicolor (Paris), Alcatel (Lannion), France-Telecom Orange Labs (Lannion), Atmel (Nantes) and Xilinx (USA); with SMEs such as Geensys (Nantes), R-interface (Marseille), TeamCast/Ditocom (Rennes), Sensaris (Grenoble), Enivio (Rennes), InPixal (Rennes), Sestream (Paris) and Ekinops (Lannion); and with institutes such as DGA (Rennes) and CEA (Saclay, Grenoble). They are involved in several nationally or internationally funded projects (FP7 Alma, FP7 Flextiles, ANR funded Pavois, Ardyt, Defis, Faon, Compa, Ocelot, Cominlabs funded BoWI, 3DCore, HAH, Reliasic, and “Images&Networks Competitiveness Cluster” funded Embrace).

3. Research Program

3.1. Panorama

The development of complex applications is traditionally split into three stages: a theoretical study of the algorithms, an analysis of the target architecture, and the implementation. When facing new emerging applications such as high-performance, low-power and low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a joint study of both algorithmic and architectural issues¹.

[Figure 1: CAIRN's general design flow and related research themes]

Figure 1 shows the global design flow we propose to develop. This flow is organized in levels which refer to our three research themes: application optimization (new algorithms, fixed-point arithmetic and advanced representations of numbers), architecture optimization (reconfigurable and specialized hardware, application-specific processors), and stepwise refinement and code generation (code transformations, hardware synthesis, compilation).

In the rest of this part, we briefly describe the challenges concerning new reconfigurable platforms in Section 3.2, the issues on compiler and synthesis tools related to these platforms in Section 3.3, and the remaining challenges in algorithm-architecture interaction in Section 3.4.

1 Often referred to as algorithm-architecture mapping or interaction.

3.2. Reconfigurable Architecture Design

Over the last two decades, there has been a strong push from the research community to evolve static programmable processors into run-time dynamic and partial reconfigurable (DPR) architectures. Several research groups around the world have hence proposed reconfigurable hardware systems operating at various levels of granularity. For example, functional-level reconfiguration has been proposed to increase the efficiency of programmable processors without incurring the penalties of FPGAs. These coarse-grained reconfigurable architectures (CGRAs) provide operator-level configurable functional blocks and word-level datapaths. The main goal of this class of architectures is to provide flexibility while minimizing reconfiguration overhead (several recent surveys exist on this topic [113], [97], [78], [114]). Compared to fine-grained architectures, CGRAs benefit from a massive reduction in configuration memory and configuration delay, as well as a considerable reduction in routing and placement complexity. This, in turn, improves the ratio of computation volume to energy cost, even if it comes at the price of a loss of flexibility compared to bit-level operations. Such constraints have been taken into account in the design of DART [93], [12], CRIP [81], Adres [105] and others [116]. These works have led to commercial products such as the Extreme Processor Platform (XPP) [82] from PACT or Montium² from Recore Systems.

Another strong trend is the design of hybrid architectures that combine standard GPP or DSP cores with arrays of configurable elements, such as the Lx [96], or with field-configurable elements, such as the Xirisc processor [103] and, more recently, commercial platforms such as the Xilinx Zynq-7000. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding multimedia or communications applications), and shorter time-to-market (products that target ASIC platforms can be released earlier using reconfigurable hardware).

Dynamic reconfiguration enables an architecture to adapt itself to various incoming tasks. This requires complex resource management and control, which can be provided as services by a real-time operating system (RTOS) [104]: communication, memory management, task scheduling [92], [85], [1] and task placement. Such an Operating System (OS) based approach has many advantages: it provides a complete design framework that is independent of the technology and of the underlying hardware architecture, helping to drastically reduce the full platform design time. Because task execution is unpredictable, the OS must be able to allocate resources to tasks at run-time and provide mechanisms to support inter-task communication. An efficient way to support such communications is to resort to a network-on-chip [111]. The role of the communication infrastructure is then to support transactions between different components of the platform, either between macro-components (main processor, dedicated modules, dynamically reconfigurable components) or within the elements of the reconfigurable components themselves.

In CAIRN we mainly target reconfigurable system-on-chip (RSoC) defined as a set of computing and storing resources organized around a flexible interconnection network and integrated within a single silicon chip (or programmable chip such as FPGAs). The architecture is customized for an application domain, and the flexibility is provided by both hardware reconfiguration and software programmability. Computing resources are therefore highly heterogeneous and raise many issues that we discuss in the following:

- **Reconfigurable hardware blocks with a dynamic behavior** where reconfigurability can be achieved at the bit- or operator-level. Our research aims at defining new reconfigurable architectures including computing and memory resources. Since reconfiguration must happen as fast as possible (typically within a few cycles), reducing the configuration time overhead is also a key issue.

- **When performance and power consumption are major constraints**, it is acknowledged that optimized specialized hardware blocks (often called IPs for Intellectual Properties) are the best (and often the only) solution. Therefore, we also study architecture and tools for specialized hardware accelerators and for multi-mode components.

- **Customized processors with a specialized instruction set** also offer a viable way to trade off energy efficiency against flexibility. They are particularly relevant for modern FPGA platforms where many processor cores can be embedded. For this topic, we focus on the automatic generation of heterogeneous (sequential or parallel) reconfigurable processor extensions that are tightly coupled to processor cores.

---

2 http://www.recoresystems.com/

3.3. Compilation and Synthesis for Reconfigurable Platforms

In spite of their advantages, reconfigurable architectures lack efficient and standardized compilation and design tools. As of today, this still makes the technology impractical for large-scale industrial use. Generating and optimizing the mapping from high-level specifications to reconfigurable hardware platforms is therefore a key research issue, and the problem has received considerable interest over the last years [108], [84], [115], [118]. In the meantime, the complexity (and heterogeneity) of these platforms has also been increasing quite significantly, with complex heterogeneous multi-core architectures becoming a de facto standard. As a consequence, the focus of designers is now geared toward optimizing overall system-level performance and efficiency [99], [108], [107]. Here again, existing tools are not well suited, as they fail to provide a unified programming view of the programmable and/or reconfigurable components implemented on the platform.

In this context we have been pursuing our efforts to propose tools whose design principles are based on a tight coupling between the compiler and the target hardware architectures. We build on the expertise of the team members in High Level Synthesis (HLS) [8], ASIP optimizing compilers [15] and automatic parallelization for massively parallel specialized circuits [6]. We first study how to increase the efficiency of standard programmable processors by extending their instruction set to speed up computationally intensive kernels. Our focus is on efficient and exact algorithms for the identification, selection and scheduling of such instructions [9]. We also propose techniques to synthesize reconfigurable (or multi-mode) architectures. We address these challenges by borrowing techniques from high-level synthesis, optimizing compilers and automatic parallelization, especially when dealing with nested loop kernels. The goal is then to derive a custom fine-grain parallel architecture and/or the configuration of a Coarse Grain Reconfigurable Architecture (CGRA).

In addition, and independently of the scientific challenges mentioned above, proposing such flows also poses significant software engineering issues. As a consequence, we also study how leading-edge object-oriented software engineering techniques (Model Driven Engineering) can help the Computer Aided Design (CAD) and optimizing compiler communities prototype new research ideas.

Efficient implementation of multimedia and signal processing applications (in software for DSP cores or as special-purpose hardware) often requires, for reasons related to cost, power consumption or silicon area constraints, the use of fixed-point arithmetic, whereas the algorithms are usually specified in floating-point arithmetic. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding up to 50% of the total design or implementation time [86]. Thus, tools are required to automate this conversion. For hardware or software implementation, the aim is to optimize the fixed-point specification: the implementation cost is minimized under a numerical accuracy or an application performance constraint. For DSP-software implementation, methodologies have been proposed [101], [106] to achieve a conversion leading to ANSI-C code with integer data types. For hardware implementation, the best results are obtained when the word-length optimization process is coupled with high-level synthesis [100], [89].

Evaluating the effects of finite precision is one of the major, and often the most time-consuming, steps of fixed-point refinement. Indeed, in the word-length optimization process, the numerical accuracy is evaluated each time a new word-length is tested, and thus several times per iteration of the optimization process. Classical approaches are based on fixed-point simulations [90], [112]. They lead to long evaluation times and cannot be used to explore the entire design space. Therefore, our aim is to propose closed-form expressions of the errors due to fixed-point approximations, which are used by a fast analytical framework for accuracy evaluation.
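
The contrast between simulation-based and analytical accuracy evaluation can be illustrated with a minimal sketch. It is not the team's framework: it assumes a single rounding quantizer with no saturation and the textbook uniform-error model, under which the noise power of a quantizer with step q is q²/12. The function names (`quantize`, `simulated_noise_power`, `analytical_noise_power`, `min_frac_bits`) and the test signal are hypothetical.

```python
import random

def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (no saturation modeled)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step

def simulated_noise_power(samples, frac_bits):
    """Mean squared quantization error measured by a fixed-point simulation."""
    errors = [quantize(x, frac_bits) - x for x in samples]
    return sum(e * e for e in errors) / len(errors)

def analytical_noise_power(frac_bits):
    """Closed-form estimate q**2 / 12 (assumed uniform-error model, rounding)."""
    q = 2.0 ** -frac_bits
    return q * q / 12.0

def min_frac_bits(noise_budget, max_bits=64):
    """Smallest number of fractional bits whose analytical noise meets the budget."""
    for frac_bits in range(max_bits + 1):
        if analytical_noise_power(frac_bits) <= noise_budget:
            return frac_bits
    raise ValueError("noise budget not reachable within max_bits")

# Hypothetical test signal: uniform samples in [-1, 1).
random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

for frac_bits in (4, 8, 12):
    sim = simulated_noise_power(samples, frac_bits)
    ana = analytical_noise_power(frac_bits)
    print(f"{frac_bits:2d} fractional bits: simulated={sim:.3e} analytical={ana:.3e}")

print("fractional bits needed for a 1e-6 noise budget:", min_frac_bits(1e-6))
```

Each call to `simulated_noise_power` processes every sample again, whereas the analytical estimate is a constant-time expression, which is why a word-length search such as `min_frac_bits` can afford to evaluate it at every candidate. Real systems need per-operator noise propagation, which this sketch deliberately leaves out.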

3.4. Interaction between Algorithms and Architectures

As CAIRN mainly targets domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have great potential to improve the efficiency of the overall system. Based on the skills and experience of some CAIRN members in “signal processing and communications”, we conduct research on algorithmic optimization techniques under two main constraints, energy consumption and computation accuracy, and for two main application domains, fourth-generation (4G) mobile communications and wireless sensor networks (WSN). These application domains are very conducive to our research activities. The high complexity of the first and the stringent power constraint of the second require the design of specific high-performance and energy-efficient SoCs. We also consider other applications such as video or bioinformatics, but this short state of the art is limited to wireless applications.

The radio, in both transmit and receive modes, accounts for the bulk of the total power consumption of the system. Protocol optimization is therefore one of the main sources of energy reduction on the way to self-powered autonomous systems. The power due to radio communications can be reduced by pursuing two complementary objectives: (i) minimizing the output transmit power while maintaining sufficient wireless link quality, and (ii) minimizing useless wake-ups and channel listening while remaining reactive.

As the physical layer affects all higher layers in the protocol stack, it plays an important role in the energy-constrained design of WSNs. The question to answer can be summarized as: how much signal processing can be added to decrease the transmission energy (i.e., the output power level at the antenna) such that the global energy consumption is decreased? The temporal and spatial diversity of relay and multiple-antenna techniques are very attractive due to their simplicity and their performance for wireless transmission over fading channels. Cooperative MIMO (multiple-input and multiple-output) techniques were first studied in [94], [102] and have shown their efficiency in terms of energy consumption [91]. Our research aims at finding new energy-efficient cooperative protocols that associate distributed MIMO with opportunistic and/or multiple relays and that take into account wireless channel impairments such as transmitter desynchronisation.

Another way to reduce the energy consumption consists in decreasing the radio activity, which is controlled by the medium access control (MAC) layer protocols. In this regard, low duty-cycle protocols, such as preamble-sampling MAC protocols, are very efficient because they extend the lifetime of the network by reducing energy waste [80]. As the network parameters (data rate, topology, etc.) can vary, we propose new adaptive MAC protocols to avoid overhearing and idle listening.
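
Why adaptation to the data rate matters can be seen on a deliberately simplified energy model of a preamble-sampling link. This is a sketch under stated assumptions, not a model taken from the report: the receiver wakes up once per wake-up interval and listens for a short fixed time, the sender precedes each packet with a preamble as long as the wake-up interval, and payload, acknowledgement and overhearing costs are ignored. The power figures and function names are hypothetical.

```python
import math

# Hypothetical radio parameters (assumptions for illustration only).
P_RX = 0.060         # receive/listen power in watts
P_TX = 0.050         # transmit power in watts
LISTEN_TIME = 0.002  # channel sampling duration per wake-up, in seconds

def average_radio_power(wakeup_interval, packet_rate):
    """Average radio power (W) of the simplified sender/receiver pair."""
    idle_listening = P_RX * LISTEN_TIME / wakeup_interval  # periodic channel sampling
    preamble = packet_rate * wakeup_interval * P_TX        # one long preamble per packet
    return idle_listening + preamble

def best_wakeup_interval(packet_rate):
    """Interval minimizing the model above (set the derivative to zero)."""
    return math.sqrt(P_RX * LISTEN_TIME / (packet_rate * P_TX))

for packet_rate in (0.01, 0.1, 1.0):  # packets per second
    interval = best_wakeup_interval(packet_rate)
    power = average_radio_power(interval, packet_rate)
    print(f"rate={packet_rate:5.2f} pkt/s  interval={interval:.3f} s  power={power * 1e3:.3f} mW")
```

In this model the energy-optimal wake-up interval shrinks as traffic grows, so a fixed interval is only optimal for one data rate; an adaptive protocol would track the traffic instead.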

Finally, MIMO precoding is now recognized as an effective technique to enhance the data rate in wireless systems, and is already used in the WiMAX standard (802.16e). This technique can also be used to reduce transmission energy for the same transmission reliability and the same throughput requirement. One of the most efficient precoders is based on the maximization of the minimum Euclidean distance ($\text{max}-d_{\text{min}}$) between two received data vectors [87], but it is difficult to define the closed form of the optimized precoding matrix for large MIMO systems with high-order modulations. Our goal is to derive new generic precoders with simple expressions depending only on the channel angle and the modulation order.
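
The $d_{\text{min}}$ criterion itself is easy to state in code. The sketch below is not the precoder of [87]: it assumes a noise-free 2x2 diagonal channel parameterized by an angle `gamma`, BPSK symbols, and a restricted family of real diagonal precoders with unit total power, searched by brute force over a power-allocation angle `psi`. All names are hypothetical.

```python
import itertools
import math

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(m, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def min_distance(channel, precoder, constellation):
    """Minimum Euclidean distance between noise-free received vectors."""
    n_streams = len(precoder[0])
    effective = mat_mul(channel, precoder)
    points = [mat_vec(effective, symbols)
              for symbols in itertools.product(constellation, repeat=n_streams)]
    return min(math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(p, q)))
               for p, q in itertools.combinations(points, 2))

def diagonal(a, b):
    return [[a, 0.0], [0.0, b]]

BPSK = (1.0, -1.0)
gamma = math.pi / 6  # assumed channel angle: subchannel gains cos(gamma), sin(gamma)
channel = diagonal(math.cos(gamma), math.sin(gamma))

def dmin_for(psi):
    """d_min for the diagonal precoder diag(cos(psi), sin(psi)), total power 1."""
    return min_distance(channel, diagonal(math.cos(psi), math.sin(psi)), BPSK)

# Brute-force search over the power-allocation angle, in one-degree steps.
candidates = [math.radians(deg) for deg in range(1, 90)]
best_psi = max(candidates, key=dmin_for)
print("best allocation angle (degrees):", round(math.degrees(best_psi)))
print("d_min, best allocation :", dmin_for(best_psi))
print("d_min, equal allocation:", dmin_for(math.pi / 4))
```

Within this restricted family the search gives more power to the weaker subchannel, and the resulting distance depends only on the channel angle. The number of received points grows exponentially with the number of streams and the modulation order, which is one reason exhaustive evaluation does not scale and simple closed-form expressions are attractive.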

4. Application Domains

4.1. Panorama

keywords: telecommunications, wireless communications, wireless sensor networks, content-based image retrieval, video coding, intelligent transportation systems, automotive, security

Our research is based on realistic applications, in order to both discover the main needs created by these applications and to invent realistic and interesting solutions.

The high complexity of the Next-Generation (4G) Wireless Communication Systems leads to the design of real-time, high-performance specific architectures. The study of these techniques is one of the main fields of application for our research, based on our experience with WCDMA for 3G implementation.

In Wireless Sensor Networks (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN.

**Intelligent Transportation Systems** (ITS), and especially Automotive Systems, increasingly take advantage of advances in information technology. While wireless transmissions allow a car to communicate with another one or even with road infrastructure, the **automotive industry** can also offer driver assistance and more secure vehicles thanks to improvements in computation accuracy for embedded systems.

Other important fields will also be considered: hardware cryptographic and security modules, specialized hardware systems for the filtering of the network traffic at high-speed, high-speed true-random number generation for security, content-based image retrieval and video processing.

4.2. 4G Wireless Communication Systems

With the advent of the next generation (4G) broadband wireless communications, the combination of MIMO (Multiple-Input Multiple-Output) wireless technology with Multi-Carrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rates and high performance. Moreover, future mobile devices will have to offer interoperability between wireless communication standards (4G, WiMAX, etc.) and therefore implement MIMO pre-coding, which is already used by the WiMAX standard. Finally, in order to maximize mobile device lifetime and guarantee quality of service to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications, with the aim of algorithmic optimization and implementation prototyping.

4.3. Wireless Sensor Networks

Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications that are linked to a specific environment, and on the other hand, to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor network applications. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications.

4.4. Multimedia processing

In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive and must meet power requirements. Video or image processing at pixel level (image filtering, edge detection, pixel correlation) or at block level (transforms, quantization, entropy coding, motion estimation) has to be accelerated. We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications.

5. New Software and Platforms

5.1. Panorama

With the ever-rising complexity of embedded applications and platforms, the need for efficient and customizable compilation flows is stronger than ever. This need for flexibility is even stronger when it comes to research compiler infrastructures, which are necessary to gather quantitative evidence of the performance, energy or cost benefits obtained through the use of reconfigurable platforms. From a compiler point of view, the challenges posed by these complex reconfigurable platforms are quite significant, since they require the compiler to extract and expose a large amount of coarse- and/or fine-grain parallelism and to take complex resource constraints into consideration, while providing efficient memory hierarchy and power management.

Because they are geared toward industrial use, production compiler infrastructures do not offer the level of flexibility and productivity that is required for compiler and CAD tool prototyping. To address this issue, we have designed an extensible source-to-source compiler infrastructure that takes advantage of leading-edge model-driven object-oriented software engineering principles and technologies.

Figure 2 shows the global framework that is being developed in the group. Our compiler flow mixes several types of intermediate representations. The baseline representation is a simple tree-based model enriched with control flow information. This model is mainly used to support our source-to-source flow, and serves as the backbone for the infrastructure. We use the extensibility of the framework to provide more advanced representations along with their corresponding optimizations and code generation plug-ins. For example, for our pattern selection and accuracy estimation tools, we use a data dependence graph model in all basic blocks instead of the tree model. Similarly, to enable polyhedral-based program transformations and analysis, we introduced a specific representation for affine control loops that we use to derive a Polyhedral Reduced Dependence Graph (PRDG). Our current flow assumes that the application is specified as a system-level hierarchy of communicating tasks, where each task is expressed using C (or Scilab in the near future), and where the system-level representation and the target platform model are defined using Domain Specific Languages (DSL).

Gecos (Generic Compiler Suite) is the main backbone of CAIRN’s flow. It is an open-source, Eclipse-based, flexible compiler infrastructure developed for fast prototyping of complex compiler passes. Gecos is implemented entirely in Java and relies on modern software engineering practices such as Eclipse plug-ins and model-driven software engineering with EMF (Eclipse Modeling Framework). As of today, our flow offers the following features:

- An automatic floating-point to fixed-point conversion flow (for HLS and embedded processors). ID.Fix is an infrastructure for the automatic transformation of software code aiming at the conversion of floating-point data types into a fixed-point representation. http://idfix.gforge.inria.fr.
- A polyhedral-based loop transformation and parallelization engine (mostly targeted at HLS). http://gecos.gforge.inria.fr. It was used for source-to-source transformations in the context of Nano2012 projects in collaboration with STMicroelectronics.
- A custom instruction extraction flow (for ASIP and dynamically reconfigurable architectures). Durase and UPaK are developed for the compilation and the synthesis targeting reconfigurable platforms and the automatic synthesis of application specific processor extensions. They use advanced technologies, such as graph matching and graph merging together with constraint programming methods.

- Several back-ends to enable the generation of VHDL for specialized or reconfigurable IPs, and SystemC for simulation purposes (e.g., fixed-point simulations).

5.2. Gecos

Participants: Steven Derrien [corresponding author], Nicolas Simon, Antoine Morvan.

Keywords: source-to-source compiler, model-driven software engineering, retargetable compilation.

The Gecos (Generic Compiler Suite) project is a source-to-source compiler infrastructure developed in the Cairn group since 2004. It was designed to enable fast prototyping of program analysis and transformation for hardware synthesis and retargetable compilation domains.

Gecos is 100% Java based and takes advantage of modern model-driven software engineering practices. It uses the Eclipse Modeling Framework (EMF) as its underlying infrastructure and builds on EMF features to remain easily extensible. Gecos is open-source and is hosted on the Inria gforge at http://gecos.gforge.inria.fr.

The Gecos infrastructure is still under very active development and serves as a backbone infrastructure for the projects of the group. Part of the framework is jointly developed with Colorado State University, and since 2012 it has been used in the context of the ALMA European project. Recent developments in Gecos have focused on polyhedral loop transformations and efficient SIMD code generation for fixed-point arithmetic data types as part of the ALMA project. Significant effort also went into providing a coarse-grain parallelization engine targeting the data-flow actor model in the context of the COMPA ANR project.

5.3. ID.Fix: Infrastructure for the Design of Fixed-point Systems

Participants: Olivier Sentieys [corresponding author], Romuald Rocher, Nicolas Simon.

Keywords: fixed-point arithmetic, source-to-source code transformation, accuracy optimization, dynamic range evaluation

The different techniques proposed by the team for fixed-point conversion are implemented in the ID.Fix infrastructure. The application is described in C code using floating-point data types and pragmas that specify the parameters of the fixed-point conversion (dynamic range, input/output word-length, delay operations). The tool determines and optimizes the fixed-point specification and then generates C code using the fixed-point data types (ac_fixed) from Mentor Graphics. The infrastructure is made up of two main modules corresponding to the fixed-point conversion (ID.Fix-Conv) and the accuracy evaluation (ID.Fix-Eval).
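To make the notion of a fixed-point specification concrete, the following minimal sketch quantizes a floating-point value to a signed fixed-point code with a given number of integer and fractional bits. It is only an illustration: the function names `to_fixed` and `to_float`, the round-to-nearest mode and the saturation on overflow are assumptions of this sketch, not the behaviour of ID.Fix or of the ac_fixed data types.

```python
def to_fixed(x, int_bits, frac_bits):
    """Quantize x to a signed fixed-point integer code (hypothetical helper).

    int_bits counts the integer bits including the sign bit,
    frac_bits counts the fractional bits.
    """
    scale = 1 << frac_bits
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    # Round to nearest (Python rounds exact ties to the even code).
    code = round(x * scale)
    # Saturate instead of wrapping around on overflow (assumed policy).
    return max(lowest, min(highest, code))


def to_float(code, frac_bits):
    """Convert a fixed-point integer code back to a real value."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    code = to_fixed(0.3, int_bits=1, frac_bits=7)
    value = to_float(code, frac_bits=7)
    print(code, value, abs(value - 0.3))
```

With round-to-nearest and no overflow, the conversion error is bounded by half the quantization step, that is 2^-(frac_bits + 1). Choosing the word-lengths so that this error stays acceptable is the kind of trade-off the conversion flow has to optimize.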

The developments carried out in 2014 made ID.Fix fully compatible with Gecos. The structure of each node in the graph has been changed to simplify graph modifications. Octave can now be used instead of Matlab for the conversion of LTI and recursive systems. Work has started on replacing the Matlab/Octave tool with an algorithm written in C to reduce optimization time. In the context of the ANR DEFIS project, the ID.Fix tool has been reorganized to be integrated into the DEFIS toolflow.

In 2014, ID.Fix was demonstrated at the University Booth of IEEE/ACM DATE.

5.4. UPaK: Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems

Participants: Christophe Wolinski [corresponding author], François Charot.

Keywords: compilation for reconfigurable systems, pattern extraction, constraint-based programming.

We are developing UPaK (Abstract Unified Pattern-Based Synthesis Kernel for Hardware and Software Systems) [117] in strong collaboration with Lund University, Sweden, and Queensland University, Australia. The preliminary experimental results obtained with the UPaK system show that the methods it employs achieve a high coverage of application graphs with a small number of patterns. Moreover, high application execution speed-ups are obtained, for both sequential and parallel application execution, with processor extensions implementing the selected patterns. UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at Inria-Rennes in the project-team Espresso.

5.5. DURASE: Automatic Synthesis of Application-Specific Processor Extensions

Participants: Christophe Wolinski [corresponding author], François Charot.

Keywords: compilation for reconfigurable systems, instruction-set extension, pattern extraction, graph covering, constraint-based programming.

We are developing a framework enabling the automatic synthesis of application-specific processor extensions. It uses advanced technologies, such as algorithms for graph matching and graph merging together with constraint programming methods. The framework is organized around several modules.

- CoSaP: Constraint Satisfaction Problem. The goal of CoSaP is to decouple the statement of a constraint satisfaction problem from the solver used to solve it. The CoSaP model is an Eclipse plugin described using EMF to take advantage of the automatic code generation and of various EMF tools.
- HCDG: Hierarchical Conditional Dependency Graph. HCDG is an intermediate representation mixing control and data flow in a single acyclic representation. The control flow is represented as hierarchical guards specifying the execution or the definition conditions of nodes. It can be used in the Gecos compilation framework via a specific pass which translates a CDFG representation into an HCDG.
- Patterns: Flexible tools for the identification of computational patterns in a graph and for graph covering. These tools model the concept of a pattern in a graph and provide generic algorithms for pattern identification and graph covering. The following sub-problems are addressed: (sub)graph isomorphism, pattern generation under constraints, and covering of a graph using a library of patterns. Most of the implemented algorithms use constraint programming and rely on the CoSaP module to solve the optimization problem.

5.6. PowWow: Power Optimized Hardware and Software FrameWork for Wireless Motes (AP-L-10-01)

Participants: Olivier Sentieys [corresponding author], Olivier Berder, Arnaud Carer, Steven Derrien.

Keywords: Wireless Sensor Networks, Low Power, Preamble Sampling MAC Protocol, Hardware and Software Platform

PowWow is an open-source hardware and software platform designed to handle wireless sensor network (WSN) protocols and related applications. Based on an optimized preamble sampling medium access control (MAC) protocol, geographical routing and a protothread library, PowWow runs on a lighter hardware system than Zigbee [79] requires (memory usage including application is less than 10kb). Therefore, network lifetime is increased and price per node is significantly decreased.

CAIRN’s hardware platform (see Figure 3) is composed of:

- The motherboard, designed to reduce the power consumption of sensor nodes, embeds an MSP430 microcontroller and all the components needed to run the PowWow protocol except the radio chip. JTAG, RS232, and I2C interfaces are available on this board.
- The radio chip daughter board is currently based on a TI CC2420.
- The coprocessing daughter board includes a low-power FPGA which allows for hardware acceleration of some PowWow features and also includes dynamic voltage scaling features to increase power efficiency. The current version of PowWow integrates an Actel IGLOO AGL250 FPGA and a programmable DC-DC converter. We have shown that gains in energy of up to 700 can be obtained by using FPGA acceleration on functions like CRC-32 or error detection, compared with a software implementation on the MSP430.
- Finally, a last daughter board is dedicated to energy harvesting techniques. Based on the LTC3108 energy management component from Linear Technology, the board can be configured with several types of energy storage (batteries, micro-batteries, super-capacitors) and several types of energy sources (a small solar panel to recover photovoltaic energy, a piezoelectric sensor for mechanical energy and a Peltier thermal energy sensor).
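CRC-32 is mentioned above as one of the functions that benefit from FPGA acceleration. As a point of reference for what the software version has to compute, here is a minimal bit-serial sketch. It assumes the common reflected IEEE 802.3 polynomial (0xEDB88320), which is the variant implemented by `zlib.crc32`; the report does not state which CRC-32 variant PowWow uses, and the function name `crc32_bitwise` is hypothetical.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (reflected IEEE 802.3 polynomial, assumed variant)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        # Eight shift/conditional-XOR steps per input byte.
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc32_bitwise(message)))
    print(crc32_bitwise(message) == zlib.crc32(message))
```

The inner loop performs eight dependent shift and XOR steps per byte, which is costly on a small microcontroller but maps naturally onto a few gates and flip-flops in hardware.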

The PowWow distribution also includes a generic software architecture using event-driven programming and organized into protocol layers (PHY, MAC, LINK, NET and APP). The software is based on Contiki [95], and more precisely on the Protothread library which provides a sequential control flow without complex state machines or full multi-threading.

To optimize the network for a particular application and to define a global strategy to reduce energy, PowWow offers the following extra tools: over-the-air reprogramming (and soon reconfiguration), analytical power estimation based on software profiling and power measurements, and a dedicated network analyzer to probe and fix transmission errors in the network. More information can be found at http://powwow.gforge.inria.fr.

5.7. Ziggie: a Platform for Wireless Body Sensor Networks

Participants: Olivier Sentieys, Olivier Berder, Arnaud Carer, Antoine Courtay [corresponding author], Robin Bonamy.

Keywords: Wireless Body Sensor Networks, Low Power, Gesture Recognition, Localization, Hardware and Software Platform

The Zygie sensor node has been developed in the team to create an autonomous Wireless Body Sensor Network (WBSN) capable of monitoring body movements. The Zygie platform is part of the BoWI project funded by CominLabs. Zygie is composed of: an ATMEGA128RFA1 microcontroller, an MPU9150 Inertial Measurement Unit (IMU), an RF AS193 switch with two antennas, an LPS331AP barometer, a DC/DC voltage regulator with a battery charge controller, a wireless inductive battery charge controller, and some switches and control LEDs.

![Zygie platform for WBSN](image.jpg)

**Figure 4. CAIRN’s Ziggie platform for WBSN**

The IMU is composed of a 3-axis accelerometer, a 3-axis gyrometer and a 3-axis magnetometer. It sends its data to the embedded microcontroller over an I2C bus. We also developed our own MAC protocol for synchronization and data exchange between nodes. The Zygie platform is used in many PhD works for evaluating data fusion algorithms (RSSI + IMU data) (Zhongwei Zheng, UR1 and Alexis Aulery, UBS/UR1), low power computing algorithms (Alexis Aulery, UBS/UR1), wireless protocols (Viet Hoa Nguyen, UR1) and body channel characterization (Rizwan Masood, TB).

6. New Results

6.1. Highlights of the Year

Our work on accuracy evaluation and optimisation for fixed point arithmetic was presented during a tutorial “Automatic Fixed-Point Conversion: a Gateway to High-Level Power Optimization” at IEEE/ACM Design Automation and Test in Europe [77].

As a proof of concept of our research on improving efficiency of dynamic reconfiguration in FPGAs [47] [48], the eFPGA (Figure 5) chip was designed and fabricated in 65nm CMOS technology. In the proposed and patented architecture [73] (EU patent), the configuration of the FPGA becomes independent from its placement and is moreover significantly compressed (up to \(\times 10\)). This notion of Virtual Bit Stream allows for seamless partial and dynamic reconfiguration and for task migration.

6.2. Reconfigurable Architecture Design

6.2.1. Dynamic reconfiguration support in FPGA

Participants: Olivier Sentieys, Antoine Courtay, Christophe Huriaux.

Almost since the creation of the first SRAM-based FPGAs, there has been a desire to explore the benefits of partially reconfiguring a portion of an FPGA at run-time while the remainder of the design continues to operate uninterrupted. Currently, the use of partial reconfiguration imposes significant limitations on the FPGA design: reconfiguration regions must be constrained to certain shapes and sizes and, in many cases, bitstreams must be precompiled before application execution for the precise placement of the region in the fabric. We plan to develop an FPGA architecture that allows for seamless translation of partially-reconfigurable regions, even if the relative placement of fixed-function blocks within the region is changed.

**FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams.**

The use of partial dynamic reconfiguration in FPGA-based systems has grown in recent years as the spectrum of applications which use this feature has increased. For these systems, it is desirable to create a series of partial bitstreams that represent tasks that can be placed in multiple regions of the FPGA fabric. While the transfer of homogeneous collections of lookup-table based logic blocks from region to region has been shown to be relatively straightforward, it is more difficult to transfer partial bitstreams which contain fixed-function resources, such as block RAMs and DSP blocks. In this work we consider FPGA architecture enhancements which allow for the migration of partial bitstreams including fixed-function resources from region to region even if these resources are not located in the same position in each region. Our approach does not require significant, time-consuming place-and-route during the migration process. We quantify the cost of inserting additional routing resources into the FPGA architecture to allow for easy migration of heterogeneous, fixed-function resources. Our experiments show that this flexibility can be added for a relatively low overhead and performance penalty. This work was performed during Christophe Huriaux’s visit to UMASS in summer 2014 in the context of the Inria Associate Team Hardiesse and has been published in [48] and in [74] as a poster.

**Virtual Bit Streams: Design Flow and Run-Time Management of Compressed and Relocatable FPGA Configurations.**

The aim of partially and dynamically reconfigurable hardware is to provide increased flexibility by loading multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at runtime must be created offline for each task of the application. Moreover, modern applications use many specialized hardware blocks to perform complex operations, which tends to invalidate the "single bit-stream for a single application" paradigm, as the logic content at different locations of the reconfigurable fabric may differ. We proposed a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. These configurations can then be decoded and finalized in real-time and at run-time by a dedicated reconfiguration controller to be placed at a given physical location. The VTR framework has been extended to include bit-stream generation features. A bit-stream format supporting our approach is proposed, and the associated decoding architecture was designed. We analyzed the compression induced by our coding method and showed that compression ratios of at least 2.5× can be achieved on the 20 largest MCNC benchmarks. The introduction of clustering, which aggregates multiple routing resources together, yielded compression ratios of up to 10×, at the cost of a more complex decoding step at runtime. Future perspectives on the Virtual Bit Stream (VBS) include extension of the architecture to support commercially available FPGAs as well as the improvement of the associated CAD tool flow to include smarter coding of the VBS to gain in runtime efficiency and in size. The VBS approach can provide increased online relocation capabilities using a decoding algorithm capable of decoding the VBS on-the-fly during the task migration. We applied for a European Patent on this work [73] and the results will be published in 2015 at IEEE/ACM DATE [47].

6.2.2. Power Models of Reconfigurable Architectures

Participants: Robin Bonamy, Daniel Chillet, Olivier Sentieys.

Including a reconfigurable area in complex systems-on-chip is considered an interesting solution to reduce the area of the global system and to support high performance. However, the key challenge in the context of embedded systems is currently the power budget, and designers need early estimations of the power consumption of their system. Power estimation for reconfigurable systems is a difficult issue since several parameters need to be taken into account to define an accurate model. In this research, we consider the opportunity offered by dynamic reconfiguration to reduce power consumption through the management of task scheduling and placement. We analyzed the power consumption during dynamic reconfiguration on a Virtex 5 board. Three models of the power consumption of partial and dynamic reconfiguration, with different complexity/accuracy tradeoffs, are defined. These models are used in design space exploration to evaluate the impact of reconfiguration on the energy consumption of a complete system. We propose a methodology for power/energy consumption modeling and estimation in the context of heterogeneous (multi)processor(s) and dynamically reconfigurable hardware systems. We developed an algorithm to explore all task mapping possibilities for a complete application (e.g., for H264 video coding) with the aim of extracting one of the best solutions with respect to the designer’s requirements. This algorithm is a step toward defining on-line power management strategies that decide which task instances must be executed to efficiently manage the available power using dynamic partial reconfiguration [24].
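The idea of exploring all task mapping possibilities can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the task names, the time and energy figures, the fixed per-task reconfiguration cost, the purely sequential execution model and the function name `explore` do not come from the models or the algorithm of [24].

```python
from itertools import product

# Hypothetical implementation options per task: target -> (time_ms, energy_mJ).
OPTIONS = {
    "t1": {"cpu": (8.0, 4.0), "fpga": (2.0, 3.0)},
    "t2": {"cpu": (5.0, 2.5), "fpga": (1.0, 2.0)},
    "t3": {"cpu": (3.0, 1.5), "fpga": (1.5, 2.5)},
}
# Assumed reconfiguration overhead paid by every FPGA task: (time_ms, energy_mJ).
RECONFIG = (1.0, 0.5)


def explore(options, deadline_ms):
    """Enumerate every mapping and keep the lowest-energy one meeting the deadline."""
    tasks = sorted(options)
    best = None
    for choice in product(*(sorted(options[t]) for t in tasks)):
        total_time = 0.0
        total_energy = 0.0
        for task, target in zip(tasks, choice):
            time_ms, energy_mj = options[task][target]
            if target == "fpga":
                time_ms += RECONFIG[0]
                energy_mj += RECONFIG[1]
            # Tasks are assumed to run one after the other.
            total_time += time_ms
            total_energy += energy_mj
        if total_time <= deadline_ms and (best is None or total_energy < best[0]):
            best = (total_energy, total_time, dict(zip(tasks, choice)))
    return best


if __name__ == "__main__":
    print(explore(OPTIONS, deadline_ms=10.0))
```

Exhaustive enumeration grows exponentially with the number of tasks, so a sketch like this is only practical for small task sets. It does show how reconfiguration overheads can change which mapping is cheapest in energy.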

6.2.3. Real-time Spatio-Temporal Task Scheduling on 3D Architecture

Participants: Quang-Hai Khuat, Quang Hoa Le, Emmanuel Casseau, Antoine Courtay, Daniel Chillet.

One of the main advantages offered by a three-dimensional system-on-chip (3D SoC) is the reduction of wire length between different blocks of a system, thus improving circuit performance and alleviating power overheads of on-chip wiring. To fully exploit this advantage, efficient management of the allocation of tasks, in space and time, to the different levels of the architecture is essential. In the context of 3D SoC, we have developed several spatio-temporal scheduling algorithms for 3D MultiProcessor Reconfigurable System-on-Chip (3DMPRSoC) architectures composed of a multiprocessor layer and an embedded Field Programmable Gate Array (eFPGA) layer with dynamic reconfiguration. These two layers are interconnected vertically by through-silicon vias (TSVs) ensuring tight coupling between software tasks on processors and associated hardware accelerators on the eFPGA. Our algorithms cope with task dependencies and try to place communicating tasks close to each other in order to reduce direct communication cost, thus reducing global communication cost. In the 3DMPRSoC context, our algorithms favor direct communications, including: i) point-to-point communication between hardware accelerators on the eFPGA, ii) communication between software tasks through the Network-on-Chip of the multiprocessor layer, and iii) communication between software task and accelerator through TSV. When a direct communication between two tasks occurs, the data are stored in a shared memory placed onto the multiprocessor layer.

The algorithm proposed in [50] considers a heterogeneous reconfigurable architecture and proposes a mathematical formulation for the spatio-temporal scheduling of a task graph. The placement consists in finding the best mapping of the application task model onto the reconfigurable region. To improve the performance of our algorithm, we propose to configure the tasks by taking their priority into account. The global objective is to reduce the overall execution time. The second algorithm, presented in [51], improves on the previous one by exploiting the processors of the multiprocessor layer to start a software execution of a task when not enough area is available. In this case, classical algorithms reject the task and continue their execution. Our algorithm instead starts a speculative software execution of the task: if sufficient area is later freed by a hardware task, the algorithm evaluates whether the software execution should continue or whether it is better to stop it and restart the task in the reconfigurable area. We demonstrated that the execution time of an application can be significantly reduced by applying this software speculation.

In [53], we proposed a heuristic that focuses on the online task placement problem on a multi-context, dynamically and partially reconfigurable heterogeneous architecture. The well-known configuration prefetching and anti-fragmentation techniques are combined with a place reservation technique that takes into account tasks to be placed in the future (pre-allocated tasks) while satisfying task execution deadline constraints. Compared to a placement without reservation, our approach improves the number of placed tasks and the resource utilization rate.

### 6.2.4. Run-time Task Management to Increase Resource Utilisation for Concurrent Critical Tasks in Mixed-Critical Systems

**Participant:** Angeliki Kritikakou.

When integrating mixed critical systems on a multi/many-core system, one challenge is to ensure predictability for the high criticality tasks and an increased utilization for low criticality tasks. In [52], we proposed a distributed run-time WCET controller to address this problem when several high criticality tasks with different deadlines, periods and offsets are concurrently executed on a multi-core system.

During system execution, the proposed controller regularly checks, locally at each critical task, whether the interference due to the low criticality tasks can be tolerated. This is achieved by monitoring the ongoing execution time, dynamically computing the remaining worst case execution time of the critical task when only critical tasks are executed on the system, and checking our safety condition. If the condition is violated for a critical task, continuing to run the low criticality tasks concurrently with it would cause it to miss its deadline. Therefore, the local controller decides that the less critical tasks must be suspended. However, the local controller is not responsible for the actual suspension of the low criticality tasks: it sends a request to a master which has a global view of the system. The master is in charge of collecting the requests of the critical tasks and of suspending and restarting the low criticality tasks. As soon as at least one critical task requests the suspension of the low criticality tasks, the master suspends them. During execution, the master updates the number of active requests and restarts the low criticality tasks when all requesters have finished their execution. We have implemented our approach as a software controller on a real multi-core COTS system, the TMS320C6678 chip of Texas Instruments, where we have observed significant gains up to 556\% for our case study.
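The request protocol between the local controllers and the master can be sketched as follows. This is a minimal single-threaded model, not the implementation of [52]: the class and method names are hypothetical, and the function `must_suspend` uses an assumed form of the safety condition (elapsed time plus remaining isolated WCET plus a margin compared against the deadline), since the exact condition is not given here.

```python
class Master:
    """Tracks suspension requests coming from critical tasks (hypothetical model)."""

    def __init__(self):
        self.active_requests = set()
        self.low_criticality_running = True

    def request_suspension(self, task_id):
        # One request is enough to suspend the low criticality tasks.
        self.active_requests.add(task_id)
        self.low_criticality_running = False

    def task_finished(self, task_id):
        # Restart only when every requester has finished its execution.
        self.active_requests.discard(task_id)
        if not self.active_requests:
            self.low_criticality_running = True


def must_suspend(elapsed, remaining_wcet_isolated, deadline, margin):
    """Assumed safety check: True when interference can no longer be tolerated."""
    return elapsed + remaining_wcet_isolated + margin > deadline


if __name__ == "__main__":
    master = Master()
    if must_suspend(elapsed=40, remaining_wcet_isolated=50, deadline=100, margin=15):
        master.request_suspension("critical_A")
    print(master.low_criticality_running)
    master.task_finished("critical_A")
    print(master.low_criticality_running)
```

Keeping the set of active requests in a single master means the low criticality tasks are restarted exactly once, after the last requesting critical task completes.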

### 6.2.5. Arithmetic Operators for Cryptography and Fault-Tolerance

**Participants:** Arnaud Tisserand, Emmanuel Casseau, Nicolas Veyrat-Charvillon, Karim Bigou, Franck Bucheron, Jérémie Métairie, Gabriel Gallin, Huu Van Long Nguyen, Nicolas Estibals.

**Arithmetic Operators for Fast and Secure Cryptography.**

In the paper [39] presented at ASAP, we describe a new RNS (residue number system) modular multiplication algorithm, for finite field arithmetic over GF(p), based on a reduced number of moduli in base extensions with only \(3n/2\) moduli instead of \(2n\) for standard ones. Our algorithm reduces both the number of elementary modular multiplications (EMMs) and the number of stored precomputations for large asymmetric cryptographic applications such as elliptic curve cryptography or Diffie-Hellman (DH) cryptosystem. It leads to faster operations and smaller circuits.
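For readers unfamiliar with RNS, the sketch below shows the basic principle only: an integer is represented by its residues modulo pairwise coprime moduli, multiplication is performed independently on each residue channel, and the result is recovered with the Chinese remainder theorem. The moduli and function names are illustrative assumptions. The sketch does not implement the modular multiplication algorithm of [39] or its base extensions.

```python
from math import prod

# Small pairwise coprime moduli, chosen for illustration only.
MODULI = (13, 15, 16, 17)


def to_rns(x, moduli=MODULI):
    """Represent x by its residues modulo each element of the base."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel: no carry propagates between residues."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        # pow(m_i, -1, m) is the modular inverse (Python 3.8+).
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


if __name__ == "__main__":
    product_rns = rns_mul(to_rns(123), to_rns(45))
    print(from_rns(product_rns), 123 * 45)
```

The result is exact as long as the true product is smaller than the product of the moduli (53040 here). The independence of the channels is what makes RNS attractive for parallel hardware implementations.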

The PhD thesis defended by Karim Bigou [16] deals with the RNS representation and the associated arithmetic algorithms for asymmetric cryptography (ECC and RSA). The title of the PhD is "Theoretical Study and Hardware Implementation of Arithmetical Units in Residue Number System (RNS) for Elliptic Curve Cryptography".

Scalar recoding is a popular way to speed up ECC (elliptic curve cryptography) scalar multiplication, using for example the non-adjacent form, the double-base number system or the multi-base number system (MBNS). Ensuring uniform computation profiles is an efficient protection against some side channel attacks (SCA) in embedded systems. Typical ECC scalar multiplication methods use two point operations (addition and doubling) scheduled according to secret scalar digits. Euclidean addition chains (EAC) offer a natural SCA protection since only one point operation is used. Computing short EACs is considered a very costly operation and no hardware implementation has been reported yet. We designed a hardware recoding unit for short EACs which works concurrently with scalar multiplication. It has been integrated in an in-house ECC processor on various FPGAs. The implementation results show similar computation times compared to non-protected solutions, and faster ones compared to typical protected solutions (e.g. 18\% speed-up over 192b Montgomery ladder).

In the paper [40], we introduce a robust asynchronous logic family which does not rely on timing assumptions and/or delay elements and can operate with sub-powered devices. The key element behind our proposal is a simplified completion detection mechanism which makes it substantially more energy efficient than other dual-rail approaches. A 32-bit Ripple Carry Adder (RCA) is implemented in 65nm and 45nm CMOS processes to evaluate the practicability of our approach. Firstly, the Optimal Energy Point (OEP) of the proposed RCA is investigated by scaling VDD from 0.4V to 0.2V (50mV interval); the OEP occurs at 0.25V for both technologies. Secondly, compared with the corresponding single-rail benchmark at its OEP in the 65nm process, energy savings of 30\% (34 fJ for 65nm) and 40\% (54 fJ for 45nm after scaling) are achieved, respectively. Even better energy efficiency (10x) and reasonable performance are obtained compared with dual-rail counterparts. This work is done in the SPINaCH project.

**ECC Crypto-Processor with Protections Against SCA.**

A dedicated processor for elliptic curve cryptography (ECC) is under development. Functional units for arithmetic operations in GF($2^m$) and GF($p$) finite fields and 160-600-bit operands have been developed for FPGA implementation. Several protection methods against side channel attacks (SCA) have been studied. The use of some number systems, especially very redundant ones, allows one to change the way some computations are performed and then their effects on side channel traces. This work is done in the PAVOIS project.

**Arithmetic Operators and Crypto-Processor for HECC.**

In the HAH project, we study and prototype efficient arithmetic algorithms for hyperelliptic curve cryptography for hardware implementations (on FPGA circuits). We study new advanced arithmetic algorithms and representations of numbers for efficient and secure implementations of HECC in hardware.

**Arithmetic Operators for Fault Tolerance.**

In the ARDyT and Reliasic projects, we work on computation algorithms, representations of numbers and hardware implementations of arithmetic operators with integrated fault detection (and/or fault tolerance) capabilities. The target arithmetic operators are: adders, subtracters, multipliers (and variants of multiplications by constants, square, FMA, MAC), division, square-root, approximations of the elementary functions. We study two approaches: residue codes and specific bit-level coding in some redundant number systems for fault detection/tolerance integration at the arithmetic operator/unit level. FPGA prototypes are under development.
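The residue code approach can be illustrated on a multiplier. The sketch below is a generic textbook-style example rather than the operators developed in these projects: the check modulus 3, the function name `checked_mul` and the fault injection by bit flip are all assumptions chosen for illustration.

```python
CHECK_MODULUS = 3  # assumed low-cost check base


def checked_mul(a, b, multiplier=lambda x, y: x * y):
    """Multiply with a residue check; returns (result, check_passed)."""
    result = multiplier(a, b)
    # The residue of the product must equal the product of the residues.
    expected = ((a % CHECK_MODULUS) * (b % CHECK_MODULUS)) % CHECK_MODULUS
    return result, result % CHECK_MODULUS == expected


def faulty_multiplier(x, y):
    """Model a fault that flips bit 4 of the multiplier output."""
    return (x * y) ^ (1 << 4)


if __name__ == "__main__":
    print(checked_mul(1234, 5678))
    print(checked_mul(1234, 5678, multiplier=faulty_multiplier))
```

A single flipped bit changes the result by plus or minus a power of two, which is never a multiple of 3, so this check detects every single-bit error on the output. Errors whose magnitude is a multiple of 3 go undetected, which is one reason other check bases or bit-level codes may be preferred.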

**Secure Virtualization in Hardware.**

In the paper [70] presented at SDTA, we deal with secure solutions to support virtualization and communication that can be implemented on new hybrid (core + FPGA) development platforms. On the one hand, these boards feature processors that do not have virtualization extensions but are powerful enough to support hypervisors and their guests. On the other hand, some virtualization solutions currently exist for ARM processors, but they rely only on TrustZone for their (hardware) security. These hybrid boards can offer more: we have studied recent specifications produced by a consortium to help the implementation of hardware security. In this area, the FPGA can help in securing virtualization. However, so far everything has been designed for Intel/AMD architectures and for a single operating system. Even so, the proposals as a whole are too complex to be implemented on embedded systems. We will therefore have to use hardware development capabilities and rearrange software to design a functional solution.

6.3. Compilation and Synthesis for Reconfigurable Platform

6.3.1. Numerical Accuracy Analysis and Optimization

Participants: Olivier Sentieys, Steven Derrien, Romuald Rocher, Pascal Scalart, Tomofumi Yuki, Aymen Chakhari, Gaël Deest.

Accuracy evaluation is one of the most time-consuming tasks of the fixed-point refinement process. Analytical techniques based on perturbation theory have been proposed to avoid long fixed-point simulations. However, these techniques are not applicable in the presence of certain operations classified as un-smooth, in which case fixed-point simulation must be used. In [33], an algorithm for a hybrid technique that uses analytical accuracy evaluation to accelerate fixed-point simulation was proposed. This technique is applicable to signal processing systems with both feed-forward and feedback interconnect topologies between operations. The algorithm classifies operators as smooth or un-smooth. It uses the analytical single-noise-source (SNS) model, obtained with our previously published analytical techniques, to evaluate the impact of finite precision on smooth operators, and simulates only the un-smooth operators. In other words, parts of the system are selectively simulated only when un-smooth errors occur. The effort for fixed-point simulation is thus greatly reduced. The results obtained with the proposed technique are consistent with fixed-point simulation, while the simulation time is reduced by several orders of magnitude. The preprocessing overhead consists of deriving the SNS model and is often small compared to the time required for fixed-point simulation. An advantage of the technique is that the user need not spend time characterizing the nonlinearities associated with un-smooth operations. Several examples from the general signal processing, communication, and image processing domains are considered to evaluate the hybrid technique. The acceleration obtained is quantified as an improvement factor; very high improvement factors indicate that hybrid simulation is several orders of magnitude faster than classical fixed-point simulation.
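As a minimal illustration of why an analytical model can stand in for simulation on smooth operators, the sketch below compares the classical analytical quantization-noise power for rounding, q²/12, with the value measured by a short simulation. The function names, the 8-bit fractional word-length and the uniform input are assumptions chosen for illustration; this is neither the SNS model nor the hybrid algorithm of [33].

```python
import math
import random


def quantize(x, frac_bits):
    """Round x to the nearest multiple of q = 2**-frac_bits."""
    q = 2.0 ** -frac_bits
    return math.floor(x / q + 0.5) * q


def analytical_noise_power(frac_bits):
    """Classical rounding model: error uniform in [-q/2, q/2], power q^2/12."""
    q = 2.0 ** -frac_bits
    return q * q / 12.0


def simulated_noise_power(frac_bits, n_samples=20000, seed=0):
    """Measure the mean-square rounding error on uniform random inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, frac_bits) - x
        total += err * err
    return total / n_samples
```

Calling `analytical_noise_power(8)` costs a few arithmetic operations, whereas `simulated_noise_power(8)` must process every sample; the gap widens with system size, which is the motivation for simulating only the un-smooth parts.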

One of the limitations of analytical accuracy techniques is that they are based on a Signal Flow Graph (SFG) representation of the system to be analyzed. This SFG model is currently built from a source program by flattening its whole control flow (including full loop unrolling), which raises significant scalability issues for accuracy analysis. To overcome these limitations, we have proposed [41] to adapt state-of-the-art accuracy analysis techniques to take advantage of compact polyhedral program representations. Combining the two approaches provides a more general and scalable framework which significantly extends the applicability of accuracy models, enabling the analysis of complex image processing kernels operating on multidimensional data sets.

An analytical approach was studied to determine the accuracy of systems that include un-smooth operators. An un-smooth operator is a function that is not differentiable over its whole definition interval (for example, the sign operator). The classical model is no longer valid since these operators introduce errors that do not respect the Widrow assumption (their values are often higher than the signal power). An approach based on the distributions of the signal and the noise was therefore proposed. We focused on recursive structures where an error influences future decisions (such as a Decision Feedback Equalizer). In that case, numerical analysis methods (e.g., the Newton-Raphson algorithm) can be used. Moreover, an upper bound on the error probability can be determined analytically. We also studied the case of a turbo coder and decoder to determine the data word-lengths ensuring sufficient system quality [17].

6.3.2. Reconfigurable Processor Extension Generation

Participants: Christophe Wolinski, François Charot.

Most proposed techniques for automatic instruction set extension dissociate the pattern selection and
instruction scheduling steps. The effects of the selection on the scheduling subsequently produced by the
compiler must then be predicted. This approach is suitable for specialized instructions having a one-cycle duration
because the prediction is correct in this case. However, for multi-cycle instructions, a selection that
does not take scheduling into account is likely to favor instructions which turn out, a posteriori, to be less
interesting than others, in particular in the case where they can be executed in parallel with the processor
core. The originality of our research work is to carry out specialized instruction selection and scheduling
in a single optimization step. This complex problem is modeled and solved using constraint programming
techniques. This approach allows the features of the extensible processor to be taken into account with
a high degree of flexibility. Different architecture models can be envisioned: an extensible
processor tightly coupled to a hardware extension having a minimal number of internal registers used to store
intermediate results, or a VLIW-oriented extension made up of several processing units working in parallel
and controlled by a specialized instruction. These techniques have been implemented in the Gecos
source-to-source framework.

Novel techniques addressing the interactions between code transformations (especially loop transformations) and instruction
set extension are under study. The idea is to automatically transform the original loop nests of a program
(using the polyhedral model) to select specialized and vector instructions. These new instructions may use
local memories located in the hardware extension to store intermediate data produced at a given
loop iteration. Such transformations lead to patterns whose effect is to significantly reduce the pressure on the
memory of the processor.

We also studied a way to identify custom instructions at the application domain level instead of addressing
it on a per-application basis. Domain-specific instruction set extension aims at maximizing the usage of a
custom instruction across a set of applications belonging to an application domain. The idea is to guarantee
that each custom instruction has a high degree of utilization across many applications of a given domain,
while still delivering the required performance improvement. The instruction identification problem is here
formulated as the maximum common subgraph problem and it is solved by transforming it into a maximum
clique problem.
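
The reduction mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: each application is assumed to be a small directed dataflow graph given as a dictionary of node labels (operation names) plus a set of edges, the function names are hypothetical, and the common subgraph returned is an induced one that is not required to be connected. The clique search is a plain Bron-Kerbosch enumeration, which is exponential in the worst case.

```python
from itertools import combinations


def association_graph(labels1, edges1, labels2, edges2):
    """Build the modular product of two labelled directed graphs.

    Vertices pair nodes carrying the same operation label; two pairs are
    adjacent when both graphs agree on the edges between them, in each direction.
    """
    pairs = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adj = {p: set() for p in pairs}
    for (u1, v1), (u2, v2) in combinations(pairs, 2):
        if u1 == u2 or v1 == v2:
            continue
        same_fwd = ((u1, u2) in edges1) == ((v1, v2) in edges2)
        same_bwd = ((u2, u1) in edges1) == ((v2, v1) in edges2)
        if same_fwd and same_bwd:
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def max_clique(adj):
    """Return one maximum clique using Bron-Kerbosch enumeration."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = list(clique)
            return
        for v in list(candidates):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adj), set())
    return best


def max_common_subgraph(labels1, edges1, labels2, edges2):
    """Node mapping of a maximum common induced subgraph of the two graphs."""
    return dict(max_clique(association_graph(labels1, edges1, labels2, edges2)))
```

For two hypothetical kernels `mul -> add -> shl` and `mul -> add -> sub`, the mapping returned pairs the two `mul` nodes and the two `add` nodes, which corresponds to a multiply-add candidate instruction shared by both applications.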

6.3.3. Optimization of Loop Kernels Using Software and Memory Information

Participant: Angeliki Kritikakou.

Compilers usually solve the compilation sub-problems one after the other, in an order that leads to less
efficient solutions because each sub-problem is optimized independently, taking into account only
part of the information available in the algorithms and the architecture. In a paper accepted for publication
in Computer Languages, Systems & Structures (COMLAN), Elsevier, we have presented an approach which
applies loop transformations in order to increase the performance of loop kernels. The proposed approach
focuses on reducing the number of L1 and L2 data cache accesses, main memory accesses and addressing instructions. Our
approach exploits software information, such as the array subscript equations, and memory architecture information,
such as the memory sizes. It then applies source-to-source transformations, taking as input the C code of the
loop kernels and producing new C code which is compiled by the target compiler. We have applied our
approach to five well-known loop kernels for both embedded processors and general purpose processors.
The experimental results show speedups ranging from 2 to 18.
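
As an illustration of the kind of source-to-source loop transformation involved, the sketch below applies loop tiling to a matrix transpose. It is written in Python for brevity although the approach described above operates on C code; the function names and the tile size are assumptions, and tiling is shown as a textbook example, not necessarily a transformation selected by the proposed approach.

```python
def transpose_naive(a, n):
    """Reference loop nest: b[j][i] = a[i][j]."""
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            b[j][i] = a[i][j]
    return b


def transpose_tiled(a, n, tile=4):
    """Same computation, iterating over tile x tile blocks to improve locality."""
    b = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    b[j][i] = a[i][j]
    return b
```

Both functions produce the same result; in compiled code the tiled version touches each cache line fewer times when `tile` is chosen from the cache size, which is the type of memory information such an approach can exploit.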

6.3.4. Design Tools for Reconfigurable Video Coding

Participants: Emmanuel Casseau, Yaset Oliva Venegas.

In the field of multimedia coding, standardization recommendations are always evolving. To reduce design time by taking advantage of available SW and HW designs, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined. The application is represented by a network of interconnected components (so-called actors) defined in a modular library, and the behaviour of each actor is described in the specific RVC-CAL language. Dataflow programs, such as RVC applications, express explicit parallelism within an application. However, general purpose processors cannot cope with both the high performance and the low power consumption requirements that embedded systems have to face. We have investigated the mapping of RVC applications onto a dedicated multiprocessor platform. Our goal is to propose an automated co-design flow based on the RVC framework. The designer provides the application description in the RVC-CAL language, after which the co-design flow automatically generates a network of processors that can be synthesized on FPGA platforms. Two kinds of platforms can be targeted. The first platform is made of processors based on a low-complexity and configurable TTA processor (a Very Long Instruction Word-style processor). The architecture model of the platform is composed of processors with their local memories, an interconnection network and shared memories. Both shared and local memories are used to limit the traditional memory bottleneck. Processors are connected together through the shared memories [72] [69] [36]. The second platform more specifically targets the Zynq platform from Xilinx. The processors are MicroBlaze processors. Their local memory is dedicated to instruction code only. A common shared memory is used for data exchanges between the processors (to store the data communicated between actors). At present, the actor mapping is chosen at compile time, but we expect to support dynamic mapping soon. The mapping will be computed at runtime on the ARM processor. The actors' code will be stored in the DDR memory so that it can be easily transferred to the MicroBlaze instruction cache depending on the actor mapping [55] [76]. This work is done in collaboration with IETR and has been implemented in the Orcc open-source compiler (Open RVC-CAL Compiler: http://orcc.sourceforge.net).

6.3.5. A Domain Specific Language for Rapid Prototyping of Software Radio Waveforms

Participants: Matthieu Gautier, Olivier Sentieys, Ganda-Stéphane Ouedraogo.

Software Defined Radio (SDR) is becoming a ubiquitous concept to describe and implement the Physical Layers (PHYs) of wireless systems. Although FPGA (Field Programmable Gate Array) technology is expected to play a key role in SDR, describing a PHY at the Register-Transfer Level (RTL) requires tremendous effort. We introduced a novel methodology to rapidly implement PHYs on FPGA-SDR platforms. The work relies upon High-Level Synthesis tools and dataflow modeling to infer an efficient system-level control unit for the application. The proposed software-based over-layer partly handles the complexity of programming an FPGA and integrates reconfigurable features. It consists essentially of a Domain-Specific Language (DSL) [60] and a DSL compiler [32] for automation purposes. IEEE 802.11a and IEEE 802.15.4 transceivers have been designed and explored [45] with this new methodology in order to demonstrate its rapid prototyping capability.

6.4. Interaction between Algorithms and Architectures

6.4.1. Cooperative-cum-Constrained Maximum Likelihood Algorithm for UWB-based Localization in Wireless BANs

Participants: Antoine Courtay, Matthieu Gautier, Gia Minh Hoang [Master’s Student].

Wireless Body Area Network (BAN) is a mainstream technology for numerous application fields (medicine, security, sport science, etc.), and the precise determination of the positions of wireless sensors responds to a strong need in many applications. This study leverages Ultra Wide Band (UWB) radio, which is an attractive technology to achieve centimeter-level distance measurements. However, the aggregation of the distance information remains a challenge to achieve accurate localization in wireless BANs. To this aim, we have proposed a novel Cooperative-cum-Constrained Maximum Likelihood (CCML) localization algorithm. This algorithmic study shows the improvement that could be achieved by combining UWB radio and dedicated algorithms. Future work will integrate UWB technology in the second version of the Zyggie platform developed in CAIRN.

6.4.2. MIMO Systems and Cooperative Strategies for Low-Energy Wireless Networks

Participants: Olivier Berder, Olivier Sentieys, Baptiste Vrigneau, Viet-Hoa Nguyen.

Over the past few years, the CAIRN team has acquired significant expertise in multi-antenna systems, especially in linear precoding. Although this technique is traditionally used in a collocated way, it can also be used in a distributed manner in wireless sensor networks (WSN). We presented a new approach, named distributed max-dmin precoding (DMP). This protocol is based on the deployment of a virtual 2x2 max-dmin precoding over one source and one forwarding relay, both equipped with one antenna, and a destination equipped with two antennas. In this context, two kinds of relaying protocols, amplify-and-forward and decode-and-forward, were investigated. The performance in terms of Bit-Error-Rate (BER) and energy efficiency was compared with non-cooperative techniques (SISO, SIMO) and the distributed space-time block code (STBC) scheme. Our investigations showed that DMP is more energy efficient from medium transmission distances onwards.

A receiver-initiated cooperative medium access control (RIC-MAC) protocol was also proposed for cooperative communications to reduce the energy consumption of WSNs. Considering a real WSN platform, the simulation results show that using the proposed RIC-MAC protocol in cooperative communications provides latency and energy gains compared to multi-hop communications. Even though the energy gain decreases when the network traffic load increases, our protocol still brings an energy gain of about 22% at 1 packet/second. Finally, considering the impact of traffic load on energy consumption and latency, RIC-MAC is shown to be robust to traffic load variations in terms of latency [66].

6.4.3. Adaptive protocols for Wireless Sensor Networks

Participants: Olivier Berder, Matthieu Gautier, Nhat-Quang Nhan [Master’s Student], Van-Thiep Nguyen.

As tiny sensor nodes are equipped with a limited battery, optimizing the power consumption of these devices is vital. In typical WSN platforms, the radio transceiver consumes a major proportion of the energy. A major concern is therefore to decrease the radio activity by designing efficient MAC protocols.

Energy consumption plays an important role in the design of Wireless Body Area Sensor Networks (WBASN). Unfortunately, the performance of WBASN decreases in high interference environments such as the Industrial, Scientific and Medical (ISM) band, where the wireless spectrum is getting crowded. In this study [59], an energy-efficient Medium Access Control (MAC) protocol named C-RICER (Cognitive-Receiver Initiated CyclEd Receiver) is specifically designed for WBASN to work cognitively in high interference environments. The C-RICER protocol adapts both transmission power and channel frequency to reduce interference and thus energy consumption. The protocol is simulated with the OMNeT++ simulator. Simulation results show that, depending on the interference level, C-RICER is able to outperform the traditional RICER protocol in terms of energy consumption, packet delay, and network throughput.

In recent years, many MAC protocols for Wireless Sensor Networks (WSNs) have been proposed and evaluated using Matlab and/or network simulators (OMNeT++, NS2, etc.). However, most of them have a static behavior, and few network simulations are available for adaptive protocols. In particular, OMNeT++/MiXiM provides only a few energy-efficient MAC protocols for WSNs (B-MAC and L-MAC) and no adaptive protocol. To this end, the TAD-MAC (Traffic Aware Dynamic MAC) protocol has been simulated in OMNeT++ with the MiXiM framework [57]. The simulation results have been compared with those of the B-MAC and L-MAC protocols, showing the gain brought by TAD-MAC.

6.4.4. Energy Harvesting and Power Management

Participants: Olivier Berder, Olivier Sentieys, Arnaud Carer, Trong-Nhan Le.

To design autonomous Wireless Sensor Networks (WSNs) with a theoretically infinite lifetime, energy harvesting (EH) techniques have recently been considered as promising approaches. Ambient sources can provide everlasting additional energy for WSN nodes and remove their dependence on batteries. An efficient energy harvesting system which is compatible with various environmental sources such as light, heat or wind energy was proposed. Our platform takes advantage of double-level capacitors not only to prolong the system lifetime but also to enable robust booting when the stored energy of the system is exhausted. Simulations and experiments showed that it can achieve a booting time on the order of seconds. Although capacitors have virtually unlimited recharge cycles, they suffer from higher leakage than rechargeable batteries. Increasing their size can decrease the system performance due to leakage energy. Therefore, an energy neutral design framework was proposed. It provides a methodology to determine the minimum size of the storage devices that satisfies Energy Neutral Operation (ENO) and maximizes system Quality of Service (QoS) in EH nodes when using a given energy source. Experiments validating this framework were performed on a real WSN platform with both photovoltaic cells and thermal generators in an indoor environment [30].
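
The storage sizing question can be illustrated with a deliberately simplified sketch. It assumes a lossless storage device (leakage, which the framework above does account for, is ignored here), a cycle that is exactly energy neutral, and per-slot energy profiles given as integers in a common unit; the function name and the profiles are hypothetical and this is not the methodology of [30].

```python
def min_storage_for_eno(harvested, consumed):
    """Minimum ideal storage capacity and initial level for one harvesting cycle.

    harvested[k] and consumed[k] are energies per time slot (same unit).
    Assumes a lossless storage device and an exactly energy-neutral cycle.
    """
    if len(harvested) != len(consumed):
        raise ValueError("profiles must have the same length")
    if sum(harvested) != sum(consumed):
        raise ValueError("cycle is not energy neutral")
    level = 0
    lowest = highest = 0
    for h, c in zip(harvested, consumed):
        level += h - c
        lowest = min(lowest, level)
        highest = max(highest, level)
    capacity = highest - lowest   # swing of the stored energy over the cycle
    initial_level = -lowest       # start level that keeps the store non-negative
    return capacity, initial_level
```

With a source available during the first half of the cycle only, for instance `min_storage_for_eno([4, 4, 0, 0], [2, 2, 2, 2])`, the storage must hold the energy consumed during the second half. A real design would add a margin for leakage, which grows with capacitor size.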

A new power manager (PM) was proposed for EH-WSNs scavenging energy from periodic sources, i.e., sources for which ambient energy is not available during the full harvesting cycle. Besides respecting the ENO condition, our PM is able to balance the Quality of Service (QoS) during the whole cycle to provide regular data tracking, which is essential for WSN applications such as monitoring. Simulations in OMNeT++ show that our PM can improve the QoS during the absence of energy by up to 84% compared to state-of-the-art PMs, while guaranteeing the same global QoS [54].

6.4.5. Multimedia Processing

Participant: Pascal Scalart.

Most noise reduction methods for multimedia signals are based on the application of a short-time Wiener filter (MMSE) that is generally expressed as a spectral gain depending on the local signal-to-noise ratio (SNR) in each frequency bin. Several algorithms to estimate such a filter can be found in the literature, but these conventional approaches lead to a biased estimator of the a priori SNR. To reduce this bias, we have proposed in [26] a new strategy that relies on the introduction of a correction term in the computation of the Wiener filter, depending on the current state of both the available a priori and a posteriori SNR estimates. The proposed solution leads to a bias-compensated a priori SNR estimate and yields an estimate of the target signal that is very close to the original noise-free reference. This refinement procedure has been tested under various noisy environments and shows the superiority of the proposed strategy compared to competing algorithms.
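
For reference, the conventional processing chain discussed above can be sketched per frequency bin as follows. The Wiener gain G = ξ/(1+ξ), with ξ the a priori SNR, is standard; the decision-directed rule is used here as one widely known conventional estimator, which is an assumption about which baseline is meant. The function names and the smoothing factor `alpha` are illustrative, and the bias correction term of [26] is not reproduced.

```python
def wiener_gain(prior_snr):
    """Wiener spectral gain for one frequency bin: G = xi / (1 + xi)."""
    return prior_snr / (1.0 + prior_snr)


def decision_directed_prior_snr(prev_clean_power, noise_power, post_snr, alpha=0.98):
    """Decision-directed a priori SNR estimate (conventional estimator).

    prev_clean_power: estimated clean-signal power of the previous frame in this bin
    noise_power:      estimated noise power in this bin
    post_snr:         a posteriori SNR (noisy power / noise power) of the current frame
    """
    ml_term = max(post_snr - 1.0, 0.0)
    return alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term


def enhance_bin(noisy_power, prev_clean_power, noise_power, alpha=0.98):
    """Return (gain, enhanced_power) for one bin of the current frame."""
    post_snr = noisy_power / noise_power
    xi = decision_directed_prior_snr(prev_clean_power, noise_power, post_snr, alpha)
    gain = wiener_gain(xi)
    return gain, gain * gain * noisy_power
```

Because the estimate for the current frame leans heavily on the previous frame when `alpha` is close to 1, it lags behind the true SNR; a correction term driven by both SNR estimates is one way to compensate for this kind of bias.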

Audio classification systems have recently gained interest for the design of various real-world multimedia services such as audio database indexing with musical genre classification, video indexing using the soundtrack, or context awareness. A large majority of audio classification systems can be viewed as offline applications in the sense that there is no strong restriction on how the signal to be classified is accessed. In [44], we investigate the case where the classification task is performed in real time in a low-latency classification framework. We proposed different methodologies for the use of feature integration that are based on three key aspects: the selection of the features which have to be temporally integrated, the choice of the integration technique, i.e. how the temporal information is extracted, and the size of the integration window. The experiments carried out for the classification task show that these different methodologies have a significant impact on the global performance, even under low-latency constraints. In addition, we investigate in [43] the detection of howling that arises in audio signals. The processing algorithm is based on a Support Vector Machine (SVM) model in the decision stage and on the combination of energy-based features with a new feature related to the frequency stability of a howling component. The proposed method can be used in different situations since it provides good results with a very low false alarm rate for a wide range of experimental conditions.

6.4.6. Non-Intrusive Load Monitoring

Participants: Olivier Sentieys, Baptiste Vrigneau, Xuan Chien Le.

Natural resource preservation has recently become a significant concern and has therefore motivated many research and development efforts for energy consumption management in buildings and homes. Efficiently reducing energy consumption at home, at work or in a factory could be achieved by mixing different technologies, not only to reduce the energy consumed by consumers, but also to adapt the energy consumed to the energy that is produced. SMART 2020 outlined the opportunity to capture savings of both energy and Greenhouse Gas (GHG) emissions in 2020, through a range of actions developed by the Information and Communications Technologies (ICT) sector. Smart Grid, Smart Buildings, and Green ICT have the main impact on energy savings. On the energy production side, the electrical grid infrastructure comprises three elements: power generation, transmission, and distribution. Electrical power generation consists mainly of power plants but also includes more and more renewable sources such as wind power or solar panels, on energy farms or locally on top of buildings. The cost of energy storage is very high, and hence the current practice is to match energy consumption closely with energy generation, which fluctuates more and more. The challenges are to use energy when the wind blows or the sun shines, and also to avoid the strong power consumption peaks due to people's daily routines. A typical example at home is to automatically run the dryer when energy is available and therefore cheap; this is what is now referred to as Smart Grid technologies. On the energy consumption side, the main objective is of course to reduce the energy consumption of the different subsystems. Interior lighting, office equipment, heating, cooling, and ventilation make up more than 85% of the total electricity use, and the reduction effort should therefore be concentrated on these systems. For energy management and reduction in homes or buildings, a key enabler is the use of wireless sensor networks to monitor the environment (temperature, activity of people, power consumption of equipment, light, etc.) and to act on subsystems (decrease room temperature, stop or start a piece of equipment, adjust cooling or ventilation, etc.). This is the emerging field of Smart Building Automation.

The objective of this work is strongly linked to the usage of these WSN nodes in the context of smart monitoring of energy consumption and environment (temperature, activity, light). We will propose new Indirect Power Monitoring techniques that make it possible to estimate the energy consumed in a building or in a home without actually measuring the power consumed. A typical AC smart meter is a costly piece of equipment, and we therefore want to propose cheap and non-invasive sensor nodes. As an example, to estimate the power consumed by a TV, it is not necessary to measure precisely the current it draws; a simple sensor able to recognize whether the TV is on or off can do the same job with far less complexity. Another example is the development and deployment of room occupancy and people activity sensors that can lead to a significant reduction of energy consumption by regulating HVAC (Heating, Ventilation and Air-Conditioning) or by switching lights and office equipment. Wireless transmission is the main source of energy consumption in the nodes, and the new algorithms will make the sensors cooperate inside a short-range cluster (an office for example). The algorithms will decide the best strategy and the best information to send back in order to offer the best trade-off between performance, complexity and consumption. This work is closely linked to power management techniques and energy harvesting (indoor light, heat, vibration). A power manager embedded in energy harvesting WSN nodes adapts the power consumption and computation loads according to the harvested energy to obtain a theoretically infinite lifetime. The main advantage of using energy harvesting (EH) in the context of building and home monitoring is to avoid battery replacement and therefore to reduce the installation and maintenance costs of the system.
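
The TV example can be sketched as follows: the energy is estimated from on/off observations and a nominal power rating instead of a current measurement. This is a minimal illustration; the function name, the event format and the assumption that the appliance draws a constant nominal power while on are all hypothetical simplifications.

```python
def estimate_energy_wh(nominal_power_w, state_changes, end_time_s):
    """Estimate energy (Wh) from on/off observations and a nominal power rating.

    state_changes: time-ordered list of (timestamp_s, is_on) reported by the sensor.
    The appliance is assumed to draw nominal_power_w whenever it is on.
    """
    on_seconds = 0.0
    last_time, last_on = None, False
    for timestamp, is_on in state_changes:
        if last_on:
            on_seconds += timestamp - last_time
        last_time, last_on = timestamp, is_on
    if last_on:
        on_seconds += end_time_s - last_time
    return nominal_power_w * on_seconds / 3600.0
```

For a 100 W appliance that is on for the first half hour and again during the second hour of a two-hour window, `estimate_energy_wh(100, [(0, True), (1800, False), (3600, True)], 7200)` accumulates 5400 seconds of activity. Only state changes need to be transmitted, which also limits the radio activity of the sensor node.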

7. Bilateral Contracts and Grants with Industry

7.1. Bilateral Contracts with Industry

Automatic Analysis, Classification and Processing of Audio Signals, Contract with Orange Labs.

8. Partnerships and Cooperations

8.1. Regional Initiatives

8.1.1. Images & Réseaux Competitivity Cluster - Embrace (2014-2016)

Participants: Raphaël Bardoux, Arnaud Carer, Matthieu Gautier, Olivier Sentieys.

Embrace (Embedded Radio Accelerator) is a project which involves CAIRN and two Small and Medium Enterprises (SMEs): Digidia and PrimeGPS. Embrace aims at developing a software radio platform to enable the digital demodulation of HF signals. Both SMEs will use this platform as the first step to implement new products. These products will be dedicated to two different applications (Global Navigation Satellite System and Navigation Safety) at the heart of the markets of the SMEs. CAIRN's goal is the technology transfer of the methods proposed by the team that enable the rapid prototyping of digital radios.

8.2. National Initiatives

The CAIRN team mainly collaborates with the following laboratories: CEA List, CEA Leti, LEAT Nice, Lab-STICC (Lorient, Brest), LIRMM (Montpellier, Perpignan), LIP6 Paris, IETR Rennes, DTIM-ONERA Toulouse, LAAS Toulouse, IRIT Toulouse, Inria Socrate.

The team participates in the activities of the following CNRS research groups (GdR, for "Groupe de Recherche" in French):

- GdR SOC-SIP (System On Chip & System In Package), working groups on reconfigurable architectures, embedded software for SoC, low power issues. E. Casseau is in charge of the architecture topic of the reconfigurable platform working group.
- GdR ISIS (Information Signal ImageS), working group on Algorithms Architectures Adequation.
- GdR ASR (Architectures Systèmes et Réseaux)
- GdR IM (Informatique Mathématiques), C2 working group on Codes and Cryptography and ARITH working group on Computer Arithmetic

8.2.1. ANR Blanc - PAVOIS (2012–2016)

Participants: Arnaud Tisserand, Emmanuel Casseau, Philippe Quémerais, Jérémie Métairie, Nicolas Veyrat-Charvillon, Karim Bigou.

PAVOIS (in French: Protections Arithmétiques Vis à vis des attaques physiques pour la cryptOgraphie basée sur les courbes elliptiques) is a project on Arithmetic Protections Against Physical Attacks for Elliptic Curve based Cryptography. It involves IRISA-CAIRN (Lannion) and LIRMM (Perpignan and Montpellier). This project will provide novel implementations of curve based cryptographic algorithms on custom hardware platforms. A specific focus will be placed on trade-offs between efficiency and robustness against physical attacks. One of our goals is to theoretically study and practically measure the impact of various protection schemes on performance (speed, silicon cost and power consumption). Theoretical aspects will include an investigation of how special number representations can be used to speed up cryptographic algorithms and protect cryptographic devices from physical attacks. On the practical side, we will design innovative cryptographic hardware architectures of a specific processor, based on the theoretical advancements described above, to implement curve based protocols. We will target efficient and secure implementations for both FPGA and ASIC circuits. For more details see http://pavois.irisa.fr.

8.2.2. ANR INFRA 2011 - FAON (2012-2015)

Participants: Raphaël Bardoux, Arnaud Carer, Matthieu Gautier, Pascal Scalart.

The objectives of the FAON (Frequency based Access Optical Networks) project are to demonstrate the technology and feasibility of a new type of Passive Optical Network (PON) for broadband access which uses a frequency based shared access technique known as Frequency Division Multiplexing (FDM). These goals are fully in line with the expected capacity increase in PON, which is forecast today to go from 100 Mbps per user to 1 Gbps. For more details, see http://www.agence-nationale-recherche.fr/en/anr-funded-project/?tx_lwmsuivibilan_pi2[CODE]=ANR-11-INFR-0005. FAON involves Orange Labs, CEA-LETI, University of South Brittany (Lab-STICC laboratory) and Univ. Rennes I (Foton laboratory and CAIRN team). CAIRN aims at developing a high-rate architecture at the receiver side. Specific receiver algorithms (synchronization and equalization) and FPGA implementation are the key issues that will be addressed.

8.2.3. Equipex FIT - Future Internet (of Things)

Participants: Olivier Sentieys, Arnaud Carer, Matthieu Gautier, Ganda-Stéphane Ouedraogo.

FIT is one of 52 winning projects from the first wave of the French Ministry of Higher Education and Research’s “Équipements d’Excellence” (Equipex) research grant programme. FIT involves UPMC, Inria, LSIIT and the Institut Mines-Télécom and runs over a nine-year period. FIT offers a federation of several independent experimental testbeds to provide a larger-scale, more diverse and higher performance platform for accomplishing advanced experiments. For more details, see http://fit-equipex.fr/. Inria (CAIRN and Socrate teams) develops the cognitive radio testbed that will provide a full experimental environment for evaluating the coexistence and the cooperation between heterogeneous multistandard nodes. To this aim, a fully open architecture based on software defined radio nodes is developed. CAIRN aims at proposing an FPGA based software defined radio with high level specifications. Cognitive radio testbed development is supported by an ADT funding of Inria.

8.2.4. ANR Ingénierie Numérique et Sécurité - ARDyT (2011-2015)

Participants: Arnaud Tisserand, Philippe Quémerais.

ARDyT (in French: Architecture Reconfigurable Dynamiquement Tolérante aux fautes) is a project on a Reliable and Reconfigurable Dynamic Architecture. It involves IRISA-CAIRN (Lannion), Lab-STICC (Lorient), LIEN (Nancy) and ATMEL. The purpose of the ARDyT project is to provide a complete environment for the design of a fault tolerant and self-adaptable platform. To this end, a platform architecture, its programming environment and management methodologies for diagnosis, testability and reliability have to be defined and implemented. To obtain low-cost solutions for terrestrial and aeronautics applications, the considered techniques do not rely on hardened components. The ARDyT platform will provide a European alternative to imported fault-tolerant reconfigurable architectures subject to ITAR constraints. For more details see http://ardyt.irisa.fr.

8.2.5. ANR Ingénierie Numérique et Sécurité - COMPA (2011-2015)

Participants: Emmanuel Casseau, Steven Derrien, Antoine Courtay, Mythri Alle, Yaset Oliva Venegas.

COMPA (model oriented design of embedded and adaptive multiprocessor) is a project which involves CAIRN, IETR (Rennes) and Lab-STICC (Lorient). The aim of the project is to design adaptive multiprocessor embedded systems for executing dataflow programs. The use case is the Reconfigurable Video Coding (RVC) standard. More specifically, we focus on the portable and platform-independent RVC-CAL language to describe the applications. We use transformations to refine the application model, increase its parallelism and translate it into software and hardware components. Specific scheduling and actor mapping techniques are also investigated for runtime execution. For more details see http://www.compa-project.org.

8.2.6. ANR Ingénierie Numérique et Sécurité - DEFIS (2011-2015)

Participants: Olivier Sentieys, Romuald Rocher, Nicolas Simon.

DEFIS (Design of fixed-point embedded systems) is a project which involves CAIRN, LIP6 (University of Paris 6), LIRMM (University of Perpignan), CEA LIST, Thales, Inpixial. The main objectives of the project are to propose new approaches to improve the efficiency of the floating-point to fixed-point conversion process and to provide a complete design flow for fixed-point refinement of complex applications. This infrastructure will reduce the time-to-market by automating the fixed-point conversion and by mastering the trade-off between application quality and implementation cost. Moreover, this flow will guarantee and validate the numerical behavior of the resulting implementation. The proposed infrastructure will be validated on two real applications provided by the industrial partners. For more details see http://defis.lip6.fr.

8.2.7. Labex CominLabs - BoWI (2014-2018)

Participants: Olivier Sentieys, Antoine Courtay, Olivier Berder, Pascal Scalart, Arnaud Carer, Viet-Hoa Nguyen, Zhongwei Zheng.

The BoWI project (Body World Interactions) aims at designing an accurate gesture and body movement estimation using very small and low-power wearable sensor nodes. It initially stems from a proposal of the CominLabs think tank focused on the societal challenge called Digital Environment for the Citizen. It is also related to the social challenge ICT for Personalized Medicine and to the research track Energy Efficiency in ICT. The main objective of the project is to propose pioneering interfaces for an emerging interacting world based on smart environments (house, media, information and entertainment systems...). Basically, the project relies on Wireless Body Area Sensor Networks; the aim is accurate gesture and body movement estimation under extremely severe constraints in terms of footprint and power consumption, in line with on-body energy harvesting perspectives. The BoWI geolocation approach will combine radio communication distance measurements and inertial sensors, and it will also strongly benefit from cooperative techniques based on multiple observations and distributed computation. Different types of applications, such as health care, activity monitoring and environment control, will be considered and evaluated along with human-machine interface expertise.

The scientific challenge is global and concerns a solution to be jointly devised by all partners: a short-range geolocation method based on distributed and cooperating devices that process multisource data derived from radio-communication distance estimation and integrated inertial sensors. It includes several specific contributions:

- Dynamic and cooperative communication coding and protocols for inter-node communications. This includes cooperative communications and protocols such as cooperative MIMO, relaying, error coding, network coding, and MAC and wake-up radio protocols.
- Node hardware/software architecture design and self-adaptive distributed processing for geolocation with aggressive low-power run-time optimisation.
- Channel models and antennas for short-range communications. This study will be performed for various radio standards, from the upcoming BAN 802.15.6 and 802.15.4a technologies to future UWB solutions.
- Channel models and antennas for WBASN at millimeter waves. This is a promising perspective for antenna miniaturization; however, no front-ends are available yet.
- In-depth and specific analysis of human-machine interactions to set system constraints and define user requirements according to various application perspectives.

In practice, the BoWI partners aim to deliver the design of basic components, a prototype based on available radio front-ends and energy harvesting devices, as well as a system simulator including mm-wave models. Results will also concern the specification of future radio front-ends. BoWI involves CAIRN, IETR (Rennes), and Lab-STICC (Brest, Lorient, Vannes). For more details see http://www.bowi.cominlabs.ueb.eu/fr.
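To make the idea of geolocation from radio distance estimates concrete, here is a minimal 2D trilateration sketch. It is a textbook linearized least-squares solution, not the BoWI method: it ignores inertial data, cooperation between nodes, and measurement noise models. The function name `trilaterate` and the anchor layout are hypothetical.

```python
import math

def trilaterate(anchors, dists):
    """Estimate (x, y) from at least 3 anchor positions and measured ranges.

    Each range equation (x - xi)^2 + (y - yi)^2 = di^2 is linearized by
    subtracting the equation of the first anchor; the resulting linear
    system is solved in the least-squares sense via 2x2 normal equations.
    """
    if len(anchors) < 3 or len(anchors) != len(dists):
        raise ValueError("need at least 3 anchors and one range per anchor")
    (x0, y0), d0 = anchors[0], dists[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    # Normal equations (A^T A) p = A^T b for the unknown position p = (x, y).
    a11 = sum(a * a for a, _ in rows)
    a12 = sum(a * b for a, b in rows)
    a22 = sum(b * b for _, b in rows)
    b1 = sum(a * r for (a, _), r in zip(rows, rhs))
    b2 = sum(b * r for (_, b), r in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

if __name__ == "__main__":
    # Hypothetical on-body anchor nodes (metres) and a node to locate.
    anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    target = (0.3, 0.4)
    dists = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
    print(trilaterate(anchors, dists))
```

With noisy ranges, adding more anchors over-determines the system and the least-squares solution averages out part of the error, which is one reason why multiple observations help.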


Participants: Olivier Sentieys, Daniel Chillet, Cédric Killian, Jiating Luo, Van Dung Pham.

3DCORE (3D Many-Core Architectures based on Optical Network on Chip) is a project which involves CAIRN, FOTON (Rennes, Lannion) and Institut des Nanotechnologies de Lyon. 3D integration in the ultra deep submicron domain means implementing billions of transistors or hundreds of cores on a single chip, with the need to sustain a large number of exchanges between cores while limiting power consumption. Focusing on system integration rather than transistor density allows for both functional and technological diversification in integrated systems. Functional diversification allows non-digital functionalities to migrate from the board level to the chip level. This allows for the integration of new technologies that enable high performance, low power, high reliability, low cost, and high design productivity. The use of an Optical Network-on-Chip (ONoC) promises significantly increased bandwidth, increased immunity to electromagnetic noise, decreased latency, and decreased power consumption, while wavelength routing and Wavelength Division Multiplexing (WDM) add to the valuable properties of optical interconnect by permitting low-contention or even contention-free routing. WDM allows multiple signals to be transmitted simultaneously, facilitating higher throughput. The individual realization of CMOS-compatible optical components, such as waveguides, modulators, and detectors, lets the community foresee that such integration may be possible in the next ten years. The aim of the project is therefore to investigate new optical interconnect solutions that improve the energy efficiency and data rate of on-chip interconnect by 2 to 3 orders of magnitude, in the context of a many-core architecture targeting both embedded and high-performance computing. Moreover, we envisage taking advantage of 3D technologies to design a specific photonic layer suitable for a flexible and energy-efficient high-speed optical network on chip (ONoC).


Participants: Emmanuel Casseau, Arnaud Tisserand, Huu Van Long Nguyen.

RELIASIC (Reliable Asic) is a project which involves CAIRN, Lab-STICC (University of Bretagne Sud) and IETR (Institut d’Electronique et de Télécommunications de Rennes). One of the most critical challenges of the next design technologies will be fault-tolerant computation. The increase in integration density and the requirement of low energy consumption can only be sustained through low-powered components, with the drawback of reduced robustness against transient errors. In the near future, the electronic gates that process information will be inherently unreliable. New techniques will be required to increase the reliability of operators and components. The aim of the project is to address this problem with a bottom-up approach, starting from an existing application as a use case (a GPS receiver) and adding redundancy mechanisms that make the GPS receiver tolerant to transient errors caused by a low supply voltage.
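As a generic illustration of redundancy against transient errors, the sketch below applies triple modular redundancy (TMR) with a bitwise majority voter. The project description does not say which redundancy mechanisms are used, so TMR is an assumption made for the example, and the names `majority_vote` and `flip_bit` are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three redundant results: each output bit takes
    the value held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)

def flip_bit(value, position):
    """Model a transient fault by inverting a single bit of value."""
    return value ^ (1 << position)

if __name__ == "__main__":
    correct = 0b10110010
    # Three replicas compute the same result; one suffers a single-bit upset.
    replicas = [correct, flip_bit(correct, 3), correct]
    print(bin(majority_vote(*replicas)))
```

A single faulty replica is always outvoted. Two replicas failing on the same bit would defeat the voter, which is why the cost of redundancy has to be weighed against the expected error rate.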


Participants: Arnaud Tisserand, Nicolas Veyrat-Charvillon, Karim Bigou, Gabriel Gallin.

H-A-H (Hardware and Arithmetic for Hyperelliptic Curves Cryptography) is a project on advanced arithmetic representations and algorithms for hyperelliptic curve cryptography (HECC). It involves IRISA-CAIRN (Lannion) and IRMAR (Rennes).

Arithmetic has an important role to play in providing algorithms that are robust against physical attacks (e.g., analysis of power consumption, electromagnetic radiation or computation timings). Currently, there are only very few hardware implementations of HECC, and none is available as open source. This project will provide novel implementations of HECC-based cryptographic algorithms on custom hardware platforms. For more details see http://h-a-h.inria.fr/.

8.3. European Initiatives

8.3.1. FP7 FLEXTILES

Participants: Olivier Sentieys, Emmanuel Casseau, Antoine Courtay, Daniel Chillet, Philippe Quémerais, Christophe Huriaux, Quang Hoa Le.

Program: FP7-ICT-2011-7
Project acronym: Flextiles
Coordinator: Thales
Other partners: Thales (FR), UR1 (FR), KIT (GE), TU/e (NL), CSEM (SW), CEA LETI (FR), Sundance (UK)

Project title: Self Adaptive Heterogeneous Manycore Based on Flexible Tiles

A major challenge in computing is to leverage multi-core technology to develop energy-efficient high performance systems. This is critical for embedded systems with a very limited energy budget as well as for supercomputers in terms of sustainability. Moreover the efficient programming of multi-core architectures, as we move towards manycores with more than a thousand cores predicted by 2020, remains an unresolved issue. The FlexTiles project will define and develop an energy-efficient yet programmable heterogeneous manycore platform with self-adaptive capabilities. The manycore will be associated with an innovative virtualisation layer and a dedicated tool-flow to improve programming efficiency, reduce the impact on time to market and reduce the development cost by 20 to 50%. FlexTiles will raise the accessibility of the manycore technology to industry - from small SMEs to large companies - thanks to its programming efficiency and its ability to adapt to the targeted domain using embedded reconfigurable technologies.

8.3.2. **FP7 ALMA**

**Participants:** Steven Derrien, Romuald Rocher, Olivier Sentieys, Ali Hassan El-Moussawi.

- **Program:** FP7-ICT-2011-7
- **Project acronym:** Alma
- **Project title:** Architecture oriented parallelization for high performance embedded Multicore systems using scilab
- **Duration:** Sep. 2011 - Nov. 2014
- **Coordinator:** KIT
- **Other partners:** KIT (GE), UR1 (FR), Recore Systems (NL), Univ. of Peloponnese (GR), TEI-MES (GR), Intracom SA (GR), Fraunhofer (GE)

The mapping process of high performance embedded applications to today’s multiprocessor system-on-chip devices suffers from a complex toolchain and programming process. The problem is the expression of parallelism in a purely imperative programming language, commonly C. This traditional approach limits the mapping, partitioning and generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains. The Architecture oriented parallelization for high performance embedded Multicore systems using scilab (ALMA) project aims to overcome these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high-level abstraction descriptions. This holistic toolchain hides the complexity of both the application and the architecture, which leads to better acceptance, reduced development cost and shorter time-to-market. Driven by technology restrictions in chip design, the end of Moore’s law and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies. ALMA helps to strengthen the position of Europe in the world market of multiprocessor-targeted software toolchains. This challenging research will be carried out by the ALMA consortium, which brings together industry and academia. Industrial partners such as Recore and Intracom will contribute their expertise in reconfigurable hardware technology for multi-core systems-on-chip, software development tools and real-world applications. The academic partners will contribute their expertise in reconfigurable computing and compilation tools development.

8.4. **International Initiatives**

8.4.1. **Inria Associate Teams**

8.4.1.1. **HARDIESSE**

- **Title:** Heterogeneous Accelerators for Reconfigurable DynamIc, Energy efficient, Secure SystEms
- **International Partner (Institution - Laboratory - Researcher):** University of Massachusetts at Amherst (USA)
- **Duration:** 2014 - 2016
- **See also:** [https://team.inria.fr/cairn/hardiesse/](https://team.inria.fr/cairn/hardiesse/)

The rapid evolution of applications and standards requires frequent in-the-field system modifications and thus strengthens the need for adaptive devices. This need for strong flexibility, combined with technology evolution (and the so-called power wall), has motivated the surge towards the use of multiple processor cores on a single chip (MPSoC). While it is now clear that we have entered the multi-core era, it is also indisputable that, especially for energy-efficient embedded systems, these architectures will have to be heterogeneous, combining processor cores and specialized accelerators. We foresee a need for systems able to continuously adapt themselves to changing environments, where software updates alone will not be enough to tackle energy management and error tolerance challenges. We believe that a dynamic and transparent adaptation of the hardware structure is the key to success. Security will also be an important challenge for embedded devices. Protections against physical attacks will have to be integrated in all secured components. In this Associate Team, we will study new reconfigurable structures for such hardware accelerators with a specific focus on energy efficiency, runtime dynamic reconfiguration, security, and verification.

8.4.2. Inria International Partners

8.4.2.1. Declared Inria International Partners

Computer Science Department, Colorado State University in Fort-Collins (USA), Prof. Sanjay Rajopadhye, Loop parallelization, development of high-level synthesis tools, Inria Associate Team (2010-2012).

Department of Computer Science, Lund University (Sweden), Prof. Krzysztof Kuchcinski, Hardware accelerators modeling using constraint-based programming.

Tampere University of Technology (Finland), Prof. Jarmo Takala, From dataflow-based video applications to embedded multicore platforms.

University College Cork (Ireland), Prof. Liam Marnane and Prof. Emanuel Popovici, Arithmetic operators for cryptography, side channel attacks for security evaluation, energy-harvesting sensor networks, and sensor networks for health monitoring.

University of Massachusetts at Amherst (USA), Prof. Russell Tessier and Prof. Maciej Ciesielski, Methods and tools for automatic reconfigurable arithmetic circuit generation.

8.4.2.2. Informal International Partners

Imec (Belgium), Optimization of embedded systems using fixed-point arithmetic.

Electrical Engineering Department, Indian Institute of Technology Delhi (India), Cooperative and MIMO wireless communications.

Ecole Polytechnique Fédérale de Lausanne - EPFL (Switzerland), Optimization of embedded systems using fixed-point arithmetic.

Technical University of Madrid - UPM (Spain), Optimization of embedded systems using fixed-point arithmetic.

LRTS laboratory, Laval University in Québec (Canada), Architectures for MIMO systems, Wireless Sensor Networks, Inria Associate Team (2006-2008).

LSSI laboratory, Québec University in Trois-Rivières (Canada), Design of architectures for digital filters and mobile communications.

Department of Electrical and Computer Engineering, University of Patras (Greece), Wireless Sensor Networks, data merging, priority scheduling, loop transformations for memory optimizations.

Karlsruhe Institute of Technology - KIT (Germany), Loop parallelization and compilation techniques for embedded multicores.

Ruhr - University of Bochum - RUB (Germany), Reconfigurable architectures.

University of Science and Technology of Hanoi (Vietnam), Participation of several CAIRN members in the Master ICT / Embedded Systems.

8.4.3. Participation In other International Programs

8.4.3.1. CNRS PICS - SpiNaCH (2012 - 2014)

Title: Secure and low-Power sensor Networks Circuits for Healthcare embedded applications

Principal investigator: Arnaud Tisserand, Olivier Berder, Olivier Sentieys

International Partner (Institution - Laboratory - Researcher): Code&Crypto group in University College Cork (Ireland)

Duration: 2012 - 2014

Biomedical sensor networks may be used more and more in the future. For instance, they allow a patient’s health-care parameters to be remotely monitored at home. In this project, we plan to address two important challenges in the design of biomedical sensor networks: i) the design of low-power sensor devices for embedded autonomous systems (health monitoring, pacemakers...) with long battery life; ii) confidentiality and security aspects, especially public-key cryptography processors that are robust against side-channel attacks (measurement of the computation time, the power consumption or the electromagnetic radiation of the circuit) and that operate with limited power and energy resources.

8.5. International Research Visitors

8.5.1. Visits of International Scientists

Prof. Liam Marnane (University College Cork, Ireland) for one week in November (funded by CNRS PICS SpiNaCH project).

Fiona Edwards-Murphy, PhD student, (University College Cork, Ireland) for two weeks in September (funded by CNRS PICS SpiNaCH project).

Prof. Sanjay Rajopadhye (Colorado State University, USA) for one week in June (visiting professor position from University Rennes 1).

8.5.1.1. Internships

Singh Rajhans, B.Eng. student, Indian Institute of Technology Roorkee (Roorkee, India), Intrinsic Fault Tolerance of Hopfield Artificial Neural Network Model for task scheduling in RSoC, from May 2014 to July 2014 [63].

Jiating Luo, Master’s student, École centrale de Pékin (Beijing, China), Design of a Wavelength Allocator for Optical Network-on-Chips, from May 2014 to Sep 2014.

8.5.2. Visits to International Teams

Viet Hoa Nguyen, PhD student, visited IIT Delhi for 3 months between October and December 2014.

Christophe Huriaux, PhD student, visited UMASS for 3 months between May and July 2014.

Steven Derrien visited UMASS for 1 week in December 2014.

9. Dissemination

9.1. Promoting Scientific Activities

9.1.1. Scientific Events Selection

9.1.1.1. Responsible of Conference Program Committee

D. Chillet was Track Chair of the 12th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC).

O. Sentieys was Track Chair at IEEE NEWCAS.

A. Tisserand is program co-chair of the 22nd IEEE Symposium on Computer Arithmetic (Lyon, 22-24 June 2015).

A. Tisserand was the program co-chair of the ComPAS national conference, Conférence d’informatique en Parallélisme, Architecture et Système (Neuchâtel, Switzerland, 22-25 April 2014).

9.1.1.2. Member of Conference Program Committee

E. Casseau was a member of the technical program committee of SiPS.

D. Chillet was a member of the technical program committee of HiPEAC RAPIDO, HiPEAC WRC, MCSoC, DCIS, ComPAS, and DASIP.

M. Gautier was a member of the technical program committee of IEEE WCNC, IEEE PIMRC, IEEE ICCVE, and IARIA COCORA.

C. Wolinski was a member of the technical program committee of IEEE ASAP and DSD.

A. Tisserand was a member of the technical program committee of DASIP, IEEE NEWCAS, IEEE PATMOS, and IEEE Reconfig.

O. Sentieys was a member of the technical program committee of IEEE/ACM DATE, IEEE FPL, IEEE/ACM ICCAD, ACM ENSSys, ACM SBCCI, IFIP/IEEE VLSI-SoC, FTFC, and GRETSI.

9.1.2. Journal

9.1.2.1. Member of the Editorial Board

D. Chillet is a member of the Editorial Board of the Journal of Real-Time Image Processing (JRTIP).


A. Tisserand is Associate Editor of IEEE Transactions on Computers. He is a member of the editorial board of the International Journal of High Performance Systems Architecture.

9.1.3. Other Scientific Responsibilities

D. Chillet is a member of the Board of Directors of the Gretsi Association.

F. Charot, O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on embedded systems architectures and associated design tools (ARCHI).

O. Sentieys and A. Tisserand are members of the steering committee of a CNRS spring school for graduate students on low-power design (ECOFAC).

O. Sentieys is a member of the steering committee of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter. In 2013, he was an expert for some scientific organizations (ANR, AERES).

A. Tisserand is co-organizer and president of the scientific council of the Seminar on Security of Embedded Electronic Systems (IRISA-DGA).

9.2. Seminars and Invitations

O. Berder gave an invited lecture at SENSO 2014 conference in Ecole des Mines, Gardanne, France, in October 2014.


A. Courtay gave an invited talk at the NASA/ESA Conference on Adaptive Hardware and Systems (AHS’14), Leicester, UK, on Dynamically Reconfigurable Embedded FPGA Systems in July 2014.

O. Sentieys gave an invited talk at the 24th International Conference on Field Programmable Logic and Applications (workshop on Self-adaptive heterogeneous many-core based on Flexible Tiles), Munich, Germany, in September 2014 on Runtime Mapping of Hardware Accelerators on the Embedded FPGA Layer.

O. Sentieys gave an invited talk at GreenDays’14 on Domain-Specific Computing Platforms: the Ultimate Energy-Efficiency of Hardware Accelerators.


A. Tisserand gave an invited lecture at the CNRS ECOFAC 2014 spring school on low power arithmetic circuits.

9.3. Teaching - Supervision - Juries

9.3.1. Teaching

O. Berder: introduction to signal processing, 38h, ENSSAT (L3)
O. Berder: microprocessors and digital systems, 30h, ENSSAT (L3)
O. Berder: wireless communications, 23h, ENSSAT (M2)
O. Berder: ad hoc networks, 58h, ENSSAT (M1-M2)
O. Berder: signal processing, 12h, IUT Lannion (L2)
E. Casseau: signal processing, 16h, ENSSAT (L3)
E. Casseau: low power design, 6h, ENSSAT (M1)
E. Casseau: real time design methodology, 24h, ENSSAT (M1)
E. Casseau: computer architecture, 36h, ENSSAT (M1)
E. Casseau: system on chip and verification, 10h, Master by Research (SISEA) and ENSSAT (M2)
E. Casseau: high level synthesis, 12h, Master by Research (SISEA) and ENSSAT (M2)
S. Derrien: component and system synthesis, 20h, Master by Research (MRI ISTIC) (M2)
S. Derrien: computer architecture, 12h, ENS Cachan (L3)
S. Derrien: computer architecture, 24h, ISTIC (L3)
S. Derrien: introduction to operating systems, 8h, ISTIC (M1)
S. Derrien: embedded architectures, 48h, ISTIC (M1)
S. Derrien: high-level synthesis, 6h, ISTIC (M1)
S. Derrien: software engineering project, 40h, ISTIC (M1)
F. Charot: processor architecture, 25h, University of Science and Technology of Hanoi (M1)
A. Courtay: processor architecture, 24h, ENSSAT (L3)
A. Courtay: digital electronics, 32h, ENSSAT (L3)
A. Courtay: digital system design, 12h, ENSSAT (L3)
A. Courtay: digital electronics communication interfaces, 68h, ENSSAT (M1)
A. Courtay: processor architecture, 25h, USTH (M1)
D. Chillet: embedded processor architecture, 20h, ENSSAT (M1)
D. Chillet: VHDL, 10h, ENSSAT (M1)
D. Chillet: multimedia processor architectures, 24h, ENSSAT (M2)
D. Chillet: advanced processor architectures, 24h, Master by Research (SISEA) and ENSSAT (M2)
D. Chillet: low-power digital CMOS circuits, 6h, Telecom Bretagne and University of Occidental Brittany (UBO) (M2)
M. Gautier: electronics, 42h, IUT Lannion (L1)
M. Gautier: telecommunications, 114h, IUT Lannion (L1)
M. Gautier: digital communications, 28h, IUT Lannion (L2)
C. Killian: digital electronics, 37h, IUT Lannion (L1)
C. Killian: signal processing, 36h, IUT Lannion (L2)
C. Killian: automated measurements, 40h, IUT Lannion (L2)
C. Killian: measurement chain, 20h, IUT Lannion (L2)
C. Killian: embedded systems programming, 12h, IUT Lannion (L2)
A. Kritikakou: computer architecture, 50h, ISTIC, Univ. Rennes 1 (L3)
A. Kritikakou: operating systems, 24h, ISTIC, Univ. Rennes 1 (L3)
R. Rocher: electricity, 16h, IUT Lannion (L1)
R. Rocher: electronics, 44h, IUT Lannion (L1)
R. Rocher: telecommunications, 82h, IUT Lannion (L1)
R. Rocher: signal processing, 12h, IUT Lannion (L2)
R. Rocher: digital communications, 48h, IUT Lannion (L2)
P. Scalart: non-linear optimisation, 18h, Master by Research (SISEA) and ENSSAT (M2)
P. Scalart: parametric modelisation, optimal and adaptive filters, 24h, Master by Research (SISEA) and ENSSAT (M2)
P. Scalart: source coding, 14h, Master by Research (SISEA) and ENSSAT (M2)
P. Scalart: cellular networks, 24h, ENSSAT (M2)
P. Scalart: digital communication systems, 32h, ENSSAT (M1)
P. Scalart: random signals and systems, 12h, ENSSAT (M1)
O. Sentieys: digital signal processing, 40h, ENSSAT (M1)
O. Sentieys: VLSI integrated circuit design, 40h, ENSSAT (M1)
A. Tisserand: multiprocessor architectures and programming, 20h, ENSSAT and Master by Research (SISEA) (M2)
A. Tisserand: hardware computer arithmetic operators, 6h, Master by Research (SISEA) (M2)
B. Vrigneau: electronics, 36h, IUT Lannion (L1)
B. Vrigneau: telecommunications, 128h, IUT Lannion (L1)
B. Vrigneau: digital communications, 28h, IUT Lannion (L1)
C. Wolinski: computer architectures, 92h, ESIR (L3)
C. Wolinski: design of embedded systems, 48h, ESIR (M1)
C. Wolinski: signal, image, architecture, 26h, ESIR (M1)
C. Wolinski: programmable architectures, 10h, ESIR (M1)
C. Wolinski: component and system synthesis, 10h, Master by Research (MRI ISTIC) (M2)

9.3.2. Teaching Responsibilities

C. Wolinski is the Director of ESIR.
P. Quinton is the director of Ecole Normale Supérieure de Rennes.
D. Chillet is the Director of Academic Studies of ENSSAT.
P. Scalart is the Head of the Electronics Engineering department of ENSSAT.
S. Derrien has been responsible for the first year of the Master of Computer Science at ISTIC since Sep. 2012.
O. Sentieys is responsible for the "Embedded Systems" branch of the SISEA Master by Research.
D. Chillet is responsible for the ICT Master of the University of Science and Technology of Hanoi and also co-responsible for the "Embedded Systems" speciality of this Master.

ENSSAT stands for "École Nationale Supérieure des Sciences Appliquées et de Technologie" and is an "École d’Ingénieurs" of the University of Rennes 1, located in Lannion.
ISTIC is the Electrical Engineering and Computer Science Department of the University of Rennes 1.
ESIR stands for "École supérieure d’ingénieurs de Rennes" and is an "École d’Ingénieurs" of the University of Rennes 1, located in Rennes.

D. Chillet has been a member of the French National University Council since 2009 in signal processing and electronics (Conseil National des Universités en 61e section).
D. Chillet has been a member of the Permanent Committee of the French National University Council since November 2011 in signal processing and electronics (Commission Permanente du Conseil National des Universités en 61e section).
A. Tisserand has been a member of the French National University Council since 2011 in computer science (Conseil National des Universités en 27e section).

9.3.3. Supervision


PhD: Karim Bigou, RNS Hardware Units for ECC, Nov. 2014, A. Tisserand.


PhD in progress: Christophe Huriaux, Embedded reconfigurable hardware accelerators with efficient dynamic reconfiguration management, Oct. 2012, O. Sentieys, A. Courtay.


PhD in progress: Van Dung Pham, Design space exploration in the context of 3D integration of multiprocessors interconnected by Optical Network-on-Chip, Dec 2014, O. Sentieys, D. Chillet, C. Killian, S. Le-Beux.


PhD in progress: Zhongwei Zheng, Short-range geolocation algorithms based on distributed multi-sensor processing, Nov. 2012, P. Scalart, jointly with C. Roland from Lab-STICC.

10. Bibliography

Major publications by the team in recent years


Publications of the year

Doctoral Dissertations and Habilitation Theses


**Articles in International Peer-Reviewed Journals**


Systems”, April 2014, vol. 33, n° 4, pp. 599-612 [DOI : 10.1109/TCAD.2013.2292510], https://hal.inria.fr/hal-01097606

[34] S. PIESTRAK. A note on RNS architectures for the implementation of the diagonal function, in "Information Processing Letters", 2015, pp. 1-9, https://hal.inria.fr/hal-01088395


Articles in National Peer-Reviewed Journals

[37] P. COTRET, G. GOGNIAT. Protection des architectures hétérogènes sur FPGA : une approche par pare-feux matériels, in "Techniques de l’Ingénieur", February 2014, 10 p., Référence IN175, https://hal.inria.fr/hal-00866646

[38] A. TISSERAND. Circuits électroniques pour la génération de nombres aléatoires, in "Techniques de l’ingénieur Technologies des composants", August 2014, n° H5215, https://hal.inria.fr/hal-01061471

International Conferences with Proceedings

[39] K. BIGOU, A. TISSERAND. RNS Modular Multiplication through Reduced Base Extensions, in "ASAP - 25th IEEE International Conference on Application-specific Systems, Architectures and Processors", Zurich, Switzerland, IEEE, June 2014, pp. 57-62 [DOI : 10.1109/ASAP.2014.6868631], https://hal.inria.fr/hal-01010961

[40] J. CHEN, A. TISSERAND, E. POPOVICI, S. D. COTOFANA. Robust Sub-Powered Asynchronous Logic, in "PATMOS - International Workshop on Power And Timing Modeling, Optimization and Simulation", Palma de Mallorca, Spain, IEEE, September 2014 [DOI : 10.1109/PATMOS.2014.6951863], https://hal.inria.fr/hal-01063821


[45] M. GAUTIER, G. S. OUEDRAOGO, O. SENTIEYS. Design Space Exploration in an FPGA-Based Software Defined Radio, in "Euromicro Conference on Digital System Design", Verona, Italy, August 2014 [DOI : 10.1109/DSD.2014.44], https://hal.inria.fr/hal-01084781


[48] C. HURIAUX, O. SENTIEYS, R. TESSIER. FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams, in "FPL - 24th International Conference on Field Programmable Logic and Applications", Munich, Germany, IEEE, September 2014 [DOI : 10.1109/FPL.2014.6927494], https://hal.inria.fr/hal-01017184


[50] Q. H. KHUAT, D. CHILLET, M. HUBNER. Considering reconfiguration overhead in scheduling of dependent tasks on 2D Reconfigurable FPGA, in "International Conference on Adaptive Hardware Systems (AHS 2014)", Leicester, United Kingdom, July 2014, 8 p., https://hal.inria.fr/hal-01097496

[51] Q. H. KHUAT, D. CHILLET, M. HUBNER. Dynamic Run-time Hardware/Software Scheduling For 3D Reconfigurable SoC, in "International Conference on Reconfigurable Computing and FPGAs (ReConFig 2014)", Cancun, Mexico, December 2014, https://hal.inria.fr/hal-01097509


[53] Q. H. LE, E. CASSEAU, A. COURTAY. Place Reservation Technique for Online Task Placement on a Multi-context Heterogeneous Reconfigurable Architecture, in "International Conference on Reconfigurable Computing and FPGAs (ReConFig 2014)", Cancun, Mexico, December 2014, 6 p., https://hal.inria.fr/hal-01076691


Conferences without Proceedings


Scientific Books (or Scientific Book chapters)


Patents and standards

[73] O. SENTIEYS, A. COURTAY, C. HURIAUX, S. PILLEMENT. **Method and device for programming a FPGA**, January 2014, n° 14305143.1, https://hal.inria.fr/hal-01099866

Other Publications


[75] Q. H. LE, E. CASSEAU, A. COURTAY. **Placement en Ligne de Tâches sur Architecture Dynamiquement Reconfigurable Hétérogène**, June 2014, Colloque GDR SOC-SIP, https://hal.inria.fr/hal-01061009


**References in notes**


[79] Z. ALLIANCE. *Zigbee specification*, ZigBee Alliance, 2005, n° ZigBee Document 053474r06, Version


