## Section: New Results

### Algorithm Architecture Interaction

Participants : Steven Derrien, Romuald Rocher, Daniel Ménard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

#### Computation Accuracy Optimization

Participants : Daniel Ménard, Pascal Scalart, Olivier Sentieys, Romuald Rocher, Thibault Hilaire, Hai-Nam Nguyen, Mohamed Diab.

##### Dynamic Precision Scaling

The traditional approach to designing a fixed-point system is based on the worst-case principle. For example, for a digital communication receiver, the maximal performance and the maximal input dynamic range are retained, and the most constraining transmission channel is considered. Nevertheless, the noise and signal levels evolve over time. Moreover, the data rate depends on the service (video, image, speech) used by the terminal, and the required performance (bit error rate) is linked to the service. These elements show that the fixed-point specification depends on external factors (noise level, input signal dynamic range, quality of service) and can be adapted over time to reduce the average power consumption.

An approach in which the fixed-point specification is adapted dynamically according to the SNR (Signal-to-Noise Ratio) at the receiver input has been proposed. This concept is called *Dynamic Precision Scaling (DPS)*. To adapt the fixed-point specification over time, the architecture integrates flexible operators, as presented in Section 6.1.1. The interest of our approach has been shown on a WCDMA (Wideband Code Division Multiple Access) receiver. The WCDMA receiver is made up of two main parts: the rake receiver and the searcher. For the rake receiver, which decodes the transmitted symbols, performance is evaluated through the bit error rate (BER). With dynamic precision scaling, up to 40% of the energy consumed can be saved compared to an implementation based on worst-case analysis. For the searcher, performance is evaluated through the mis-detection and false-alarm probabilities. The results show that the DPS approach reduces energy consumption by up to 25% [39].
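As an illustration of the principle only, the following Python sketch picks a word length from the measured input SNR. It relies on the standard rule of thumb that a uniform quantizer driven by a full-scale sinusoid gives an SQNR of about 6.02 b + 1.76 dB. The function name, the 10 dB margin and the word-length bounds are assumptions made for the example and are not taken from the receiver described above.

```python
import math

def select_word_length(snr_db, margin_db=10.0, min_bits=4, max_bits=16):
    """Return the smallest word length whose quantization noise stays
    margin_db below the channel noise (hypothetical selection rule)."""
    # The quantizer must be at least margin_db "cleaner" than the channel.
    required_sqnr_db = snr_db + margin_db
    # Invert SQNR ~= 6.02 * bits + 1.76 dB (full-scale sinusoid assumption).
    bits = math.ceil((required_sqnr_db - 1.76) / 6.02)
    # Clamp to the range supported by the (assumed) flexible operators.
    return max(min_bits, min(max_bits, bits))

if __name__ == "__main__":
    for snr in (0.0, 10.0, 20.0, 30.0):
        print(snr, "dB ->", select_word_length(snr), "bits")
```

With such a rule, a noisy channel is processed with short word lengths and the full precision is used only when the input SNR is high, which is where the average energy saving comes from.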

##### Fixed-Point Accuracy Evaluation

A collaboration with Imec (Interuniversitair Micro-Electronika Centrum), Belgium, started in 2008 on scenario-based fixed-point data format refinement to enable energy-scalable Software Defined Radios (SDR). The aim is to apply our analytical approach to evaluate the quantization noise power of the SSFE (Selective Spanning for Fast Enumeration) algorithm, a near-Maximum Likelihood MIMO detector. Because this algorithm includes decision operators, a further aim is to extend our analytical model to this type of operator so that a complete signal processing algorithm can be handled.

To obtain the analytical expression for this application, the back-end of the accuracy evaluation module of the FloatToFix tool is used. The user provides the transfer function between each noise source and the application output, and the expression of the output noise power is computed automatically. The noise power value is then obtained by applying this expression to the quantization noise statistics, which are determined by the fixed-point data formats.
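The kind of expression involved can be sketched as follows, under the usual textbook assumptions: each quantization noise source is white, the sources are mutually uncorrelated, and each source reaches the output through a linear time-invariant path described by its impulse response. This is not the FloatToFix implementation; the function names and the input format are hypothetical.

```python
def quantization_noise_stats(frac_bits, mode="round"):
    """Mean and variance of the noise added when keeping frac_bits
    fractional bits (continuous-amplitude input assumed)."""
    q = 2.0 ** (-frac_bits)          # quantization step
    variance = q * q / 12.0          # uniform error model
    mean = 0.0 if mode == "round" else -q / 2.0   # truncation is biased
    return mean, variance

def output_noise_power(sources):
    """sources: list of (impulse_response, frac_bits, mode) tuples, one per
    noise source. Returns the total noise power at the output."""
    total_variance = 0.0
    total_mean = 0.0
    for h, frac_bits, mode in sources:
        mean, variance = quantization_noise_stats(frac_bits, mode)
        total_variance += variance * sum(c * c for c in h)  # energy gain
        total_mean += mean * sum(h)                         # DC gain
    return total_variance + total_mean ** 2

if __name__ == "__main__":
    # Two sources: one at the output itself, one filtered by h = [0.5, 0.25].
    print(output_noise_power([([1.0], 8, "round"),
                              ([0.5, 0.25], 10, "trunc")]))
```

Once the path gains are known, changing a data format only changes the statistics fed to the expression, so the noise power can be re-evaluated without any new simulation.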

##### Optimal Fixed-Point Implementation of Filter/Controller

A framework to optimize the implementation of linear time-invariant filters or controllers on fixed-point architectures has been defined [34]. The digital implementation degrades the numerical performance of the controller because of the quantization of the coefficients involved (parametric errors) and the roundoff noise (numerical noise) in the computations. The application is described in an algebraic form. Previous work has been extended to carry out the finite word-length (FWL) optimization of the operators. A cost function corresponding to area or power consumption has been developed. Two implementation schemes, *Roundoff After Multiplication* and *Roundoff Before Multiplication*, have been proposed. From the definition of the filter or controller (i.e., the transfer function), it is possible to consider several possible realization structures (state-space, delta-operator, rho-operator, etc.), find the optimal one (according to one or several FWL measures) and generate the equivalent fixed-point C, MATLAB or VHDL code. The FWR Toolbox (for MATLAB) was built to achieve this 'optimal' fixed-point implementation.
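To make the notion of parametric error concrete, the sketch below quantizes the coefficients of a direct-form transfer function and measures the largest deviation of the frequency response. It is a minimal illustration, not the FWR Toolbox API: the function names, the example coefficients and the use of the maximum response error as an FWL measure are all assumptions.

```python
import cmath
import math

def quantize(value, frac_bits):
    """Round a coefficient to frac_bits fractional bits
    (round to nearest, ties to even as Python's round does)."""
    step = 2.0 ** (-frac_bits)
    return round(value / step) * step

def freq_response(b, a, omega):
    """H(e^{j omega}) for numerator b and denominator a (powers of z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

def max_response_error(b, a, frac_bits, n_points=256):
    """Largest |H - H_quantized| over a grid of frequencies in [0, pi]."""
    bq = [quantize(c, frac_bits) for c in b]
    aq = [quantize(c, frac_bits) for c in a]
    grid = (math.pi * i / (n_points - 1) for i in range(n_points))
    return max(abs(freq_response(b, a, w) - freq_response(bq, aq, w))
               for w in grid)

if __name__ == "__main__":
    # Hypothetical second-order low-pass filter.
    b = [0.0675, 0.1349, 0.0675]
    a = [1.0, -1.1430, 0.4128]
    for bits in (4, 8, 12):
        print(bits, "fractional bits ->", max_response_error(b, a, bits))
```

Repeating such a measurement for several realizations of the same transfer function is one simple way to see why the choice of structure matters for a fixed word length.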

#### Multi-Antenna Systems

Participants : Olivier Berder, Pascal Scalart, Olivier Sentieys, Quoc-Tuong Ngo, Patrice Quinton.

Assuming the transmitter can obtain some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be implemented through the joint optimization of a linear precoder (at the transmitter) and decoder (at the receiver). A new exact solution to the maximization of the minimum Euclidean distance between received symbols has been proposed for two 16-QAM modulated symbols. This precoder significantly increases the minimum distance compared to diagonal precoders, which leads to a significant BER improvement. The new strategy selects the best precoding matrix among eight different expressions, depending on the value of the channel angle. To decrease complexity, other sets of precoders have been proposed; the performance of the simplest one, composed of only two different precoders, remains very close to the optimum in terms of BER.
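The criterion being maximized can be evaluated numerically by brute force, as in the sketch below. It does not reproduce the closed-form precoder mentioned above. It only computes the minimum Euclidean distance between noise-free received vectors for a given 2x2 channel and precoder, using an unnormalized 16-QAM constellation with levels ±1, ±3. The function names and the matrix representation as nested lists are assumptions, and the transmit power constraint is left out.

```python
from itertools import product

def qam16_differences():
    """All distinct differences between two 16-QAM points (unnormalized)."""
    levels = (-3, -1, 1, 3)
    points = [complex(i, q) for i in levels for q in levels]
    return {p1 - p2 for p1 in points for p2 in points}

def min_distance(channel, precoder):
    """Minimum Euclidean distance at the receiver for two 16-QAM streams.
    channel and precoder are 2x2 nested lists of complex numbers."""
    # Effective channel G = H * F.
    g = [[sum(channel[r][k] * precoder[k][c] for k in range(2))
          for c in range(2)] for r in range(2)]
    diffs = qam16_differences()
    best = float("inf")
    for e1, e2 in product(diffs, repeat=2):
        if e1 == 0 and e2 == 0:
            continue  # identical symbol vectors, not an error event
        y0 = g[0][0] * e1 + g[0][1] * e2
        y1 = g[1][0] * e1 + g[1][1] * e2
        best = min(best, (abs(y0) ** 2 + abs(y1) ** 2) ** 0.5)
    return best

if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    weak_second_path = [[1, 0], [0, 0.5]]
    print(min_distance(identity, identity))
    print(min_distance(weak_second_path, identity))
```

Comparing candidate precoders with this function, under the same power constraint, is a direct way to check which one gives the largest minimum distance for a given channel.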

#### Parallel reconfigurable architectures for LDPC decoding

Participants : François Charot, Christophe Wolinski.

LDPC (Low-Density Parity-Check) codes are a class of error-correcting codes introduced by Gallager together with an iterative probability-based decoding algorithm. Their performance, combined with their relatively simple decoding algorithm, makes these codes very attractive for the next generations of satellite and radio digital transmission systems. LDPC codes were chosen in the DVB-S2, 802.11n, 802.16e and 802.3an standards. The major problem is the huge design space, composed of many interrelated parameters, which forces drastic design trade-offs. Another important issue is the need for flexible hardware solutions that can support all the variants of a given standard.

Previously, we defined a generic architecture template composed of several processing modules and a set of interconnection buses for inter-module communication. Each module includes two processing units (called *bitnode* and *checknode* processing units) and a set of memory banks. The number of modules, the number of interconnection buses, and the size and number of memory banks are standard-dependent. The LDPC decoding algorithm relies on an appropriate distribution of the block of input data across the memory banks and on a scheduling of the computations obtained using constraint-programming-based optimization tools. This year, we concentrated on modeling our parametric architecture at the CABA (cycle-accurate, bit-accurate) level using the SoCLib platform and on implementing it on an FPGA platform. Different versions of the proposed LDPC decoder were realized [27], [28].
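The role of the two processing units can be illustrated with a software sketch of the message updates they perform. The text does not state which update rule the decoder uses; the min-sum approximation is assumed here because it is a common hardware-friendly choice. The function names and the message representation (log-likelihood ratios in Python lists) are hypothetical.

```python
def check_node_update(incoming):
    """Min-sum check node: for each edge, combine the messages received on
    all the other edges (sign product and minimum magnitude).
    Requires at least two incoming messages."""
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing

def bit_node_update(channel_llr, incoming):
    """Bit node: for each edge, sum the channel LLR and the messages
    received on all the other edges."""
    total = channel_llr + sum(incoming)
    return [total - value for value in incoming]

if __name__ == "__main__":
    print(check_node_update([2.0, -1.0, 3.0]))
    print(bit_node_update(0.5, [1.0, -2.0]))
```

In a parallel architecture, many such updates run concurrently, which is why the placement of messages in memory banks and the scheduling of accesses become the central design problems.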

#### Algorithm Optimization for Low Energy in Wireless Applications

Participants : Olivier Berder, Tuan-Duc Nguyen, Olivier Sentieys.

Since the wireless nodes are physically separated in cooperative MIMO systems, imperfect time synchronization between the clocks of the cooperative nodes leads to an unsynchronized MIMO transmission. As a result, inter-symbol interference (ISI) appears and the space-time sequences from different nodes are no longer orthogonal. On the reception side, each cooperative node has to forward its received signal over a wireless channel to the destination node for space-time signal combination, which adds noise to the final received signal. Consequently, the synchronization error in cooperative transmission and the additional noise in cooperative reception degrade performance and reduce the energy efficiency advantage of cooperative MIMO systems over SISO systems [37]. For small transmission synchronization errors, the performance degradation is negligible and the cooperative MIMO system is rather tolerant. For large errors, however, performance decreases quickly and the degradation is significant. A new efficient space-time combination technique based on a low-complexity algorithm has been proposed for cooperative MIMO systems in the presence of transmission synchronization error. The technique performs multiple sampling and combines the signals from the different sampled sequences to restore the orthogonality of the transmitted space-time sequences [38].
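The loss of orthogonality can be seen on a toy example. The sketch below assumes an Alamouti-style space-time block code over two nodes and a rectangular-pulse model in which a timing offset of a fraction `alpha` of the symbol period mixes each symbol with the previous one. Neither assumption comes from the text, and the combination technique of [38] is not implemented here.

```python
def alamouti_sequences(s1, s2):
    """Sequences sent by two cooperating nodes over two symbol periods."""
    node_a = [s1, -s2.conjugate()]
    node_b = [s2, s1.conjugate()]
    return node_a, node_b

def inner_product(x, y):
    """Hermitian inner product; zero means the sequences are orthogonal."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def delayed(sequence, alpha, previous=0j):
    """Samples seen when the sequence arrives late by a fraction alpha of a
    symbol period (rectangular pulse, integrate-and-dump assumption)."""
    samples = []
    last = previous
    for current in sequence:
        samples.append((1 - alpha) * current + alpha * last)
        last = current
    return samples

if __name__ == "__main__":
    a, b = alamouti_sequences(1 + 1j, 1 - 1j)
    print(abs(inner_product(a, b)))                 # synchronized
    print(abs(inner_product(a, delayed(b, 0.25))))  # node b arrives late
```

In this model the residual correlation grows with `alpha`, which is consistent with the observation that small synchronization errors are tolerable while large ones are not.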

#### Wireless Communications for Automotive Systems

Participants : Olivier Berder, Tuan-Duc Nguyen, Olivier Sentieys, Jérome Astier.

The CAPTIV (Cooperative strAtegies for low Power wireless Transmissions between Infrastructures and Vehicles) project aims at using new radio communication technologies to enhance driver safety. In a cooperative network composed of vehicles and road signs equipped with autonomous radio transmitters, communications can be optimized at different levels. It was shown that space-time codes can dramatically decrease the energy consumption of communications between crossroads. To both develop the CAPTIV application program and evaluate driver behaviour when faced with this new kind of information, a specific driving simulator was designed, based on the ECA-FAROS platform. A real prototype has already been evaluated and demonstrates the feasibility of the CAPTIV application; it will soon be optimized using signal processing techniques. Although the main goal remains driving assistance, many applications could be implemented on this platform, which will be able to deliver any kind of information (weather, parking, tourist information, advertisement, etc.) [26].

#### Intrusion detection system in hardware

Participants : Georges Adouko, François Charot, Christophe Wolinski.

To our knowledge, the dynamic nature of security systems, whose anti-intrusion mechanisms (filtering at the packet, connection, and application levels) evolve according to modes and levels of protection, is a challenge beyond the reach of classical technologies based on general-purpose or network processors. Security requirements in high-speed networks (from 10 to 40 Gbit/s) impose the implementation of the filtering rules in appropriate hardware structures. The system must be able to manage a large variety of complex processing tasks while guaranteeing the quality of service. Today, only dedicated solutions can overcome the bottleneck related to implementation complexity, at the price of an obvious lack of flexibility and no capacity to evolve.

The aim of our research is the design of specialized hardware systems for high-speed filtering of network traffic. We have proposed a new high-performance hardware implementation of a string matching engine based on a multi-character variant of the well-known Aho-Corasick algorithm. The proposed architecture is well suited to modern FPGAs and makes efficient use of their logic and memory resources [46]. It is optimized for string matching against tens of thousands of strings, as found in intrusion prevention or intrusion detection systems. The design has been validated through the implementation of a search engine on an Altera Stratix II FPGA for a subset of the rules of the Snort intrusion detection system. By applying traffic parallelization and retiming techniques, it was shown that content scanning of 40 Gbit/s traffic can be sustained [25]. Compared with other existing architectures, a significant increase in performance has been obtained.
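For reference, the classical single-character Aho-Corasick automaton on which the hardware variant is based can be sketched in software as follows. This is the textbook algorithm, not the multi-character FPGA architecture described above. The function names and the data layout (lists of dictionaries) are choices made for the example.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, output = automaton
    state, matches = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((pos - len(pattern) + 1, pattern))
    return matches

if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(search("ushers", automaton))
```

The text is scanned once regardless of the number of patterns, which is what makes the algorithm attractive for rule sets with tens of thousands of strings. A multi-character variant consumes several input characters per step to raise throughput.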

#### Accelerating Statistical Test for Real-Time Estimation of Randomness

Participants : Renaud Santoro, Olivier Sentieys, Sébastien Roy.

The objective of a random number generator (RNG) is to produce random binary numbers that are statistically independent, uniformly distributed and unpredictable. RNGs are necessary in many applications, and the number of embedded hardware architectures requiring RNGs is continuously increasing. Generally, a hybrid RNG comprising a True Random Number Generator (TRNG) and a Pseudo Random Number Generator (PRNG) is used. PRNGs are based on deterministic algorithms; they are periodic and must be initialized by a TRNG. TRNGs are based on a physical noise source (e.g., thermal noise or the jitter of free-running oscillators) and depend strongly on the quality of their implementation. Most TRNGs implemented in FPGAs or ASICs use the phase jitter produced by a free-running oscillator or a Phase-Locked Loop (PLL) [69]. In practice, jitter can be influenced by noise external to the FPGA (power supply noise, temperature) and by chip activity. This dependence is a weakness that can be exploited by exposing the TRNG to hostile environmental conditions [89].

In cryptography, security usually relies on the randomness quality of a key generated by an RNG. Some PRNGs are recognized to produce high-quality random numbers [76]. However, their quality depends on the randomness of the TRNG seed. PRNG randomness is usually evaluated using a battery of statistical tests. Several such batteries are reported in the literature, including the Diehard [79] and NIST [87] batteries. They are all implemented in software using high-level programming languages. When a PRNG is evaluated, designers store a huge bit stream in memory and then submit it to the software tests. If the bit stream passes a certain number of statistical tests, the PRNG is said to be sufficiently random. TRNG validation is more complicated, as TRNG behavior depends on the construction of the generator, on the external environment and, above all, on a physical noise source that can differ in practice from an ideal noise source. However, [73] describes a methodology to evaluate physical generators. The procedure is based on the TRNG construction and is the technical reference of AIS 31 [57]. TRNG weaknesses and external attacks must be detected in real time so that the TRNG output can be inhibited [89]; one solution is to monitor the TRNG at power-up and during operation using statistical tests [73], [89].

This year, the possibility of implementing the AIS 31 statistical tests in hardware was studied, and the tests were implemented on ASIC and FPGA targets. The hardware cost shows that the design can be used in low-cost embedded cryptographic circuits. Moreover, the data rate of the tests allows the TRNG to be monitored in real time. The interest of TRNG monitoring has been demonstrated on current TRNGs. However, using only the AIS 31 statistical tests to control TRNG quality is not sufficient. Consequently, we also worked on a methodology to evaluate the randomness of TRNGs based on free-running oscillators. In these TRNGs, oscillator jitter is the physical noise source. We have studied the possibility of performing an on-chip jitter measurement, so that jitter is evaluated in real time to test whether its quantity and quality are consistent with the TRNG design hypotheses. The on-chip measurement circuit has been implemented in ASIC and FPGA circuits. Finally, the year concluded with the realization of an integrated circuit prototype (OCHRE) that includes our proposed RNG architecture. The chip is fabricated in 130 nm CMOS technology and is composed of a TRNG, a PRNG and several hardware statistical tests. The tests monitor the TRNG quality in real time to validate the randomness of the PRNG seed.
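A statistical test of this kind maps naturally onto a counter that is updated as each bit arrives, which is what makes real-time monitoring inexpensive. The sketch below models a streaming monobit test in software. The block length of 20000 bits and the acceptance bounds 9654 and 10346 are assumed default values that should be checked against the AIS 31 document, and the class is a hypothetical illustration rather than the architecture implemented in OCHRE.

```python
class MonobitMonitor:
    """Streaming monobit test: counts ones over a block of bits and checks
    that the count lies strictly between two bounds (assumed values)."""

    def __init__(self, block_bits=20000, low=9654, high=10346):
        self.block_bits = block_bits
        self.low = low
        self.high = high
        self.ones = 0
        self.count = 0

    def push(self, bit):
        """Feed one bit. Returns None while the block is incomplete, then
        True (pass) or False (fail) and restarts for the next block."""
        self.ones += bit & 1
        self.count += 1
        if self.count < self.block_bits:
            return None
        passed = self.low < self.ones < self.high
        self.ones = 0
        self.count = 0
        return passed

if __name__ == "__main__":
    monitor = MonobitMonitor()
    verdict = None
    for i in range(20000):
        verdict = monitor.push(0)   # a stuck-at-zero source
    print("stuck-at-zero source passes:", verdict)
```

Only one counter and two comparisons are needed per block, so the test keeps pace with the generator and can raise an alarm as soon as a block fails.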