## Section: New Results

### Algorithm Architecture Interaction

Participants : Steven Derrien, Romuald Rocher, Daniel Ménard, François Charot, Christophe Wolinski, Olivier Sentieys, Patrice Quinton.

#### Reconfigurable Video Coding

Participants : Emmanuel Casseau, Olivier Sentieys, Arnaud Carer, Cecile Beaumin-Palud, Herve Yviquel.

In the field of multimedia coding, standardization recommendations are constantly evolving. To reduce design time, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined from a modular library of components. The dataflow-based RVC specification formalism explicitly targets multiprocessor platforms. However, software processors cannot meet high-performance and low-power requirements. This work therefore investigates the mapping of RVC specifications onto hardware accelerators, as well as the scheduling of the functional units (FU) of the specification. The aim is to exploit as much as possible of the parallelism exhibited by the specification when scheduling tasks on the available resources. Reconfigurability will be exploited, and the design of an RVC-dedicated reconfigurable architecture will be studied. First results [39] led to the definition of a reconfigurable FIFO that optimizes the cost and performance of RVC dataflow specifications by taking advantage of their dynamic behavior. This work is carried out in close collaboration with IETR Rennes.
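Why FIFO sizing matters for dynamic dataflow can be seen in a toy simulation. The sketch below is hypothetical (the actor behavior, rates and names are made up; it is not the RVC toolchain nor the FIFO of [39]): a producer with data-dependent output rates feeds a consumer through a bounded FIFO, and the simulation records peak occupancy. An undersized FIFO makes the network deadlock.

```python
from collections import deque


class Fifo:
    """Bounded FIFO channel between two dataflow actors (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = deque()
        self.peak = 0  # highest occupancy observed, a hint for FIFO sizing

    def space(self):
        return self.capacity - len(self.tokens)

    def push(self, token):
        if self.space() == 0:
            raise OverflowError("FIFO full")
        self.tokens.append(token)
        self.peak = max(self.peak, len(self.tokens))

    def pop(self):
        return self.tokens.popleft()


def run(rates, capacity, consume_per_firing=2, max_steps=100):
    """Fire a producer (data-dependent output rates) and a consumer
    (fixed input rate) in round-robin order until no tokens remain or
    max_steps is reached. Returns (tokens consumed, peak occupancy)."""
    fifo = Fifo(capacity)
    pending = list(rates)  # tokens emitted by each successive producer firing
    consumed = 0
    for _ in range(max_steps):
        # Producer fires only if its whole output fits in the FIFO.
        if pending and fifo.space() >= pending[0]:
            for token in range(pending.pop(0)):
                fifo.push(token)
        # Consumer fires only if enough input tokens are available.
        if len(fifo.tokens) >= consume_per_firing:
            for _ in range(consume_per_firing):
                fifo.pop()
            consumed += consume_per_firing
        if not pending and len(fifo.tokens) < consume_per_firing:
            break
    return consumed, fifo.peak


print(run([1, 4, 0, 3, 2], capacity=8))  # (10, 5): enough room, all tokens flow
print(run([1, 4, 0, 3, 2], capacity=4))  # (0, 1): too small, the network deadlocks
```

The peak occupancy (5 here) depends on the data-dependent rates, which is the kind of dynamic behavior a run-time reconfigurable FIFO could adapt to.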

#### Range Estimation and Computation Accuracy Optimization

Participants : Daniel Ménard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Hai-Nam Nguyen, Emmanuel Casseau, Andrei Banciu.

##### Range Estimation

Floating-point to fixed-point conversion is an important part of hardware design when efficient implementations are sought. To optimize the integer word-length under performance constraints, the dynamic range of the variables during execution must be determined. Traditional simulation-based range estimation methods are data dependent and time consuming, whereas analytical methods such as interval and affine arithmetic give pessimistic results because they lack a statistical basis. Recently, a novel approach based on the Karhunen-Loève Expansion (KLE) was presented for linear time-invariant (LTI) systems, offering a solid stochastic foundation. We have investigated this theory. The KLE approach can optimize the integer word-length so that the resulting distortion still satisfies the application performance requirements. However, the accuracy of the estimation is limited by the expansion order and by the computational complexity. We assessed the relevance of the theory for practical implementations using an OFDM modulator as a test case [38].
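To make the pessimism of interval arithmetic concrete, the sketch below compares three range estimates for an FIR filter (an LTI system) driven by i.i.d. inputs uniform on [-1, 1]. It does not implement the KLE method of [38]: the "statistical" estimate is a simpler variance-propagation bound of k standard deviations, with k = 4 chosen arbitrarily, and the filter is a made-up example.

```python
import math
import random


def interval_range(h, x_max=1.0):
    """Interval arithmetic: worst case |y| = x_max * sum |h_k|."""
    return x_max * sum(abs(c) for c in h)


def statistical_range(h, x_max=1.0, k=4.0):
    """k-sigma range for i.i.d. inputs uniform on [-x_max, x_max]."""
    var_x = x_max ** 2 / 3.0               # variance of U(-x_max, x_max)
    var_y = var_x * sum(c * c for c in h)  # variance through an LTI filter
    return k * math.sqrt(var_y)


def simulated_range(h, n=5000, x_max=1.0, seed=0):
    """Simulation: largest |y| observed for one random input record."""
    rng = random.Random(seed)
    x = [rng.uniform(-x_max, x_max) for _ in range(n)]
    return max(abs(sum(c * x[i - k] for k, c in enumerate(h)))
               for i in range(len(h) - 1, n))


def integer_word_length(r):
    """Integer bits, sign bit included, for a two's complement range [-r, r)."""
    return math.ceil(math.log2(r)) + 1


h = [0.125] * 64  # 64-tap moving-average filter (example)
for name, r in [("interval", interval_range(h)),
                ("4-sigma", statistical_range(h)),
                ("simulation", simulated_range(h))]:
    print(f"{name:>10}: range {r:.3f}, IWL {integer_word_length(r)} bits")
```

For this filter the interval bound (8.0) needs 4 integer bits while the 4σ bound (about 2.31) needs 3; the simulated maximum depends on the random input record, which is the data dependence noted above.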

##### Performance Evaluation of Fixed-Point Systems

Existing analytical techniques for evaluating the performance of fixed-point systems cannot handle quantization errors in the presence of un-smooth operators such as decision operators. In [58], a generalized decision operator has been defined and an analytical model for determining the probability of decision error due to quantization noise has been proposed. However, perturbation theory cannot be used to propagate the decision error inside the system, so simulation remains necessary to evaluate the performance of fixed-point systems in the presence of un-smooth operators. In [56], a hybrid technique has been proposed that can replace pure simulation to accelerate performance evaluation. The main idea is to simulate parts of the system selectively, only when un-smooth errors occur, and to use analytical results otherwise. We applied this approach to a complex MIMO sphere decoding algorithm in collaboration with Imec (Interuniversitair Micro-Electronica Centrum), Belgium. The performance evaluation time was reduced by several orders of magnitude compared with existing approaches based on pure fixed-point simulation.
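The simplest decision operator is a sign slicer, and for it the decision-error probability has a closed form under an assumed noise model. The sketch below assumes the quantization noise at the slicer input is uniform on [-q/2, q/2] (an illustrative assumption; [58] treats a generalized decision operator). It also mimics the idea of the hybrid technique of [56]: noise is drawn only for samples close enough to the threshold to flip, and every other sample is known analytically to be error-free.

```python
import random


def decision_error_prob(x, q):
    """P(sign(x + b) != sign(x)) for b ~ U(-q/2, q/2) and x != 0."""
    half = q / 2.0
    if abs(x) >= half:
        return 0.0  # noise too small to cross the threshold
    return (half - abs(x)) / q


def hybrid_error_count(samples, q, rng):
    """Simulate the quantization noise only for samples that can actually
    flip the decision; all other samples are known to be error-free."""
    errors = simulated = 0
    for x in samples:
        if decision_error_prob(x, q) == 0.0:
            continue
        simulated += 1
        b = rng.uniform(-q / 2.0, q / 2.0)
        if (x + b < 0) != (x < 0):
            errors += 1
    return errors, simulated


rng = random.Random(0)
# Noisy BPSK symbols as a stand-in for the slicer input of a real system.
samples = [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.3) for _ in range(10000)]
q = 2.0 ** -3
expected = sum(decision_error_prob(x, q) for x in samples)
errors, simulated = hybrid_error_count(samples, q, rng)
print(f"simulated {simulated} of {len(samples)} samples, "
      f"{errors} decision errors (analytical mean {expected:.1f})")
```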

The hybrid technique relies on the single noise source model, which attempts to capture the fixed-point behavior of any sub-system made of smooth operators with a single noise source located at the sub-system output. In [59], the frequency response of this noise source is estimated, and in [60], its probability density function is estimated.
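As a textbook illustration of what a noise frequency response is, assume one white rounding noise source (step q = 2^-d, variance q²/12) filtered by an FIR system; the noise at the output then has PSD σ²|H(e^{jω})|² and power σ²Σh_k². This is only the simplest LTI case and not the estimation methods of [59] or [60]; the filter coefficients are an arbitrary example.

```python
import cmath
import math


def quantization_noise_variance(frac_bits):
    """Variance q**2 / 12 of rounding noise with step q = 2**-frac_bits."""
    q = 2.0 ** -frac_bits
    return q * q / 12.0


def output_noise_psd(h, sigma2, n_bins=64):
    """sigma2 * |H(e^{jw})|**2 sampled on n_bins frequencies in [0, 2*pi)."""
    psd = []
    for m in range(n_bins):
        w = 2.0 * math.pi * m / n_bins
        H = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h))
        psd.append(sigma2 * abs(H) ** 2)
    return psd


def output_noise_power(h, sigma2):
    """Total output noise power sigma2 * sum h_k**2."""
    return sigma2 * sum(c * c for c in h)


h = [0.5, 0.3, 0.2]                      # example FIR filter
sigma2 = quantization_noise_variance(8)  # input rounded to 8 fractional bits
psd = output_noise_psd(h, sigma2)
print(f"power {output_noise_power(h, sigma2):.3e}, "
      f"mean PSD {sum(psd) / len(psd):.3e}")  # equal, by Parseval
```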

#### Multi-Antenna Systems

Participants : Olivier Berder, Pascal Scalart, Quoc-Tuong Ngo.

When the transmitter can obtain some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be implemented through the joint optimization of a linear precoder (at the transmitter) and decoder (at the receiver) according to various criteria.

We propose an efficient linear precoder based on maximizing the minimum Euclidean distance between two received data vectors for MIMO spatial multiplexing systems with three data streams. In the literature, these dmin-based precoders had only been derived for two data streams. Using trigonometric functions, a new virtual MIMO channel representation parameterized by two channel angles allows the max-dmin precoder to be parameterized and the distance between signal points in the received constellation to be optimized. To illustrate the optimization process, a sub-optimal precoder is first derived for BPSK and QPSK modulations following the max-SNR approach, which consists of pouring all the power onto the most favorable virtual sub-channel [52].

Based on this representation, the optimal dmin precoders are then derived for BPSK and QPSK modulations. Simulation results over a Rayleigh fading channel show a large bit-error-rate improvement of the proposed solution compared with beamforming and other traditional precoding strategies. The performance improvement depends on the channel characteristics: the more dispersive the channel, the larger the improvement.
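The max-dmin criterion, dmin = min over s ≠ s' of ||H F (s - s')||, is easy to evaluate by brute force for small constellations. The sketch below does so for three BPSK streams. The virtual-channel gains and both precoders are hypothetical illustrations, not those of [52]; in particular, the 8-PAM precoder is only a naive instance of pouring all the power onto the strongest virtual sub-channel.

```python
import itertools
import math


def mat_vec(A, v):
    """Matrix-vector product with lists (real or complex entries)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dmin(H, F, alphabet=(-1.0, 1.0)):
    """Brute-force minimum distance between noiseless received vectors
    H F s over all pairs of distinct symbol vectors s."""
    n_streams = len(F[0])
    vectors = list(itertools.product(alphabet, repeat=n_streams))
    best = math.inf
    for s, t in itertools.combinations(vectors, 2):
        y = mat_vec(H, mat_vec(F, [a - b for a, b in zip(s, t)]))
        best = min(best, math.sqrt(sum(abs(v) ** 2 for v in y)))
    return best


def power(F):
    """Transmit power trace(F F^H) for unit-energy symbols."""
    return sum(abs(f) ** 2 for row in F for f in row)


# Hypothetical virtual channel: diagonal gains of the three sub-channels.
Hv = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Max-SNR-style sketch: all power on sub-channel 1, the three BPSK
# streams combined into an 8-PAM constellation there (same total power).
c = math.sqrt(3.0 / 21.0)
pam8 = [[c, 2 * c, 4 * c], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

for name, F in [("equal power", identity), ("8-PAM on best", pam8)]:
    print(f"{name}: power {power(F):.2f}, dmin {dmin(Hv, F):.3f}")
```

With a weak third sub-channel, equal power allocation gives dmin = 0.2 while the 8-PAM precoder reaches about 1.51 at the same transmit power. QPSK can be explored by passing `alphabet=[complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]`.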

#### Cooperative Strategies for Low-Energy Wireless Networks

Participants : Olivier Berder, Le Quang Vinh Tran, Olivier Sentieys, Tuan-Duc Nguyen [International University - VNU, Ho Chi Minh City, Vietnam].

During the last decade, many studies have focused on improving the performance of relaying techniques in ad hoc networks. One promising approach consists in allowing the relay nodes to cooperate, thus using spatial diversity to increase the capacity of the system. In wireless distributed networks where multiple antennas cannot be installed on a single wireless node, cooperative relay and cooperative Multi-Input Multi-Output (MIMO) techniques can be used to exploit spatial and temporal diversity gain in order to reduce energy consumption.

The performance and energy consumption of cooperative MIMO and relay techniques are investigated over a Rayleigh fading channel. Although cooperative MIMO has been shown to outperform relaying under ideal conditions, relaying is the better solution when transmission synchronization errors occur. Comparing these two cooperative techniques helps select the optimal cooperative strategy for energy-constrained WSN applications [53]. A strategy combining the two techniques is then proposed to exploit the advantages of both simultaneously [54]. Its principle is that multiple relay nodes use a cooperative MIMO technique to retransmit the signal as a MISO transmission in a single transmission phase, instead of the multiple transmission phases required by the traditional parallel relay technique.

The energy efficiency of cooperative MIMO and relay techniques is particularly relevant to Infrastructure-to-Vehicle (I2V) and Infrastructure-to-Infrastructure (I2I) communications in Intelligent Transport Systems (ITS) networks, where the energy consumption of wireless nodes embedded in the road infrastructure is constrained. Applications of node cooperation to ITS networks are proposed, and the performance and energy consumption of cooperative relay and cooperative MIMO are compared with those of the traditional multi-hop technique. This comparison guides the choice of the most energy-efficient cooperative strategy for energy-constrained road infrastructure networks in ITS applications.

In this context, the impact of cooperative strategies is analyzed using a realistic power model of a real radio transceiver [73]. A system with a two-antenna source, two one-antenna relays and a one-antenna destination is considered. Three strategies combining space-time coding and relaying are presented: MIMO full cooperative relay (MFCR), MIMO simple cooperative relay (MSCR) and MIMO normal cooperative relay (MNCR). The parameters of the power consumption model are extracted from the characteristics of the CC2420, a widely used, commercially available wireless sensor transceiver. The energy consumption of the different transmission protocols is analyzed and compared to identify the optimal scheme for different ranges of transmission distance, and the transmission-distance thresholds at which the optimal scheme changes are derived. The maximum transmission distances of the schemes are also reported, e.g., 122 m for the Alamouti scheme and 280 m for MFCR. Depending on the relative position of the relays and on the transmission distance, the proposed energy-efficient scheme selection determines which scheme should be used to minimize the total energy consumption.
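The distance-threshold reasoning can be illustrated with a deliberately simple energy model in which each scheme costs a fixed circuit energy per bit plus a power-amplifier term growing as d^α. All numbers below are hypothetical (they are not CC2420 parameters), and the model ignores fading margins and the specifics of MFCR, MSCR and MNCR. A cooperative scheme switches on more radios (higher circuit energy) but, thanks to its diversity gain, needs less transmit energy, so it wins beyond a crossover distance.

```python
def energy_per_bit(d, e_circuit, k_pa, alpha=3.0):
    """Hypothetical model: fixed circuit energy plus a power-amplifier
    term that grows with distance as d**alpha (units: nJ/bit, metres)."""
    return e_circuit + k_pa * d ** alpha


def crossover_distance(direct, coop, alpha=3.0):
    """Distance at which two (e_circuit, k_pa) schemes cost the same.
    Assumes coop has more circuit energy but a smaller PA coefficient."""
    (ec1, k1), (ec2, k2) = direct, coop
    return ((ec2 - ec1) / (k1 - k2)) ** (1.0 / alpha)


def best_scheme(d, schemes, alpha=3.0):
    """Name of the scheme with the lowest energy per bit at distance d."""
    return min(schemes, key=lambda name: energy_per_bit(d, *schemes[name], alpha=alpha))


# Made-up parameters: (circuit energy, PA coefficient) per scheme.
schemes = {"direct": (100.0, 0.01), "cooperative": (200.0, 0.001)}
d_star = crossover_distance(schemes["direct"], schemes["cooperative"])
print(f"threshold distance: {d_star:.1f} m")
for d in (10, 50):
    print(d, "m ->", best_scheme(d, schemes))
```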

#### Opportunistic Routing

Participants : Olivier Berder, Olivier Sentieys, Ruifeng Zhang, Jean-Marie Gorce [INSA Lyon, INRIA Swing].

However, the cooperative relay and MIMO approaches described in the previous section introduce an overhead in terms of information exchange, which increases the complexity of the receivers. A simpler way of exploiting spatial diversity is opportunistic routing. In this scheme, a cluster of nodes still serves as relay candidates, but only a single node in the cluster forwards the packet. This work provides a thorough analysis of the efficiency of opportunistic routing under different realistic radio channel conditions. The study aims at finding the best trade-off between two objectives, energy minimization and latency minimization, under a hard reliability constraint. We derive an optimal bound, namely the Pareto front of the related optimization problem, which offers good insight into the benefits of opportunistic routing compared with classical multi-hop routing schemes. This lower bound also provides a framework for optimizing the parameters of the physical, MAC and routing layers from a cross-layer viewpoint during the design or planning phase of a network [34].
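A Pareto front over two minimized objectives can be extracted with a single sweep after sorting. The sketch below applies it to a hypothetical per-hop cost model for relay clusters of increasing size (independent links, retransmission until success, per-candidate reception and election costs). The model and its parameters are assumptions for illustration only; it does not include the hard reliability constraint or the channel models of [34].

```python
import math


def pareto_front(points):
    """Non-dominated (energy, latency) pairs; both objectives minimized."""
    front, best_latency = [], math.inf
    for energy, latency in sorted(points):
        if latency < best_latency:  # not dominated by any cheaper point
            front.append((energy, latency))
            best_latency = latency
    return front


def opportunistic_costs(p_link, n_max, e_tx=1.0, e_rx=0.5, t_tx=1.0, t_slot=0.2):
    """Hypothetical per-hop cost for relay clusters of size n = 1..n_max:
    one broadcast reaches each candidate independently with probability
    p_link, every listening candidate costs e_rx, and electing the
    forwarder costs one slot per candidate. Retransmit until success."""
    costs = []
    for n in range(1, n_max + 1):
        p_hop = 1.0 - (1.0 - p_link) ** n  # at least one candidate decodes
        attempts = 1.0 / p_hop             # mean of a geometric distribution
        costs.append((attempts * (e_tx + n * e_rx),
                      attempts * (t_tx + n * t_slot)))
    return costs


costs = opportunistic_costs(p_link=0.3, n_max=6)
for energy, latency in pareto_front(costs):
    n = costs.index((energy, latency)) + 1
    print(f"cluster size {n}: energy {energy:.2f}, latency {latency:.2f}")
```

With these made-up numbers, clusters of 3 and 4 candidates form the front: the first minimizes energy, the second latency, and every other size is dominated.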

#### Speech enhancement and coding issues

Participant : Pascal Scalart.

Microphone arrays, and more specifically beamforming methods, are an enabling technology for hands-free communication that is now viable and cost-effective. By offering directional gain to improve the signal-to-noise ratio, and by taking the spatial correlation of the sound field into account to de-reverberate the desired speech signal and to reduce noise and acoustic echoes, microphone array techniques play an essential role in hands-free mobile telephony, distant-talker speech recognition, voice-controlled systems, hearing aids and audio monitoring. To handle time-varying environments with both non-stationary signal characteristics and potentially moving sources, we worked [50] on the Generalized Sidelobe Canceller (GSC), an efficient implementation of adaptive beamformers. One of its main drawbacks is the self-cancellation of the desired signal caused by signal leakage into the noise reference. To cope with this problem, we proposed to exploit the ability of the crosstalk-resistant adaptive noise canceller (CTRANC) to deal with crosstalk, which is in fact the same problem as signal leakage in the GSC. After describing the new adaptive recursive structure for the GSC, we derived a complete analysis of the CTRANC and proposed new frequency-domain adaptive algorithms [70]. We established new results on the convergence properties and on the existence of an equilibrium point for this recursive structure, and we showed that the recursive GSC effectively solves the leakage problem and improves performance.
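The adaptive branch of a GSC is, at heart, an adaptive noise canceller. As a minimal time-domain illustration, the sketch below implements the classical NLMS noise canceller; it is not the CTRANC, the recursive GSC or the frequency-domain algorithms of [70], and the signals, noise path and step size are synthetic assumptions. It also assumes a leakage-free reference: leakage of the desired signal into the reference is precisely what makes such a canceller remove the desired signal.

```python
import math
import random


def nlms_anc(primary, reference, taps=8, mu=0.1, eps=1e-6):
    """Adaptive noise canceller: an NLMS filter maps the noise reference
    to the noise in the primary input; the error is the cleaned output."""
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]                        # newest sample first
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                   # canceller output
        norm = eps + sum(bi * bi for bi in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


rng = random.Random(1)
n = 4000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]  # stand-in
ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
path = [0.8, -0.4, 0.2]  # hypothetical noise path to the microphone
noise = [sum(c * ref[k - j] for j, c in enumerate(path) if k >= j) for k in range(n)]
primary = [s + v for s, v in zip(speech, noise)]
out = nlms_anc(primary, ref)

tail = slice(n - 1000, n)  # evaluate after convergence
before = sum(v * v for v in noise[tail])
after = sum((e - s) ** 2 for e, s in zip(out[tail], speech[tail]))
print(f"noise reduction: {10 * math.log10(before / after):.1f} dB")
```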

In the audio coding domain, we focused our research on stereo coding, which is widely used in audio applications such as streaming, broadcasting and storage; significant progress has been made in reducing the bit rate of (joint) stereo coding, as shown by the evolution of MPEG audio standards (MP3, AAC, HE-AAC, USAC). In conversational applications, on the other hand, speech coders are designed to handle mostly mono signals; stereo, when supported by the service (e.g., conferencing), is usually coded as dual mono, that is, by coding each channel separately. Recently, ITU-T launched several standardization activities aiming at extending existing wideband (50-7000 Hz) mono coding standards to superwideband (50-14000 Hz) and stereo, for example G.729.1-SWB, G.718-SWB and G.722/G.711.1-SWB. In these activities, the bit rate allocated to stereo is too low for dual mono coding, so joint stereo coding operating at a lower bit rate than dual mono is needed.

In the same spirit as the G.722/G.711.1-SWB activity, we proposed an experimental stereo extension of G.722 that meets the constraints of the stereo extension, e.g., a frame length of 5 ms and an additional bit rate of 8 or 16 kbit/s. Using a frequency-domain stereo-to-mono downmixing technique, the proposed coder [48] preserves the energy of the mono signal and avoids the issues caused by relying entirely on one channel (L or R) for the phase computation. A parametric stereo extension of G.722 at 56+8 and 64+16 kbit/s has been studied, and the quality of the proposed coder was evaluated in MUSHRA tests. The proposed stereo coder operates at a lower bit rate than G.722 dual mono, and at 64+16 kbit/s its speech and music quality is equivalent to that of G.722 dual mono.
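Parametric stereo coders typically transmit a mono downmix plus per-band inter-channel parameters such as the inter-channel level difference (ICLD). The sketch below shows one simple way to make a per-bin downmix energy-preserving, and how an ICLD is computed. Taking the phase of the downmix from L + R is a hypothetical choice made for illustration; this is not the downmix of [48], and the two-bin spectra are made up. The out-of-phase bin shows why a passive (L + R)/2 downmix loses energy.

```python
import cmath
import math


def downmix(L, R, eps=1e-12):
    """Per-bin mono downmix whose energy is (|L|^2 + |R|^2) / 2.
    The phase is taken from L + R (an assumption), so neither channel
    alone fixes it; bins where L + R vanishes get phase 0."""
    M = []
    for xl, xr in zip(L, R):
        magnitude = math.sqrt((abs(xl) ** 2 + abs(xr) ** 2) / 2.0)
        s = xl + xr
        phase = cmath.phase(s) if abs(s) > eps else 0.0
        M.append(cmath.rect(magnitude, phase))
    return M


def icld_db(L, R, bands, eps=1e-12):
    """Inter-channel level difference (dB) for each (start, stop) bin range."""
    out = []
    for start, stop in bands:
        e_l = sum(abs(x) ** 2 for x in L[start:stop])
        e_r = sum(abs(x) ** 2 for x in R[start:stop])
        out.append(10.0 * math.log10((e_l + eps) / (e_r + eps)))
    return out


# Two bins: in phase, then fully out of phase.
L = [1 + 0j, 2j]
R = [1 + 0j, -2j]
print([abs((xl + xr) / 2) for xl, xr in zip(L, R)])  # passive downmix: [1.0, 0.0]
print([abs(m) for m in downmix(L, R)])               # [1.0, 2.0]
print(icld_db([2 + 0j], [1 + 0j], [(0, 1)]))         # about 6.02 dB
```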

#### True Random Number Generators

Participants : Renaud Santoro, Olivier Sentieys, Arnaud Tisserand, Philippe Quémerais, Arnaud Carer, Thomas Anger.

##### Ochre V2: TRNG chip with on-line randomness quality monitoring

A new chip has been designed and sent to fabrication: a 4 mm² circuit in STMicroelectronics HCMOS9GP 130 nm CMOS technology. This circuit is a true random number generator (TRNG) based on several oscillator-sampling architectures (the physical noise source is the jitter produced by one or several free-running oscillators). The quality of the random sequence generated by a TRNG depends on many parameters, such as the noise source characteristics, implementation details and environmental parameters. A hardware unit for on-line, real-time evaluation of the quality of the TRNG output has been designed and implemented in the Ochre V2 circuit. This is important for critical applications such as embedded cryptographic systems: on-line, real-time monitoring of the generated random sequence helps guard against degradation of randomness quality due to environmental variations or physical attacks against the TRNG.
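The tests implemented by the Ochre V2 monitoring unit are not described here. As an assumption-labeled illustration of on-line monitoring, the sketch below applies one common statistical test, the frequency (monobit) test of NIST SP 800-22, to each block of output as it arrives. The block size and the `OnlineMonitor` class are hypothetical; 0.01 is a significance level commonly used with SP 800-22.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test of NIST SP 800-22: p-value for the
    hypothesis that ones and zeros are equally likely."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)  # +1 per one, -1 per zero
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


class OnlineMonitor:
    """Hypothetical block-wise monitor: tests each block of TRNG output
    as it arrives and counts the blocks that fail."""

    def __init__(self, block_size=1024, alpha=0.01):
        self.block_size = block_size
        self.alpha = alpha
        self.block = []
        self.alarms = 0

    def push(self, bit):
        self.block.append(bit)
        if len(self.block) == self.block_size:
            if monobit_p_value(self.block) < self.alpha:
                self.alarms += 1
            self.block = []


stuck = OnlineMonitor()
for _ in range(4096):
    stuck.push(1)  # a TRNG output stuck at one
print("alarms:", stuck.alarms)
```

An alternating 1010... stream passes this test (p = 1) although it is clearly not random, so a practical monitor would combine several tests.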

#### Flexible hardware accelerators for biocomputing applications

Participants : Steven Derrien, Naeem Abbas, Patrice Quinton.

It is widely acknowledged that FPGA-based hardware acceleration of compute-intensive bioinformatics applications can be a viable alternative to cluster- (or grid-) based approaches, as it offers very attractive MIPS/watt figures of merit. One issue with this technology is that it remains somewhat difficult to use and to maintain (one is designing a circuit rather than programming a machine).

Even though C-to-hardware compilation tools exist (Catapult-C, Impulse-C, etc.), a common belief is that they generally do not offer good enough performance to justify the use of such reconfigurable technology. As a matter of fact, successful hardware implementations of biocomputing algorithms are designed manually at the RT level and usually target a specific system, with little if any performance portability across reconfigurable platforms.

This research work, which is part of the ANR BioWic project, aims at providing a framework for the semi-automatic generation of high-performance hardware accelerators. In particular, we expect to widen the scope of common design constraints by focusing on system-level criteria that involve both the host machine and the accelerator (workload balancing, communication and data reuse optimizations, hardware utilization rate, etc.). This research builds upon the Cairn research group's expertise in automatic parallelization for application-specific hardware accelerators and targets mainstream bioinformatics applications (HMMer, ClustalW and BLAST).

Our work in 2010 focused on the HMMER algorithm and led to a very fruitful collaboration with Prof. Rajopadhye at CSU. In particular, we proposed a mathematical reformulation of the HMMER algorithm (previously considered inherently sequential) that exposes parallelism in the form of *reductions* and *prefix-scan* operations, which are very well suited to efficient hardware implementation [37].
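A generic illustration of why a prefix scan helps: a max-plus recurrence of the form x_i = max(x_{i-1} + a_i, b_i), the shape taken for instance by a chain of delete-state updates in profile-HMM scoring, looks sequential, yet each step is a function of x that composes associatively. The sketch below turns the chain into a log-depth (Hillis-Steele) scan. It illustrates the technique only and is not the reformulation of [37]; the scores are arbitrary integers.

```python
def sequential(a, b, x0):
    """x_i = max(x_{i-1} + a_i, b_i), evaluated one step at a time."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = max(x + ai, bi)
        xs.append(x)
    return xs


def compose(f, g):
    """Apply f then g, where (a, b) denotes x -> max(x + a, b).
    The result has the same form, and compose is associative."""
    a1, b1 = f
    a2, b2 = g
    return (a1 + a2, max(b1 + a2, b2))


def scan(ops):
    """Inclusive prefix scan (Hillis-Steele): ceil(log2(n)) rounds whose
    inner updates are independent, hence parallel in hardware."""
    out = list(ops)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else compose(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def parallel(a, b, x0):
    """Same result as sequential(), via the prefix scan."""
    return [max(x0 + pa, pb) for pa, pb in scan(list(zip(a, b)))]


a = [-1, -2, 3, -4, 0, 2, -1]
b = [0, 5, -3, 2, 1, -10, 4]
print(sequential(a, b, 0))  # [0, 5, 8, 4, 4, 6, 5]
print(parallel(a, b, 0))    # [0, 5, 8, 4, 4, 6, 5]
```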

#### Parallel reconfigurable architectures for LDPC decoding

Participants : Florent Berthelot, François Charot, Charles Wagner, Christophe Wolinski.

LDPC codes are a class of error-correcting codes introduced by Gallager, decoded with an iterative probability-based algorithm. Their performance, combined with a relatively simple decoding algorithm, makes these codes very attractive for the next generations of satellite and radio digital transmission systems. LDPC codes have been adopted in the DVB-S2, IEEE 802.11n, 802.16e, 802.3an and CCSDS standards. The major problem is the huge design space, made of many interrelated parameters, which forces drastic design trade-offs. Another important issue is the need for flexible hardware solutions that can support all the variants of a given standard.

In the context of the RPS2 project, we have designed a partly parallel architecture suited to decoding the LDPC codes of the digital video broadcasting DVB-S2 standard [41]. A complete development flow has been defined, from the Matlab specification down to back-end tools dedicated to FPGA implementation. Algorithm analysis and bit error performance evaluation were performed with the open-source IS-CML Matlab toolbox. First, a functional DVB-S2 decoding algorithm based on iterative belief propagation was written in C/C++, and floating-point to fixed-point conversion was studied. Then, the functional model was rewritten in SystemC so as to match the defined architecture at a cycle-accurate, bit-accurate level using SystemC synchronous threads. Finally, each SystemC thread was iteratively transformed into VHDL code.

This flow provided a better understanding of the algorithm in terms of complexity, performance and hardware implementation. We focused on the complexity-performance trade-offs caused by message quantization and compared its effects for several algorithmic approximations used in check-node processing. The decoder has been implemented on an XD2000i FPGA in-socket accelerator from XtremeData, a platform made of an Altera Stratix III FPGA plugged into a CPU socket. This Matlab-based simulation acceleration made it possible to study quantization effects and the error floor at very low BER for the DVB-S2 check-node algorithm approximations.
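The check-node approximations compared in this work are not named here. As an illustration, the sketch below contrasts the two standard check-node updates of LDPC decoding, the exact sum-product (tanh) rule and the min-sum approximation, and adds a saturating uniform quantizer of the kind used to study message word-lengths. The LLR values, word-length and step are arbitrary examples, not parameters of the RPS2 decoder.

```python
import math


def check_node_spa(llrs):
    """Exact sum-product (tanh rule) check-node update: the message to
    edge i uses all incoming LLRs except the i-th."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, l in enumerate(llrs):
            if j != i:
                prod *= math.tanh(l / 2.0)
        prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
        out.append(2.0 * math.atanh(prod))
    return out


def check_node_min_sum(llrs, scale=1.0):
    """Min-sum approximation: product of signs times smallest magnitude
    (scale < 1 gives the normalized min-sum variant)."""
    out = []
    for i in range(len(llrs)):
        others = [l for j, l in enumerate(llrs) if j != i]
        sign = 1.0
        for l in others:
            if l < 0:
                sign = -sign
        out.append(scale * sign * min(abs(l) for l in others))
    return out


def quantize(x, bits, step):
    """Uniform quantizer with symmetric saturation at +/-(2**(bits-1) - 1) steps."""
    q_max = 2 ** (bits - 1) - 1
    return max(-q_max, min(q_max, round(x / step))) * step


llrs = [1.5, -0.8, 2.3, 0.4]
print([round(v, 3) for v in check_node_spa(llrs)])
print(check_node_min_sum(llrs))
print([quantize(v, bits=4, step=0.5) for v in check_node_min_sum(llrs)])
```

Min-sum keeps the signs of the exact rule but overestimates magnitudes, which is why scaled variants are often used; the quantizer makes the effect of a given message word-length easy to explore.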