Section: New Results

Sparse representation, compression and interaction with indexing

Sparse signal representation algorithms

Participants : Jean-Jacques Fuchs, Cedric Herzet, Gagan Rath.

Sparse representations and compressed sensing remain fashionable topics in the signal processing community, and numerous pre-existing areas are now labeled this way. Sparse approximation methods aim at finding representations of a signal with a small number of components taken from an overcomplete dictionary of elementary functions. The problem basically involves solving an under-determined system of equations under the constraint that the solution vector has the minimum number of non-zero elements. Except for the exhaustive combinatorial approach, there is no known method to find the exact solution under general conditions on the dictionary. Among the various algorithms that find approximate solutions, pursuit algorithms (matching pursuit, orthogonal matching pursuit or basis pursuit) are the best known.
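As an illustration of the greedy pursuit idea mentioned above, the following is a minimal sketch of plain matching pursuit, assuming unit-norm atoms stored as Python lists. The function name `matching_pursuit`, the toy dictionary and the stopping threshold are illustrative choices, not taken from the work reported here.

```python
import math

def matching_pursuit(dictionary, signal, n_iter):
    """Greedy matching pursuit over unit-norm atoms (lists of floats)."""
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_iter):
        # Correlate the current residual with every atom.
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in dictionary]
        # Select the atom most correlated with the residual.
        best = max(range(len(dictionary)), key=lambda k: abs(corr[k]))
        if abs(corr[best]) < 1e-12:  # residual is (numerically) zero
            break
        # Update the coefficient and remove the atom's contribution.
        coeffs[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual

# Toy overcomplete dictionary in R^2: three unit-norm atoms (assumed example).
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
x = [2.0, 2.0]  # a multiple of the third atom
coeffs, residual = matching_pursuit(atoms, x, n_iter=3)
```

On this toy input a single atom suffices, whereas the canonical basis alone would need two components. Orthogonal matching pursuit differs by re-fitting all selected coefficients by least squares at each iteration.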

In 2009, we continued to apply our research in this area to the estimation-detection context [25] and further developed fast algorithms for criteria [24], [42] that are distinct from the ubiquitous $\ell_2$ penalized $\ell_1$ criterion. Using second-order cone programming, we apply sparse representation techniques to complex data in the array pattern synthesis problem. This allows min-max techniques to be extended to arbitrary array geometries, and it keeps the number of radiating elements required to achieve a given pattern sparse. This last feature is often of utmost importance and is generally handled in a combinatorial way.

We have also focused on applying statistical tools to the sparse representation problem. First, we have shown that the standard sparse representation problem can be regarded as the limit case of a MAP problem involving Bernoulli-Gaussian (BG) variables. This connection gives new insights into existing Bayesian algorithms and paves the way for the design of new ones. In particular, we have proposed new sparse representation algorithms based on a mean-field relaxation of the BG MAP problem. These algorithms are shown to give the best performance among several algorithms available in the literature while having the same order of complexity.

Anisotropic basis for image compression

Participants : Angélique Drémeau, Jean-Jacques Fuchs, Christine Guillemot, Cedric Herzet.

Closely related to the sparse representation problem is the design of dictionaries adapted to “sparse” problems. The sparsity of the signal representation indeed depends on how well the bases match the local signal characteristics. The adaptation of the transform to the image characteristics can mainly be made at two levels: i) in the spatial domain, by adapting the support of the transform; ii) in the transformed domain, by adapting the atoms of the projection basis to the signal characteristics. In 2009, a first image compression algorithm was developed based on anisotropic directional DCT (DDCT) bases. A set of image bases is constructed as the concatenation of local anisotropic rectangular DDCT bases. The approach extends the DDCT concept to rectangular bases for which the support is defined using a bintree segmentation. Dynamic programming is then used to select a basis from this set according to a rate-distortion criterion. The bintree segmentation, which locally adapts the support and the direction of the transform, increases the number of possible image bases, which are then more likely to capture the local properties of the image. In a second approach, the transform basis is selected (in a rate-distortion sense) from a set of bases made up of the concatenation of local multi-scale anisotropic (rectangular) bases. The sets of local bases are optimized in a sparsity-distortion sense. The resulting vectors are then quantized and entropy coded in a JPEG-like manner to assess the rate-distortion performance of the basis.
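The selection of a segmentation by dynamic programming under a rate-distortion criterion can be sketched as a bottom-up pruning of a binary tree with a Lagrangian cost D + λR. The sketch below is a simplified 1D stand-in under stated assumptions: each leaf is approximated by its mean and costs one rate unit, whereas an actual coder would use the cost of the transformed, quantized and entropy-coded block. The names `leaf_cost` and `best_segmentation` are hypothetical.

```python
def leaf_cost(block, lam):
    """Toy Lagrangian cost D + lam * R of coding a block as a single leaf.

    Assumption: the block is approximated by its mean (D = squared error)
    and every leaf costs one rate unit.
    """
    mean = sum(block) / len(block)
    distortion = sum((v - mean) ** 2 for v in block)
    return distortion + lam * 1.0

def best_segmentation(block, lam, min_size=1):
    """Return (cost, leaves) of the cheapest binary-tree segmentation."""
    cost_here = leaf_cost(block, lam)
    if len(block) < 2 * min_size:
        return cost_here, [block]
    mid = len(block) // 2
    cost_l, leaves_l = best_segmentation(block[:mid], lam, min_size)
    cost_r, leaves_r = best_segmentation(block[mid:], lam, min_size)
    # Keep the split only if the two optimal children are cheaper than the leaf.
    if cost_l + cost_r < cost_here:
        return cost_l + cost_r, leaves_l + leaves_r
    return cost_here, [block]

signal = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
cost, leaves = best_segmentation(signal, lam=0.5)
```

Because the optimal cost of a node depends only on the optimal costs of its two children, each node is evaluated once. In the 2D bintree case the same recursion would also compare candidate transform directions at each leaf.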

The problem of learning dictionaries made up of sets of bases has also been studied. This problem has already been largely addressed in the literature. An existing technique that constructs a dictionary made up of the union of orthonormal bases, based on a classification of the training data into P classes, has been revisited. In particular, the method has been placed in a probabilistic framework by considering the training data as realizations of a mixture of Gaussians. The learning task is thus reformulated as a MAP estimation problem, which is then solved by an EM procedure. So far, the training algorithm has been tested only on synthetic data generated according to the same mixture-of-Gaussians model as the one assumed in the training algorithm. Validating the algorithm with real images as training data is the next step.

Texture synthesis and prediction based on sparse approximations

Participants : Jean-Jacques Fuchs, Christine Guillemot, Aurélie Martin, Mehmet Turkan.

The problem of texture prediction can be regarded as a problem of texture synthesis (or inpainting) from noisy observations taken from a causal neighborhood. Methods based on sparse approximations, using orthogonal matching pursuit and basis pursuit algorithms, have been investigated for this texture synthesis and prediction problem. The goal of sparse approximation techniques is to look for a linear expansion approximating the analyzed signal in terms of functions chosen from a large and redundant set (dictionary). In the methods developed, the sparse signal approximation is run with a set of masked basis functions, the masked samples corresponding to the location of the pixels to be predicted. The decoder can then perform the same operation by running the algorithm with the masked basis functions and by taking the previously decoded neighborhood as the known support. In a first step, we have considered dictionaries based on classical waveforms such as DCT and DFT. The approach integrated in an H.264 based encoder has shown gains around 7

Another method of dictionary construction, based on texture patches taken in a causal neighborhood of the region to be approximated, has been developed. This approach can be regarded as an extension of template matching, which is widely used for image inpainting. Significant spatial prediction gains have been shown compared to static DCT or DFT dictionaries. Similarly, high temporal prediction gains have been obtained compared to classical block matching techniques. The method has been further improved by adapting the approximation support locally and in a rate-distortion sense [38]. These prediction and dictionary construction methods have been validated both for spatial and temporal prediction in the context of the ADT Picovin. Local texture analysis and classification techniques have been developed and assessed in order to better take into account the presence of discontinuities and edges, and more generally the local texture characteristics. Methods of residue coding based on adaptive rate-distortion optimized directional transforms have also been developed.
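The prediction principle with masked atoms built from causal patches can be sketched as follows. This is a simplified illustration, not the method evaluated in the report: it assumes 1D patches whose first `n_known` samples are the known template and whose remaining samples are to be predicted, and it runs a plain matching pursuit on the masked (known) part only. The function name `masked_prediction` and the toy data are hypothetical.

```python
def masked_prediction(patches, template, n_known, n_iter=1):
    """Predict the unknown tail of a patch from its known head.

    patches: candidate full patches (known head + tail) taken from the
    previously decoded area; template: the n_known known samples.
    """
    residual = list(template)
    prediction = [0.0] * (len(patches[0]) - n_known)
    for _ in range(n_iter):
        best, best_gain, best_score = None, 0.0, 0.0
        for k, patch in enumerate(patches):
            head = patch[:n_known]              # masked atom: known samples only
            energy = sum(h * h for h in head)
            if energy == 0.0:
                continue
            corr = sum(h * r for h, r in zip(head, residual))
            score = corr * corr / energy        # residual energy removed by this atom
            if score > best_score:
                best, best_gain, best_score = k, corr / energy, score
        if best is None:
            break
        atom = patches[best]
        # Update the residual on the known part, extend the same weight to the tail.
        residual = [r - best_gain * h for r, h in zip(residual, atom[:n_known])]
        prediction = [p + best_gain * t for p, t in zip(prediction, atom[n_known:])]
    return prediction, residual

patches = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 5.0], [0.0, 0.0, 1.0, 1.0]]
template = [2.0, 4.0, 6.0]
prediction, residual = masked_prediction(patches, template, n_known=3)
```

With `n_iter=1` and a unit gain this reduces to a form of template matching. Several iterations combine several patches, which is the sense in which a sparse approximation extends it. Since only decoded samples are used, a decoder can repeat the same computation.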

Perceptual 2D and 3D video coding

Participants : Josselin Gauthier, Christine Guillemot, Olivier Le Meur.

In collaboration with Zhenzhong Chen from Nanyang Technological University, Singapore, we have pursued a study on video coding exploiting perception and foveation models developed in 2008. Due to spatial and temporal masking effects, the human visual system cannot perceive noise below certain levels. Since the human visual system is space-variant, with the fovea having the highest density of sensor cells on the retina, visual acuity decreases with increasing eccentricity relative to the fovea. We have thus worked on the design of a foveated just-noticeable-distortion (JND) model. In contrast to traditional JND methods, which exploit the visibility of the minimally perceptible distortion but assume the visual acuity to be constant over the image, a foveation model is incorporated in the spatial and temporal JND models. The foveated JND model is developed by combining the spatial JND as a function of luminance contrast and spatial masking effect, the temporal JND to model the temporal masking effect, and a foveation model to describe the relationship between the visibility threshold and eccentricity relative to the fovea. Since the perceptual acuity decreases with increasing eccentricity, the visibility threshold of a pixel increases with the distance between the pixel and the fixation point. The spatio-temporal JND model can thus be improved by accounting for the relationship between visibility and eccentricity. With the proposed foveated JND model, more distortion can be introduced in the image while remaining imperceptible.
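The dependence of the visibility threshold on eccentricity can be illustrated with a small sketch. The eccentricity computation is plain viewing geometry; the linear weighting in `foveated_threshold` and its `slope` parameter are purely hypothetical placeholders, since the actual foveation model is not specified in this section.

```python
import math

def eccentricity_deg(px, py, fx, fy, viewing_distance_px):
    """Visual angle (degrees) between pixel (px, py) and fixation point (fx, fy).

    viewing_distance_px is the viewing distance expressed in pixel units.
    """
    d = math.hypot(px - fx, py - fy)
    return math.degrees(math.atan(d / viewing_distance_px))

def foveated_threshold(base_jnd, ecc_deg, slope=0.1):
    """Hypothetical foveation weighting: threshold grows with eccentricity.

    base_jnd stands for a spatio-temporal JND value at the pixel; the linear
    form and the slope are assumptions made for illustration only.
    """
    return base_jnd * (1.0 + slope * ecc_deg)

# Thresholds along a row, for a fixation point at the left edge (toy values).
thresholds = [
    foveated_threshold(3.0, eccentricity_deg(x, 0, 0, 0, viewing_distance_px=1000))
    for x in range(0, 801, 200)
]
```

Any monotonically increasing weighting would reproduce the qualitative behavior described above: the further a pixel is from the fixation point, the larger the distortion it can absorb without being noticed.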

The foveated JND model has been used for H.264/AVC video coding. Bit allocation and rate-distortion optimization are performed according to the foveated JND profile. The regions with higher visibility thresholds are coded with larger quantizers since these regions can tolerate higher distortion. The saved bit rate can be used to improve the quality of the regions which cannot tolerate high distortion. Therefore, the subjective quality of the whole image is improved. The performance of the foveated JND model has been assessed with subjective tests following the corresponding protocols in Rec. ITU-R BT.500. The subjective tests have demonstrated the validity of the FJND model, which leads to better perceptual quality of the reconstructed video for the same rate constraints. This study will be pursued and extended to 3D video coding and rendering. For both previously mentioned subjects, it might be useful to consider new low-level and high-level visual features in order to significantly improve the performance. Among these new features, three would be interesting to consider first: the depth information, the type of the scene and the visual interest of the different areas of the video (saliency map).

Lossless coding for medical images

Participants : Claude Labit, Jonathan Taquet.

Usual techniques, such as lossless JPEG, JPEG-LS or lossless JPEG 2000, are currently proposed and have been compared in a preliminary study (with ETIAM as industrial partner). The first results suggest an opportunity to launch a new prospective research study in order to

Afterwards, during the project (J. Taquet's thesis), we propose to take into account coding algorithms commonly used in the multimedia domain and thus to adapt techniques formerly developed in the lossy compression framework:

A first step of this study explores the improvements that might be expected for volumetric medical images, such as computed tomography (CT) or magnetic resonance imaging (MRI). These images are composed of a succession of slices regularly sampled in 3D space and are stored as sequences of 2D images with 12 to 16 bits per pixel. For diagnostic purposes, the compression has to be performed without losses or with controlled losses. To improve remote image rendering, the data streams should also allow progressive representation and random access.

Based on a consistent database of several biomedical images, which combines CT and MRI images of various origins, we compare usual lossless (or near-lossless) 2D algorithms (CALIC, JPEG-LS, JPEG 2000, SPIHT) with their extensions to 3D sources, in order to evaluate the coding gain obtained by using the third dimension, and the improvement brought by sophisticated 3D algorithms in comparison to basic methods. The results are quite variable, depending on how the original image was acquired. The impacts of the noise and of the sampling distance in the third dimension were noticeable. The noisiest images were better compressed with 2D predictive algorithms (CALIC's file sizes are 3.7

This PhD thesis, partially supported by a research grant of the Bretagne Council, will also benefit from the IHE-Europe technical coordination (« Integrating the Healthcare Enterprise ») hosted at the Irisa/INRIA Rennes Bretagne-Atlantique research center and from the presence, in Rennes, of several SMEs as industrial partners, such as ETIAM, a leading European SME developing innovative tools for multimedia connectivity and medical imaging communication.

Feature extraction for joint compression and local description

Participants : Christine Guillemot, Fethi Smach, Joaquin Zepeda.

The objective of the study initiated in 2007, in collaboration with TEXMEX, is to design signal representation and approximation methods amenable to image compression (that is, with sparseness properties) as well as to feature extraction and description. Feature extraction requires the image representation to be covariant under a set of admissible transformations. The Gaussian scale space is thus often used for description; however, it is not amenable to compression. One robust (non-sparse) descriptor based on the Gaussian scale space is the SIFT descriptor, usually computed in affinely normalized selected regions. A recent approach referred to as Video Google tackles the high dimensionality problem of this descriptor by forming a single sparse descriptor obtained from multiple input non-sparse SIFT descriptors. The approach consists in vector quantizing each SIFT descriptor on codewords called visual words, and then taking a (weighted) histogram of codeword indices. It allows using the principles of inverted files. Inverted file indices provide a solution to the indexing of high dimensional data (specifically, textual documents) by representing the data as sparse vectors. Document similarity calculations are thus carried out efficiently using the scalar products between these sparse vectors.
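The efficiency of inverted files for sparse vectors can be illustrated with a minimal sketch. It assumes descriptors stored as dictionaries mapping a codeword index to a weight; the names `build_inverted_index` and `scores` and the toy data are illustrative only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: {word_id: weight}}, i.e. sparse vectors.

    Returns {word_id: [(doc_id, weight), ...]}: for each codeword, the list
    of documents in which it has a nonzero weight.
    """
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for word, w in vec.items():
            index[word].append((doc_id, w))
    return index

def scores(index, query):
    """Scalar product between a sparse query and every indexed document.

    Only the posting lists of the query's nonzero entries are visited.
    """
    acc = defaultdict(float)
    for word, qw in query.items():
        for doc_id, w in index.get(word, []):
            acc[doc_id] += qw * w
    return dict(acc)

docs = {"a": {0: 1.0, 3: 2.0}, "b": {1: 1.0}, "c": {3: 1.0, 5: 4.0}}
index = build_inverted_index(docs)
result = scores(index, {3: 1.0, 5: 0.5})
```

Document "b" shares no codeword with the query and is never touched, which is why the cost depends on the number of nonzero entries rather than on the dimension of the vectors.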

We have developed a related approach which constructs sparse descriptors (called visual sentences) by using a sparse approximation of each SIFT descriptor rather than a simple vector quantization [39]. The aim is to tackle the high dimensionality of SIFT descriptors while retaining the local property of the input descriptors. The obtained descriptors remain local rather than forming a single global descriptor, while still enabling the use of inverted-file indices, which exploit the sparsity of the vectors to index high dimensional data and to compute document similarities efficiently through scalar products between sparse vectors. The approach has been compared, in the context of local querying, to the Video Google one, where multiple input SIFT descriptors are aggregated into a single sparse descriptor, resulting in the loss of description locality.

However, using sparse vectors instead of the original vectors in the computation of the similarity score required in an image retrieval system or in nearest neighbor (NN) search raises a new problem. The residual transformations following the geometrical region normalization cause instabilities in the support (positions of nonzero coefficients) of the sparse vector. These instabilities can severely impact the similarity score between regions and therefore the ranking performance of the NN search task, especially when using the inner product or correlation as the similarity measure. Inner products between query and database sparse vectors may differ significantly from the correlation between the original signal vectors, which we consider as the reference measure in our study. A new method has thus been introduced that makes use of sparse image representations to search for approximate nearest neighbors (ANN) under the normalized inner-product distance. The approach relies on the construction of a new sparse vector designed to approximate the normalized inner product between the underlying signal vectors. The resulting ANN search algorithm shows significant improvement compared to querying with the original sparse vectors, which is the approach considered in the literature for content-based image search. A transform has then been introduced in order to uniformly distribute the input dataset on the unit sphere while preserving relative angular distances. This transformation has been shown to further improve the performance and complexity of the ANN search task.

The problem of dictionary design has then been addressed by developing a novel approach in which a different dictionary is used at each iteration of the decomposition, and in which these iteration-tuned dictionaries satisfy some desirable properties. This method and the associated algorithm gave promising results and should be validated next year for compression and indexing purposes.

Denoising in the presence of oversampling

Participant : Jean-Jacques Fuchs.

Removing noise from a signal is feasible if some prior information is available. Oversampling is one instance where this is the case. For the noise-free signal, the information contained in the samples is then redundant: there exist implicit relations between the samples. One can thus remove part of the noise by imposing these relations on the noisy samples. Sparsity is another such instance. If one knows that the noise-free signal is sparse in a given non-redundant or redundant basis, then seeking a sparse representation for the signal amounts to removing part of the noise. In the case of oversampling, one can, for instance, project the observation onto the subspace containing "all" the band-limited signals sampled at the same sampling instants, which is lower dimensional only in the case of oversampling. This projection matrix is difficult to define and to build, and we seek a simple way to obtain it. The research is performed in an underwater-acoustics domain [22], but it is also of interest in a detection context [23] and of course in image processing, where it is well documented.

Let x = s + e be the noisy sampled observation vector of dimension N , with s the band-limited sampled signal and e the white noise vector. If P denotes the matrix associated with the aforementioned projection, then Ps = s by definition. The problem to solve is

$\min_{P} \; E\{e^T P e\} \quad \text{under} \quad Ps = s \quad \text{with } P \text{ in the set of projection matrices on } \mathbb{R}^N,$

and since the expectation of $e^T P e$ is $r\sigma_e^2$, with r the rank of P, one actually seeks the projection with minimal rank that satisfies Ps = s. The projector can be seen as achieving a low-pass filtering and we show how to use the discrete time Fourier transform to perform it.
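The rank argument follows from $E\{e^T P e\} = \sigma_e^2\,\mathrm{tr}(P)$ for white noise and $\mathrm{tr}(P) = r$ for an orthogonal projector. The low-pass interpretation can be illustrated with the sketch below, which makes a simplifying assumption not made in the text: the signal is periodic over the N samples, so the band-limited subspace is spanned exactly by the lowest DFT bins. The function names and the toy signal are illustrative, and a plain O(N²) DFT is used to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def lowpass_project(x, K):
    """Orthogonal projection onto signals whose DFT is zero beyond bin K.

    Keeps bins with frequency index |k| <= K (K assumed below n/2) and
    zeroes the others.
    """
    n = len(x)
    X = dft(x)
    kept = [X[k] if min(k, n - k) <= K else 0.0 for k in range(n)]
    return [v.real for v in idft(kept)]

N, K = 16, 2
# Oversampled band-limited signal: only frequency indices 1 and 2 are present.
s = [math.cos(2 * math.pi * t / N) + 0.5 * math.sin(2 * math.pi * 2 * t / N)
     for t in range(N)]
# Deterministic out-of-band perturbation standing in for noise (assumption).
e = [0.3 * (-1) ** t for t in range(N)]
x = [a + b for a, b in zip(s, e)]
denoised = lowpass_project(x, K)
rank = sum(1 for k in range(N) if min(k, N - k) <= K)  # 2K + 1 kept bins
```

Here the projector has rank 2K + 1 = 5 out of N = 16, so for white noise only a fraction 5/16 of the noise energy would remain on average, while the band-limited signal is left unchanged.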

