Team Imedia


Section: New Results

Construction and organisation of the visual feature space

Multi-source web image search results clustering

Keywords : Multi-source, shared neighbors, clustering, web search.

Participants : Amel Hamzaoui, Alexis Joly, Nozha Boujemaa.

Millions of users interact with search engines daily. Most popular search engines let users express their search intent by issuing a query as a list of keywords. However, keyword queries are often ambiguous, and this ambiguity frequently leads to unsatisfying search results. For example, the query “apple” covers several different topics: fruit, smartphone, computer, and so on. Heterogeneous search results therefore need to be combined and structured efficiently and generically.

We propose to use clustering techniques that rely only on ranked nearest-neighbour information (rather than directly on features or similarity measures). Such methods have proved to be very promising.

In particular, we use an a contrario principle to normalize the connectivity information.
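The a contrario principle can be illustrated with a minimal sketch under an assumed null model, which is not necessarily the one used in this work: if the k-nearest-neighbour lists of two objects among N were independent random k-subsets, their number of shared neighbours would follow a hypergeometric distribution. An observed overlap is then scored by its number of false alarms (NFA), i.e. the number of tests multiplied by the probability of reaching that overlap by chance. The function names and the choice of one test per pair are illustrative assumptions.

```python
from math import comb


def overlap_tail_probability(n_objects: int, k: int, shared: int) -> float:
    """P(X >= shared) for X ~ Hypergeometric(population=n_objects, successes=k, draws=k).

    Assumed null hypothesis: the k-NN list of the second object is a
    uniformly random k-subset of the n_objects, independent of the first list.
    """
    favourable = sum(
        comb(k, s) * comb(n_objects - k, k - s)  # s common, k - s elsewhere
        for s in range(shared, k + 1)
    )
    return favourable / comb(n_objects, k)


def number_of_false_alarms(n_objects: int, k: int, shared: int, n_tests: int) -> float:
    """Expected number of tested pairs sharing at least `shared` neighbours by chance."""
    return n_tests * overlap_tail_probability(n_objects, k, shared)


if __name__ == "__main__":
    n_objects, k = 1000, 20
    n_tests = n_objects * (n_objects - 1) // 2  # assumed: one test per unordered pair
    for shared in (1, 3, 6, 10):
        nfa = number_of_false_alarms(n_objects, k, shared, n_tests)
        # Usual a contrario convention: an event is "meaningful" when NFA < 1
        print(f"shared={shared:2d}  NFA={nfa:.3g}  meaningful={nfa < 1}")
```

Because the NFA lives on the same probabilistic scale whatever the modality, a score of this kind is one way to make connectivity comparable across information sources without any learning.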

The goal is to easily fuse different sources of information, without any learning or prior knowledge, and to produce both mono-source and multi-source clusters within the same clustering result.

The first step considers every object as a candidate cluster center and computes a significance score for each center with respect to its nearest neighbours. This step includes an oracle selection step that decides which modalities are the most significant for each candidate cluster.

Because this step is time-consuming, we precompute a shared-neighbour intersection matrix for each modality at the beginning of the process, so that neighbourhood overlaps can then be looked up quickly.
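A shared-neighbour intersection matrix can be built cheaply from an inverted index of the k-NN lists, so that only pairs of objects that actually share a neighbour are visited. The sketch below assumes that each modality provides a k-NN list of object indices for every object; the modality names and the toy lists are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def shared_neighbour_matrix(knn: Sequence[Sequence[int]]) -> List[List[int]]:
    """M[i][j] = number of common entries in the k-NN lists of objects i and j.

    Built through an inverted index (neighbour -> objects whose list contains
    it), so only pairs that actually share at least one neighbour are visited.
    """
    n = len(knn)
    inverted: List[List[int]] = [[] for _ in range(n)]
    for i, neighbours in enumerate(knn):
        for nb in set(neighbours):
            inverted[nb].append(i)
    matrix = [[0] * n for _ in range(n)]
    for owners in inverted:
        for i, j in combinations(owners, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    for i, neighbours in enumerate(knn):
        matrix[i][i] = len(set(neighbours))  # self-overlap = list size
    return matrix


if __name__ == "__main__":
    # hypothetical 3-NN lists for 5 images under two modalities
    knn_by_modality = {
        "global_colour": [[1, 2, 3], [0, 2, 3], [0, 1, 4], [0, 1, 2], [2, 3, 1]],
        "bof": [[3, 4, 1], [2, 0, 4], [1, 4, 0], [0, 4, 2], [0, 3, 1]],
    }
    matrices = {m: shared_neighbour_matrix(lists) for m, lists in knn_by_modality.items()}
    print(matrices["global_colour"][0][1])  # prints 2
```

Computing one matrix per modality up front turns any later overlap-based significance score into a table look-up.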

This optimization speeds up our algorithm enough for a user to quickly get an overview of the different clusters, together with the modalities used for each of them.

We tested our approach on the Exalead Corpus and obtained very interesting multi-source clusters for different queries.

We plan to evaluate our work in a re-ranking setting rather than as clustering, since no evaluation dataset exists for web search results clustering.

For now, the information sources we use are mostly visual (bag of features (BoF), global features, etc.). We would like to test our fusion algorithm (for re-ranking and clustering) on other modalities and see how it performs compared to the state of the art.

An example of a structured result for the query “Flag” is shown in Figure 1 (see also hamzaoui/InterfaceExa.html ).

Figure 1. An example of web search results clustering for the query "Flag"

3D indexing

Keywords : 3D alignment, 3D model retrieval, Principal Component Analysis, symmetry detection, choice of the optimal pose.

Participants : Mohamed Chaouch, Anne Verroust-Blondet, Skander El Fekih.

This year, we pursued our work on 3D model retrieval and indexing in several directions.

Figure 2. 3DGA computed on four different models. The diameter of the red balls is proportional to the local value of the 3DGA descriptor.

A new global descriptor, the 3D Gaussian descriptor (3DGA), derived from the Gauss transform, has been proposed in [16] and [7]. It is a spatial description of the model built from the Gaussian law and obtained by summation over the surface of the model (see figure 2). The 3DGA descriptor is efficient to compute but less effective than our 2D/3D descriptors on generic models. Nevertheless, it may be useful for describing 3D models that have a large part of their surface hidden when their 2D projections are computed.
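The general idea of a Gauss-transform descriptor can be sketched under stated assumptions: the surface is represented by sample points with area weights, the model has already been normalized to fit in the cube [-1, 1]^3, and the descriptor is evaluated on a regular grid of probe points, each entry summing a Gaussian kernel over the surface samples. The grid size, the bandwidth `sigma`, the area normalization and the L1 comparison are our own illustrative choices, not the parameters of [16].

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def gauss_transform_descriptor(
    samples: Sequence[Point],
    weights: Sequence[float],
    grid_size: int = 4,
    sigma: float = 0.25,
) -> List[float]:
    """Discrete Gauss transform of a sampled surface, evaluated on a grid.

    Entry j is sum_i w_i * exp(-||g_j - p_i||^2 / sigma^2) / sum_i w_i,
    for grid point g_j, surface sample p_i and area weight w_i.
    """
    step = 2.0 / grid_size
    centres = [-1.0 + step * (c + 0.5) for c in range(grid_size)]  # cell centres
    total_area = sum(weights)
    descriptor = []
    for gx in centres:
        for gy in centres:
            for gz in centres:
                value = 0.0
                for (px, py, pz), w in zip(samples, weights):
                    d2 = (gx - px) ** 2 + (gy - py) ** 2 + (gz - pz) ** 2
                    value += w * math.exp(-d2 / sigma ** 2)
                descriptor.append(value / total_area)  # independent of total area
    return descriptor


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # hypothetical toy "models": four surface samples each, unit area weights
    a = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]
    b = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
    da = gauss_transform_descriptor(a, [1.0] * 4)
    db = gauss_transform_descriptor(b, [1.0] * 4)
    print(len(da), l1_distance(da, db))
```

Because every grid value aggregates contributions from the whole surface, the descriptor does not depend on which parts of the model would be visible in a 2D projection.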

Our 3D alignment method [9], [7] has again shown good performance: coupled with the MDLA descriptor [37] (AL-MDLA), it won the generic track of the SHREC 2009 contest [18] (see figure 3).

Figure 3. Retrieval result of a desk chair model query (the upper leftmost object) using AL-MDLA on the SHREC 2009 generic database: 18 of the 19 remaining desk chair models are retrieved among the first 20 results.

Moreover, this result has been reinforced by the detailed evaluations carried out by Mohamed Chaouch in his thesis [7] on the main generic 3D shape databases: once again, the AL-MDLA approach obtained the best retrieval performance in all cases. These results confirm the importance of choosing an appropriate 3D alignment method for the normalization step of the retrieval process, and the effectiveness of our 2D/3D descriptor for retrieving 3D models from a database of generic 3D models.

Our alignment work has also been extended to reduce the number of reference frames that can be associated with a 3D model when searching for its natural pose among the 48 coordinate systems defined by the alignment axes. The principle of the extension is detailed in [9] and [7]. It is based on observations of human perception with respect to the vertical symmetries of the models.
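The figure of 48 follows from the alignment axes themselves. Once three orthogonal axes are found, a coordinate system is fixed by choosing which axis plays the role of x, y and z (3! = 6 permutations) and the direction of each axis (2^3 = 8 sign choices), giving 6 × 8 = 48 signed permutation matrices. Of these, 24 are proper rotations (determinant +1) and 24 include a reflection. The sketch below only enumerates these candidates; it does not implement the symmetry-based reduction of [9], [7].

```python
from itertools import permutations, product
from typing import List, Tuple

Matrix = Tuple[Tuple[int, int, int], ...]


def candidate_frames() -> List[Matrix]:
    """All 3x3 signed permutation matrices mapping the alignment axes onto x, y, z."""
    frames = []
    for perm in permutations(range(3)):            # which axis becomes x, y, z
        for signs in product((1, -1), repeat=3):   # orientation of each axis
            rows = []
            for row in range(3):
                r = [0, 0, 0]
                r[perm[row]] = signs[row]
                rows.append(tuple(r))
            frames.append(tuple(rows))
    return frames


def determinant(m: Matrix) -> int:
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


frames = candidate_frames()
proper = [m for m in frames if determinant(m) == 1]
print(len(frames), len(proper))  # prints 48 24
```

Any rule that discards candidates, such as one based on the vertical symmetries of the model, can then be applied as a filter over this list.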

Figure 4. Reference frames associated to the models according to their symmetry properties

Skander El Fekih developed an interactive tool during his master's thesis [29]. Figure 4 shows examples of the reduced sets of reference frames that the tool proposes to the user.

Alignment of 2D objects

Keywords : Alignment of 2D Objects, Principal Component Analysis, Symmetry Detection.

Participants : Olfa Mzoughi, Itheri Yahiaoui, Nozha Boujemaa.

The main difficulty in 2D shape recognition is that the shapes of objects can vary within the same semantic class. These variations, called deformations, can have multiple causes: the objects may be viewed from different perspectives, they may be structurally different (in the case of articulated and deformable objects), or they may have different scales. In general, a normalization step achieving invariance to all possible deformations is required before recognition. Normalization consists of three steps: the first centers the objects to achieve translation invariance, the second normalizes their scale, and the third aligns them to achieve rotation invariance. Most existing normalization methods handle centering and scaling efficiently; alignment, however, remains an open problem.
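The three normalization steps can be sketched for a shape given as a list of 2D points (boundary or surface samples). The scale measure (RMS distance to the centroid) and the closed-form PCA angle for a 2×2 covariance matrix are standard choices assumed here for illustration; they are not necessarily those of the method described in this section.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_shape(points: List[Point]) -> List[Point]:
    """Centre, scale and PCA-align a 2D shape given as a list of points.

    Assumed conventions: scale = RMS distance to the centroid, and the
    principal axis (largest variance) is rotated onto the x axis.
    """
    n = len(points)
    # 1. Translation invariance: move the centroid to the origin
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    # 2. Scale invariance: unit RMS distance to the centroid
    rms = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    scaled = [(x / rms, y / rms) for x, y in centred]
    # 3. Rotation invariance (PCA): orientation of the main eigenvector of
    #    the covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in scaled) / n
    syy = sum(y * y for _, y in scaled) / n
    sxy = sum(x * y for x, y in scaled) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(-theta), math.sin(-theta)  # rotate by -theta
    return [(c * x - s * y, s * x + c * y) for x, y in scaled]
```

PCA only fixes the principal axis up to its direction (the result can still be flipped or rotated by 180°), and it becomes unstable when the two variances are close, which is one reason why alignment is harder than centering and scaling.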

Humans perform this task efficiently by placing objects the way they are most commonly seen in their surroundings, and designing a technique that simulates this behavior is challenging. Results from psychological tests on human perception and recent 3D alignment methods show that symmetry is an important factor in such intuitive alignment. Based on this observation, we propose a new approach to automatically align 2D shapes in an intuitive way. Inspired by an idea from 3D alignment [9], this approach relies on two types of symmetry: reflective symmetry and local translational symmetry. Reflective symmetry is used as a criterion to validate the principal component analysis (PCA) alignment.
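One simple way to use reflective symmetry as a validation criterion is to mirror the PCA-aligned shape about each principal axis and measure how far the mirrored points fall from the original ones. The sketch below does this with a brute-force nearest-neighbour distance; the asymmetry measure, the function names and the tolerance value are hypothetical illustrations, not the criterion of the actual method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_nn_distance(a: List[Point], b: List[Point]) -> float:
    """Average distance from each point of a to its nearest point in b (brute force)."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def reflection_asymmetry(points: List[Point], axis: str) -> float:
    """Lower means more symmetric. Assumes the points are already centred,
    scaled and PCA-aligned, so the candidate symmetry axes are x and y."""
    if axis == "x":    # mirror across the principal axis
        mirrored = [(x, -y) for x, y in points]
    elif axis == "y":  # mirror across the secondary axis
        mirrored = [(-x, y) for x, y in points]
    else:
        raise ValueError("axis must be 'x' or 'y'")
    return mean_nn_distance(mirrored, points)


def pca_alignment_accepted(points: List[Point], tolerance: float = 0.05) -> bool:
    """Accept the PCA frame if the shape is reflection-symmetric about one of
    its principal axes (hypothetical tolerance, in normalized units)."""
    return min(reflection_asymmetry(points, "x"),
               reflection_asymmetry(points, "y")) <= tolerance
```

When this test fails, the PCA frame is rejected and another cue, such as local translational symmetry, has to decide the orientation.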

If the PCA alignment is rejected, an alternative technique based on local translational symmetry is used. Local translational symmetry is defined as the repetition of the same geometric properties along a given direction. Our algorithm uses two representations of a shape: its boundary and its surface. We show that the surface representation, which takes all points of the shape into account, often works better than the boundary representation, arguably because points on the periphery are more sensitive to deformations. Overall, compared with other alignment approaches, our method computes intuitive alignments quickly and efficiently, such as those presented in figure 5.

Figure 5. Alignment results of different objects.

Grape leaves segmentation

Keywords : computational botany, segmentation, mathematical morphology.

Participants : Sofiène Mouine, Raffi Enficiaud, Nozha Boujemaa, Ezzedine Zagrouba.

In the scope of the Pl@ntNet project, we are working on plant identification. Previous work on Orchidae of Laos achieved precise identification using images of their leaves; in that case, the leaves were scanned and appropriately cropped to retain only the relevant information. We are now extending this preliminary work to grape identification, first by using a regular digital camera and second by evaluating several shooting protocols, which aim to be closer to real working conditions in the field.

Contour-based shape descriptors, such as the one presented in [50], have interesting discriminative properties and should address these issues. Before the regions of interest can be described, however, they must first be segmented. Ideally, the segmentation algorithm should be fast and work with few, intuitive parameters. The original watershed transform [34], along with some of its improvements and extensions [33], [40], was therefore an interesting candidate for this task.

Our work consisted of implementing and evaluating the original watershed transform on images acquired under varying shooting conditions. We first focused on images with a relatively homogeneous background, under either controlled or uncontrolled illumination. The semi-supervised version of the watershed requires a marker image that marks the inside of each region of interest. We left the automatic choice of markers to future work and used manually placed markers.
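For readers unfamiliar with the marker-based watershed, the sketch below implements a simplified priority-flood variant: labelled markers are grown in order of increasing gradient value, and each unlabelled pixel takes the label of the region that reaches it first (a simplification of Meyer's flooding, which labels a pixel when it is first reached). The toy gradient and marker layout are hypothetical; in practice the gradient would be computed from the leaf photograph, and a library routine such as scikit-image's `skimage.segmentation.watershed` (which accepts a `markers` argument) would normally be used.

```python
import heapq
from typing import List


def marker_watershed(gradient: List[List[float]], markers: List[List[int]]) -> List[List[int]]:
    """Flood `gradient` from labelled `markers` (0 = unlabelled pixel).

    Pixels are processed by increasing gradient value with a priority queue;
    each unlabelled 4-neighbour inherits the label of the pixel that reached it.
    """
    rows, cols = len(gradient), len(gradient[0])
    labels = [row[:] for row in markers]  # do not modify the input markers
    heap = []
    counter = 0  # tie-breaker: keeps insertion order among equal priorities
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                heapq.heappush(heap, (gradient[r][c], counter, r, c))
                counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (gradient[nr][nc], counter, nr, nc))
                counter += 1
    return labels


if __name__ == "__main__":
    # hypothetical 5x5 gradient with a vertical ridge (value 9) in column 2
    grad = [[0, 0, 9, 0, 0] for _ in range(5)]
    seeds = [[0] * 5 for _ in range(5)]
    seeds[2][0], seeds[2][4] = 1, 2  # one marker on each side of the ridge
    for row in marker_watershed(grad, seeds):
        print(row)
```

Because low-gradient pixels are always processed first, region boundaries settle on the high-gradient ridges, which is why the quality of the gradient matters so much under shadows and uneven illumination.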

The details of this work are presented in [31] and an example of segmentation is shown in figure 6 .

Figure 6. Watershed transform on images of leaves. Left: original image, middle: marker image, right: result of the watershed transform.

These results mainly show that the watershed transform is able to extract the regions of interest. Further work is needed to handle less controlled shooting conditions and to make the processing automatic.

This work will be extended in three directions. First, we are investigating the automatic placement of markers, relying on two visual cues: colour and the vein network, which is almost always visible in the grape families. Second, the segmentation should be robust to varying illumination conditions, particularly to shadows; for this purpose, we propose to enhance the gradients currently used. Third, a partial description of the image inside the regions should make the final identification step robust to frequently occurring occlusions. We would also like to extend this investigation to flower segmentation.

High resolution satellite image classification by using multi-cue combination and Discriminative Random Field framework

Keywords : high-resolution satellite images, classification, homogeneous/non-homogeneous DRF model, multi-cue combination, contextual interactions.

Participants : Olfa Besbes, Nozha Boujemaa, Ziad Belhadj [ SUP'COM - Tunisia ] .

In recent years, the resolution of satellite images has increased significantly, now reaching 41 cm/pixel in the panchromatic band with the GeoEye-1 sensor. Consequently, new challenges arise for the accurate land-cover interpretation of data that are highly heterogeneous, both spectrally and spatially. Because of this heterogeneity, satellite images are ambiguous, and their classification remains a difficult task despite many thoughtful attempts. Indeed, most existing classification methods are suited only to a specific range of resolutions, and they generally fail at high resolution. To overcome this shortcoming, we proceed in [14], [15] as follows. First, we perform a multi-cue combination by incorporating various features, such as color, texture and edges, into a single unified discriminative model. Given a database of high-resolution satellite images, we learn an appropriate dictionary consisting of meaningful clusters for each cue, namely color clusters, textons and shapemes. Second, we adopt a probabilistic modeling approach to resolve uncertainties and intra-region variability and to enforce global labeling consistency. Specifically, we define a Discriminative Random Field (DRF) [45] model on an adjacency graph of superpixels, which directly models the conditional distribution $p(L \mid X, \theta)$ of the labels $L$ given the image observations $X$ and the learned parameters $\theta$. Our DRF model captures similarity, proximity and familiar configurations, which ensures strong discrimination. To capture contextual interactions among the labels as well as the data, we define in [14] a non-homogeneous discriminative model with spatially dependent association and pairwise potentials. Third, we adopt a feature selection approach based on sharing boosting [49] to learn the feature functions efficiently and to discriminate the regions of interest despite the complexity of the content.
Finally, we apply a cluster sampling algorithm [32], which combines the representational advantages of DRF and graph cut approaches, to infer the globally optimal labeling.
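For reference, a DRF in the sense of [45] writes the conditional distribution of the labels as a normalized exponential of unary association potentials $A_i$ and pairwise interaction potentials $I_{ij}$. Written on the superpixel adjacency graph, with $S$ the set of superpixels, $\mathcal{N}_i$ the neighbours of superpixel $i$, $l_i$ its label and $Z(X, \theta)$ the partition function, it reads as follows, the labeling sought at inference being the maximizer $L^{*}$. This is only the generic form: the specific feature functions, the spatially dependent parametrization of the non-homogeneous model of [14] and the cluster sampling inference of [32] are not reproduced. In the homogeneous model, $A_i$ and $I_{ij}$ are the same functions at every site; in the non-homogeneous model, they vary with the site.

```latex
p(L \mid X, \theta) = \frac{1}{Z(X, \theta)}
  \exp\Big( \sum_{i \in S} A_i(l_i, X)
          + \sum_{i \in S} \sum_{j \in \mathcal{N}_i} I_{ij}(l_i, l_j, X) \Big),
\qquad
L^{*} = \arg\max_{L} \; p(L \mid X, \theta)
```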

We train and test our model on high-resolution SPOT-5 satellite images. Our method is applicable to any range of resolutions, since it only needs to be trained on an appropriate database. Promising results are obtained, as shown in figures 7 and 8.

Figure 7. Example results on high resolution SPOT-5 satellite images. (a,d) Original test multi-spectral images. The inferred color-coded output object-class maps obtained by homogeneous (b,e) and non-homogeneous (c,f) versions of our DRF model.
Figure 8. Example results on high resolution SPOT-5 satellite images. (a,d) Original test multi-spectral images. (b,e) The inferred urban area boundaries. (c,f) the binary classification output maps obtained by our DRF model.

As figure 7 shows, the non-homogeneous DRF model provides better results than the homogeneous DRF model, which demonstrates the importance of integrating contextual information. Figure 8 illustrates the results obtained by our homogeneous DRF model for urban area extraction. In future work, we plan to learn the weighting parameters of the potentials and to extend our model to a multi-scale framework.

Texture-based satellite image indexing with Local Binary Pattern Correlograms

Keywords : Multispectral satellite image, Textures, Interest points, Local Binary Pattern, Correlograms.

Participants : Sahbi Bahroun, Nozha Boujemaa, Ziad Belhadj [ SUP'COM - Tunisia ] .

Description and recognition of textures in satellite images have attracted growing attention in recent years. In [13], we present a novel texture retrieval approach based on a new type of image representation: Local Binary Pattern Correlograms (LBPCs). Our representation is obtained by first extracting the most informative points (interest points) of the image and then computing local binary patterns (LBP) around these points. We then define a new texture feature as the correlogram of the LBP computed around the interest points. LBPCs thus combine the strengths of local and global descriptors. Local descriptors, here the features extracted around interest points, are robust to occlusions, scale changes and geometric transformations. Global descriptors, here correlograms, are very informative about the overall visual structure of an object. The LBP occurrence correlogram proves to be a very powerful texture feature. Our LBP correlograms have been tested on a real SPOT image database: the experiments show good average retrieval accuracy, and the approach compares very favourably with several state-of-the-art methods.
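The LBPC idea can be sketched in a few lines: compute an LBP code at each interest point, then estimate, for each distance bin, the probability that a point at that distance from a point with code a has code b. This minimal version assumes a grey-level image, the basic 8-neighbour LBP at radius 1, a user-supplied list of interest points and hypothetical distance bins; the monochrome/opponent LBP and the multiple neighbourhood sets mentioned in this section are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# The 8 neighbours at radius 1, clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[float]], r: int, c: int) -> int:
    """Basic 8-bit LBP: bit k is set when neighbour k is >= the centre pixel."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_correlogram(
    image: Sequence[Sequence[float]],
    keypoints: List[Tuple[int, int]],
    radii: Sequence[float] = (5.0, 10.0, 20.0),
) -> Dict[Tuple[int, int, int], float]:
    """Estimate P(code b at q | code a at p, dist(p, q) in bin d) over keypoint pairs.

    Keys are (bin index d, code a, code b). Bin 0 covers (0, radii[0]] and
    bin d covers (radii[d-1], radii[d]]. Keypoints are assumed to lie at
    least one pixel away from the image border.
    """
    codes = [lbp_code(image, r, c) for r, c in keypoints]
    bounds = [0.0, *radii]
    pair_counts: Counter = Counter()
    ref_counts: Counter = Counter()
    for i, p in enumerate(keypoints):
        for j, q in enumerate(keypoints):
            if i == j:
                continue
            dist = math.dist(p, q)
            for d in range(len(radii)):
                if bounds[d] < dist <= bounds[d + 1]:
                    pair_counts[(d, codes[i], codes[j])] += 1
                    ref_counts[(d, codes[i])] += 1
                    break
    return {key: n / ref_counts[key[:2]] for key, n in pair_counts.items()}


if __name__ == "__main__":
    # hypothetical synthetic texture; a regular grid stands in for detected interest points
    img = [[(r * 7 + c * 3) % 11 for c in range(40)] for r in range(40)]
    kps = [(r, c) for r in range(2, 38, 6) for c in range(2, 38, 6)]
    corr = lbp_correlogram(img, kps)
    print(len(corr), "non-zero correlogram entries")
```

Restricting the pairs to interest points keeps the correlogram small, which is consistent with the speed and index-size advantages reported for LBPCs, although this sketch makes no claim about the actual figures.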

Figure 9. Comparison of the precision-recall curves of our new LBPCs and of the four other methods tested

In Figure 9, the precision-recall curve of our proposed approach (LBPCs) is compared with the curves of four other approaches: (I) LBPCs combining monochrome and opponent LBP with three neighborhood sets, (II) LBPCs with a single neighborhood set, (III) the MLBP histogram [7] and (IV) traditional correlograms. Our method clearly performs better than the others, although the difference in accuracy with LBPCs combining monochrome and opponent LBP is small. Our method is, however, faster and requires less memory to store the index.

