Section: New Results
Image sequence processing and modeling
Patch-based redundancy analysis for change detection in an image pair
Participants : Charles Kervrann, Patrick Pérez.
[In collaboration with J. Boulanger (RICAM, Austria), J. Salamero (Curie Institute)]
To develop better change detection algorithms, new models are needed that capture the spatio-temporal regularities and geometries seen in an image pair. In contrast to the usual pixel-wise methods, a recent line of work models semi-local interactions from image patches. In this vein, we have proposed a patch-based formulation for detecting occlusions and other local or regional changes in an image pair. The redundancy observed in similar images is exploited to detect unusual spatio-temporal patterns in the scene. By introducing scores to compare patches and controlling false alarm rates, a detection algorithm can be derived for dynamic scene analysis with no optical flow computation. From binary local decisions, we propose a collaborative decision rule that uses the total number of detections made by neighboring pixels. Our patch-based approach is robust to many types of variation, such as local appearance changes, motion and scale variations. Experimental results on several applications, including background subtraction, defect detection in video inspection of manufactured objects and change detection in satellite images, demonstrate that the method performs well at detecting occlusions, meaningful regional changes and space-time corners, and is especially robust at low signal-to-noise ratios.
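As a minimal sketch of this detection scheme, the following illustrates binary local decisions from patch scores followed by the collaborative voting rule. It assumes simple sum-of-squared-differences patch scores and illustrative thresholds (`tau`, `k_min`); the actual scores and false-alarm calibration in our work differ.

```python
import numpy as np

def patch_distances(img1, img2, patch=3):
    """Sum of squared differences between co-located patches (assumption:
    a simple SSD score; the scores used in the actual method differ)."""
    half = patch // 2
    p1 = np.pad(img1, half, mode="reflect")
    p2 = np.pad(img2, half, mode="reflect")
    h, w = img1.shape
    d = np.zeros((h, w))
    for dy in range(patch):
        for dx in range(patch):
            diff = p1[dy:dy + h, dx:dx + w] - p2[dy:dy + h, dx:dx + w]
            d += diff ** 2
    return d

def collaborative_detection(img1, img2, patch=3, tau=0.5, nbhd=5, k_min=13):
    """Binary local decisions, then a collaborative rule: a pixel is
    flagged only if at least k_min pixels in its neighbourhood were
    individually flagged (thresholds here are illustrative, not the
    calibrated false-alarm rates of the paper)."""
    local = (patch_distances(img1, img2, patch) > tau).astype(int)
    half = nbhd // 2
    padded = np.pad(local, half, mode="constant")
    h, w = local.shape
    votes = np.zeros((h, w), dtype=int)
    for dy in range(nbhd):
        for dx in range(nbhd):
            votes += padded[dy:dy + h, dx:dx + w]
    return votes >= k_min
```

Note that no optical flow is computed: the decision relies only on patch similarity between the two images and on the spatial consistency of local detections.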
Detection and segmentation of moving objects in complex scenes
Participants : Guillaume Neveu, Florent Dutrech, Patrick Pérez.
[In collaboration with A. Bugeau (UPF, Barcelona)]
Detecting individual moving objects in videos shot by either static or mobile cameras is a long-standing problem, routinely addressed in a number of real applications such as tele-surveillance. There are, however, applicative contexts where this motion analysis problem is not satisfactorily handled by existing techniques. In the context of activity analysis in dynamically cluttered environments (dynamic background, crowded scenes, etc.), for instance, the problem is that of separating foreground moving objects of interest from other, uninteresting moving objects in the background.
We have proposed a completely automatic system to address this difficult task. It involves three main steps. First, a set of moving points is selected within a sub-grid of image pixels, and a multi-cue descriptor is associated with each of these points. Clusters of points are then formed using a variable-bandwidth mean shift technique with automatic bandwidth selection. Finally, the object associated with a given cluster is segmented using graph cuts. Experiments and comparisons with other motion detection methods on challenging sequences demonstrate the performance of the proposed method for video analysis in complex scenes.
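The clustering step can be sketched as follows. This is a simplified, fixed-bandwidth mean shift with a Gaussian kernel, whereas our method selects a variable bandwidth automatically; the descriptors here are plain 2-D points rather than multi-cue descriptors.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    """Fixed-bandwidth mean shift: each point iteratively moves to the
    kernel-weighted mean of all points, converging to a density mode."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            w = np.exp(-np.sum((points - m) ** 2, axis=1) / (2 * bandwidth ** 2))
            modes[i] = (w[:, None] * points).sum(axis=0) / w.sum()
    return modes

def cluster_labels(modes, tol=0.5):
    """Group points whose modes converged to (almost) the same location."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < tol:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```

In the full system, each resulting cluster seeds a graph-cut segmentation that recovers the extent of the corresponding object.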
Motion texture tracking with mixed-state Markov chains
Participants : Tomas Crivelli, Patrick Bouthemy.
[In collaboration with B. Cernuschi-Frias (Univ. Buenos Aires), G. Piriou and J.-F. Yao (IRMAR, Univ. Rennes)]
Examples of motion textures are mostly found in natural elements such as fire, smoke, water and moving foliage, but also in traffic and crowd scenes. Tracking this type of video content is essential for video surveillance applications. However, standard tracking techniques fail because motion textures are non-rigid and display highly dynamic content with specific statistical properties. The key characteristic of motion textures is that local motion observations have a mixed-state nature: the null motion value occurs as a discrete value with positive probability, while the remaining values follow a continuous distribution. We have thus developed a motion texture tracking algorithm based on two main steps. First, motion values are modeled using mixed-state Markov chains, which capture the main statistical (temporal) properties of mixed-state observations with only 13 parameters. This model is initially learned for the tracked content. Second, a motion texture window matching strategy is applied, based on the computation of the conditional Kullback-Leibler divergence between mixed-state Markov chains. This allows us to address the displacement estimation problem. Results on complex real sequences of different natures have demonstrated improved performance over standard methods.
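To illustrate the mixed-state nature of the motion observations, the sketch below fits a toy mixed-state density: a discrete probability mass at the null motion value plus a continuous part for non-zero values. The Gaussian continuous component is purely illustrative; the actual continuous distribution and the full Markov-chain temporal model of our work are richer.

```python
import math

def mixed_state_loglik(v, rho, mu, sigma, eps=1e-12):
    """Log-likelihood under a mixed-state density: the value 0 occurs with
    discrete probability rho; other values follow a Gaussian (illustrative
    choice, not the continuous part used in the paper)."""
    if v == 0.0:
        return math.log(rho + eps)
    return math.log(1.0 - rho + eps) - 0.5 * math.log(2 * math.pi * sigma ** 2) \
        - (v - mu) ** 2 / (2 * sigma ** 2)

def fit_mixed_state(samples):
    """Moment estimates: rho is the fraction of exact zeros; the Gaussian
    is fitted to the non-zero motion values only."""
    nonzero = [v for v in samples if v != 0.0]
    rho = 1.0 - len(nonzero) / len(samples)
    mu = sum(nonzero) / len(nonzero)
    var = sum((v - mu) ** 2 for v in nonzero) / len(nonzero)
    return rho, mu, max(var, 1e-12) ** 0.5
```

A conventional purely continuous density would assign probability zero to the exact null value, which is why the discrete mass at zero is essential for motion textures containing static regions.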
Dynamic remote sensing
Participant : Patrick Pérez.
[In collaboration with E. Mémin and S. Gorthi, Fluminance project-team]
See Fluminance activity report.
Geodesic image and video editing
Participant : Patrick Pérez.
[In collaboration with A. Criminisi, T. Sharp and C. Rother, Microsoft Research Cambridge]
In this work, we present a new, unified technique to perform general edge-sensitive editing operations on n-dimensional images and videos efficiently. The first contribution is the introduction of a generalized geodesic distance transform (GGDT), based on soft masks. This provides a unified framework to address several edge-aware editing operations. Diverse editing tasks, such as denoising and non-photorealistic rendering, are all dealt with by fundamentally the same, fast algorithm. Second, a new geodesic symmetric filter (GSF) is presented, which imposes contrast-sensitive spatial smoothness on segmentation and segmentation-based editing tasks (cutout, object highlighting, colorization, panorama stitching). The effect of the filter is controlled by two intuitive, geometric parameters. In contrast to existing techniques, the GSF filter is applied to real-valued pixel likelihoods (soft masks), thanks to GGDTs, and it can be used for both interactive and automatic editing tasks. Complex object topologies are dealt with effortlessly. Finally, the parallelism of GGDTs enables us to exploit modern multi-core CPU architectures as well as powerful new GPUs, thus providing great flexibility of implementation and deployment. Our technique operates on both images and videos, and generalizes naturally to n-dimensional data. The proposed algorithm has been validated via quantitative and qualitative comparisons with existing, state-of-the-art approaches. Numerous results on a variety of image and video editing tasks further demonstrate the effectiveness of our method.
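A rough sketch of a geodesic distance transform over a soft seed mask is given below, using the classic two-pass raster-scan approximation. Each pixel's distance is the cheapest path cost to any seed, where low values of the soft mask act as seeds and path steps pay for both spatial length and image-intensity change. The parameter names `nu` and `gamma` are illustrative, not those of the paper, and the actual GGDT formulation differs in detail.

```python
import math
import numpy as np

def ggdt(soft_mask, image, nu=1.0, gamma=1.0, n_passes=2):
    """Geodesic distance transform with soft seeds, sketched with
    forward/backward raster scans. soft_mask gives per-pixel seed costs
    nu * M(y); gamma weights the image-gradient (edge) cost."""
    h, w = image.shape
    D = nu * soft_mask.astype(float)
    # Neighbour offsets for the forward pass (negated for the backward pass).
    fwd = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    for _ in range(n_passes):
        for offs, rows, cols in [
            (fwd, range(h), range(w)),
            ([(-dy, -dx) for dy, dx in fwd],
             range(h - 1, -1, -1), range(w - 1, -1, -1)),
        ]:
            for y in rows:
                for x in cols:
                    for dy, dx in offs:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            step = math.hypot(dy, dx) \
                                + gamma * abs(image[y, x] - image[ny, nx])
                            D[y, x] = min(D[y, x], D[ny, nx] + step)
    return D
```

On a flat image this reduces to a chamfer distance from the seed pixels; with `gamma > 0`, paths crossing strong image edges become expensive, which is what makes the resulting editing operations edge-sensitive. Each raster scan is a simple sweep, which is why the transform parallelizes well.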
Aggregating local descriptors into a compact image representation
Participants : Hervé Jégou, Patrick Pérez.
[In collaboration with M. Douze and C. Schmid, Lear project-team]
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. To address this problem, we first propose a simplification of the Fisher kernel image representation, which is a way of aggregating local image descriptors into a vector of limited dimension. We then present an approach for coding and indexing such vectors that preserves well the accuracy of the vectorial Euclidean comparison. The evaluation shows that our approach significantly outperforms the state-of-the-art: the search accuracy is comparable to that of the bag-of-features approach for an image representation requiring only 20 bytes of memory.
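The aggregation step can be sketched as a VLAD-style encoding in the spirit of the simplified Fisher kernel mentioned above: each local descriptor contributes the residual to its nearest centroid, and the concatenated per-centroid residuals are L2-normalized. This is a minimal sketch; the subsequent coding and indexing of the resulting vector is omitted.

```python
import numpy as np

def aggregate_residuals(descriptors, centroids):
    """VLAD-style aggregation: assign each local descriptor to its nearest
    centroid, accumulate the residuals per centroid, then concatenate and
    L2-normalize. The output dimension is k * d, independent of the number
    of local descriptors in the image."""
    k, d = centroids.shape
    v = np.zeros((k, d))
    for x in descriptors:
        i = np.argmin(np.sum((centroids - x) ** 2, axis=1))
        v[i] += x - centroids[i]
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Because every image maps to a fixed-length vector, images can then be compared with a plain Euclidean distance, which is what the subsequent coding/indexing stage is designed to preserve.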