
Section: New Results

High-fidelity image- and video-based modeling

What is a camera? (J. Ponce, joint work with Guillaume Batog and Xavier Goaoc, VEGAS project-team)

We address in [41] the problem of characterizing a general class of cameras under reasonable, “linear” assumptions (Figure 1). Concretely, we use the formalism and terminology of classical projective geometry to model cameras by two-parameter linear families of straight lines—that is, reguli (rank-3 families) and linear congruences (rank-4 families). This model captures both the general linear cameras of Yu and McMillan and the linear oblique cameras of Pajdla. From a geometric perspective, it affords a simple classification of all possible camera configurations. From an analytical viewpoint, it provides a simple and unified methodology for deriving general formulas for projection and inverse projection, triangulation, and binocular and trinocular geometry.

Figure 1. A pinhole camera can be thought of as a device that associates with any point $\mathbf{x}$ the ray $\xi$ that joins it to its image and passes through the pinhole $\mathbf{c}$. This ray is picked from the bundle of lines passing through $\mathbf{c}$. More generally, a (non-central) camera can be modeled as a device that picks a line from a linear “bag of lines”—that is, a regulus or a linear congruence.

In [23], we extend this approach by presenting a complete analytical characterization of linear cameras: Pajdla has shown that a subset of these, the oblique cameras, can be modeled by a certain type of linear map. We have obtained a full tabulation of all admissible maps that induce cameras in the general sense of Grossberg and Nayar, and shown that these cameras are exactly the linear ones. Combining these two models with a new notion of intrinsic parameters and normalized coordinates for linear cameras allows us to give simplified analytical formulas for direct and inverse projections. We also show that the epipolar geometry of any two linear cameras can be characterized by a fundamental matrix whose size is at most 6×6 when the cameras are uncalibrated, or by an essential matrix of size at most 4×4 when their internal parameters are known. Similar results hold for trinocular constraints.
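
The line-geometric formalism can be made concrete with Plücker coordinates. The sketch below is our own illustration (not the formulas of [41] or [23], and with made-up point and pinhole coordinates): it represents a line by its direction and moment, checks the Klein quadric constraint, and verifies that the rays of a pinhole camera with center c satisfy a linear condition on their Plücker coordinates—the simplest instance of a linear family of lines.

```python
import numpy as np

def plucker_from_points(p, q):
    """Plucker coordinates L = (d, m) of the line through 3D points p and q:
    direction d = q - p and moment m = p x q."""
    return np.concatenate([q - p, np.cross(p, q)])

def on_klein_quadric(L, tol=1e-12):
    """Every line satisfies <d, m> = 0 (the Klein quadric constraint)."""
    return abs(np.dot(L[:3], L[3:])) < tol

# A pinhole camera with center c "picks", for each scene point x, the line
# joining x to c.  In Plucker coordinates, the lines through c are exactly
# those with m = c x d: three LINEAR equations, so the bundle through c is
# a linear family of lines.
c = np.array([1.0, -2.0, 0.5])   # hypothetical pinhole position
x = np.array([3.0, 4.0, 5.0])    # hypothetical scene point
ray = plucker_from_points(x, c)
assert on_klein_quadric(ray)
assert np.allclose(ray[3:], np.cross(c, ray[:3]))
```

Non-central linear cameras replace the bundle through c by other linear families (reguli, linear congruences), but the same coordinates support the projection and triangulation formulas mentioned above.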

Quantitative image analysis for archeology (B. Russell, J. Ponce, joint work with H. Dessales, ENS Archeology laboratory)

Accurate indexing and alignment of images is an important problem in computer vision. A successful system would allow a user to retrieve images with content similar to a query image, along with any information associated with those images. Prior work has mostly focused on techniques to index and match photographs depicting particular instances of objects or scenes (e.g. famous landmarks, commercial product labels, etc.). This has allowed progress on tasks such as recovering a 3D reconstruction of the depicted scene.
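
Instance-level matching of this kind typically relies on comparing local image descriptors. As a generic illustration (not the system developed in this project), a minimal nearest-neighbour matcher with Lowe's ratio test, on toy descriptor arrays, might look like:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping the match only if the best distance is clearly smaller than the
    second best (Lowe's ratio test), which filters ambiguous matches."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

# Toy 2-D descriptors: both rows of desc_a have unambiguous matches in desc_b.
desc_a = np.array([[0.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.0, 0.0], [5.0, 5.0], [100.0, 100.0]])
print(ratio_test_matches(desc_a, desc_b))  # → [(0, 0), (1, 1)]
```

Matching photographs to paintings and drawings is much harder precisely because such descriptor distances degrade under the distortions discussed below.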

However, many types of images cannot be accurately aligned by these techniques. For instance, many locations are depicted in drawings and paintings made by artists. Matching and aligning photographs, paintings, and drawings is extremely difficult because of the distortions that can arise. Examples include perspective and caricature distortions, along with errors due to the difficulty of drawing a scene by hand.

In this project, we seek to index and align a database of images, paintings, and drawings. The focus of our work is the Championnet house in the Roman ruins at Pompeii, Italy. Given an alignment of the images, paintings, and drawings, we wish to explore tasks that are of interest to archaeologists and curators who wish to study and preserve the site. Example applications include: (i) digitally restoring paintings on walls where the paintings have disappeared over time due to erosion, (ii) geometrically reasoning about the site over time through the drawings, (iii) indexing and searching patterns that exist throughout the site.

To date, we have visited the site in Pompeii and photographed the rooms of interest. An initial dense 3D reconstruction has been achieved from 585 photographs using existing photometric multi-view stereo methods. Figure 2 shows a snapshot of a 3D reconstruction of one of the rooms of interest. Notice that the 3D reconstruction captures much detail of the walls and structures.
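
Multi-view stereo pipelines of this kind rest on calibrated triangulation: once cameras are registered, corresponding image points are lifted to 3D. A minimal two-view linear (DLT) triangulation, with made-up camera matrices rather than those of the Pompeii dataset, can be sketched as:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover the 3D point X whose projections
    under the 3x4 camera matrices P1, P2 are the image points x1, x2.
    Each image point contributes two linear equations on the homogeneous
    coordinates of X; the solution is the null vector of the stacked
    system, found via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]     # right singular vector of smallest sigma
    return X[:3] / X[3]

# Two hypothetical cameras: identity pose, and a one-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.25, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]   # project, dehomogenize
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
assert np.allclose(triangulate(P1, P2, x1, x2), X_true)
```

Dense reconstruction repeats this over many views and millions of correspondences, with photometric consistency deciding which correspondences to trust.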

We are currently exploring different techniques to align the photographs, paintings, and drawings. We hope to submit results from our research to a conference in Spring 2010.

Figure 2. An initial dense 3D reconstruction of a room from the Championnet house in the Roman ruins at Pompeii, Italy. The reconstruction was computed from 585 photographs using existing photometric multi-view stereo methods. Notice that the reconstruction captures much detail of the walls and structures.

Dense 3D motion capture for human faces (J. Ponce, joint work with Y. Furukawa, University of Washington)

We have proposed in [30] a novel approach to motion capture from multiple, synchronized video streams, specifically aimed at recording dense and accurate models of the structure and motion of highly deformable surfaces such as skin, which stretches, shrinks, and shears in the midst of normal facial expressions. Solving this problem is a key step toward effective performance capture for the entertainment industry, but progress so far has been hampered by the lack of appropriate local motion and smoothness models. The main technical contribution of this work is a novel approach to regularization adapted to nonrigid tangential deformations. Concretely, we first estimate the nonrigid tangential deformation undergone by the surface at each vertex of a surface mesh, then aggregate the estimated deformation parameters over the surface for robustness. The estimated parameters are then used to regularize the (tangential) motion information.
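
The per-vertex deformation estimate can be illustrated, under simplifying assumptions, by fitting a local affine model to the in-plane displacements of a vertex's neighbours (a sketch in a 2D tangent plane with synthetic data; the actual model and estimation procedure of [30] differ):

```python
import numpy as np

def fit_local_affine(rest_pts, moved_pts):
    """Least-squares fit of A (2x2) and t (2,) minimizing
    sum_i ||A r_i + t - m_i||^2 over neighbour positions r_i (at rest)
    and m_i (deformed): a local linear model of the tangential stretch,
    shrink, and shear around a vertex."""
    R = np.hstack([rest_pts, np.ones((len(rest_pts), 1))])  # rows [x, y, 1]
    M = np.linalg.lstsq(R, moved_pts, rcond=None)[0]        # (3, 2)
    return M[:2].T, M[2]                                    # A, t

# Synthetic check: apply a known stretch/shear and recover it exactly.
rest = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_true = np.array([[1.2, 0.1], [0.0, 0.9]])   # stretch in x, slight shear
t_true = np.array([0.3, -0.2])
moved = rest @ A_true.T + t_true
A_est, t_est = fit_local_affine(rest, moved)
assert np.allclose(A_est, A_true) and np.allclose(t_est, t_true)
```

Aggregating such per-vertex parameters over the mesh, as described above, makes the motion estimates robust to noisy individual fits.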

Figure 3. Facial motion capture [30], featuring shaded renderings of reconstructions obtained from two different frames, the corresponding dense motion fields, and one texture-mapped rendering (the actress's face was covered with make-up to provide additional texture). See for videos. Data courtesy of Image Movers Digital.

Webcam Clip Art (A. Efros, joint work with J.-F. Lalonde and S. G. Narasimhan, CMU)

Webcams placed all over the world observe and record the visual appearance of a variety of outdoor scenes over long periods of time. The recorded time-lapse image sequences cover a wide range of illumination and weather conditions – a vast untapped resource for creating visual realism. In this work, we propose to use a large repository of webcams as a “clip art” library from which users may transfer scene appearance (objects, scene backdrops, outdoor illumination) into their own time-lapse sequences or even single photographs. The goal is to combine recent ideas from data-driven appearance transfer techniques with a general, theoretically grounded, physically-based illumination model. To accomplish this, the paper presents three main research contributions: 1) a new, high-quality outdoor webcam database that has been calibrated radiometrically and geometrically; 2) a novel approach for matching illumination across different scenes based on estimating the properties of natural illuminants (sun, sky, weather, and clouds), the camera geometry, and illumination-dependent scene features; 3) a new algorithm for generating physically plausible high dynamic range environment maps for each frame in a webcam sequence.
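
The environment-map idea can be sketched with a toy model (our own drastic simplification, not the paper's algorithm): given a sun direction, place a bright lobe around it in an equirectangular map over a constant ambient sky term. All parameter values below are illustrative.

```python
import numpy as np

def sun_env_map(zenith, azimuth, width=64, height=32,
                sharpness=100.0, intensity=1000.0):
    """Toy equirectangular environment map: a bright exponential lobe
    around the sun direction on top of a constant sky term.  Rows span
    the zenith angle [0, pi], columns the azimuth [0, 2*pi]."""
    # Sun direction in world coordinates (z up).
    s = np.array([np.sin(zenith) * np.cos(azimuth),
                  np.sin(zenith) * np.sin(azimuth),
                  np.cos(zenith)])
    # Per-pixel viewing directions of the equirectangular grid.
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi
    st, ct = np.sin(theta), np.cos(theta)
    dirs = np.stack([st[:, None] * np.cos(phi)[None, :],
                     st[:, None] * np.sin(phi)[None, :],
                     np.broadcast_to(ct[:, None], (height, width))], axis=-1)
    cos_gamma = dirs @ s                       # angle to the sun, per pixel
    return 1.0 + intensity * np.exp(sharpness * (cos_gamma - 1.0))

env = sun_env_map(np.pi / 4, 0.0)   # sun at 45 degrees elevation, azimuth 0
```

In the actual system, the sun, sky, and weather parameters are estimated from the webcam sequence itself, and the resulting maps carry a full high dynamic range rather than this two-term approximation.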

