
## Section: New Results

### 3D object and scene modeling, analysis, and retrieval

#### Trinocular Geometry Revisited

Participants : Jean Ponce, Martial Hebert, Matthew Trager.

Figure 1. Left: Visual rays associated with three (correct) correspondences. Right: Degenerate epipolar constraints associated with three coplanar, but non-intersecting rays lying in the trifocal plane.

#### Consistency of silhouettes and their duals

Participants : Matthew Trager, Martial Hebert, Jean Ponce.

Figure 2. Geometrically consistent silhouettes are feasible projections of a single object.

#### Congruences and Concurrent Lines in Multi-View Geometry

Participants : Jean Ponce, Bernd Sturmfels, Matthew Trager.

Figure 3. Non-central panoramic (left) and stereo panoramic cameras (right) are examples of non-linear cameras that can be modeled using line congruences.

#### NetVLAD: CNN architecture for weakly supervised place recognition

Participants : Relja Arandjelović, Petr Gronat, Akihiko Torii, Tomas Pajdla, Josef Sivic.

In [9], we tackle the problem of large-scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer can be plugged into any CNN architecture and trained via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn the parameters of the architecture end-to-end from images of the same places taken at different times, downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture yields a large improvement in performance over non-learnt image representations and significantly outperforms off-the-shelf CNN descriptors on two challenging place recognition benchmarks. This work has been published at CVPR 2016 [9]. Figure 4 shows some qualitative results.
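
To make the idea of a generalized VLAD layer more concrete, the following is a minimal forward-pass sketch, not the implementation from [9]. It assumes the usual formulation in which each local descriptor is softly assigned to K clusters through a softmax over linear scores, and the assignment-weighted residuals to the cluster centers are summed, normalized per cluster, and then normalized as a whole. The names `netvlad_pool`, `weights`, `biases`, and `centers` are hypothetical, and plain Python lists are used only to keep the sketch dependency-free; a trainable layer would live inside a deep learning framework.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def netvlad_pool(descriptors, weights, biases, centers):
    """Aggregate N local D-dim descriptors into one (K*D)-dim vector.

    Assumed (hypothetical) parameter layout:
      weights[k], biases[k] -- soft-assignment parameters of cluster k
      centers[k]            -- anchor point of cluster k
    """
    num_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in descriptors:
        # Soft-assign the descriptor to every cluster.
        scores = [
            sum(w_j * x_j for w_j, x_j in zip(weights[k], x)) + biases[k]
            for k in range(num_clusters)
        ]
        assign = softmax(scores)
        # Accumulate the assignment-weighted residual to each center.
        for k in range(num_clusters):
            for j in range(dim):
                vlad[k][j] += assign[k] * (x[j] - centers[k][j])
    # Normalize each cluster's residual sum, flatten, then normalize the whole vector.
    flat = [value for row in vlad for value in l2_normalize(row)]
    return l2_normalize(flat)


if __name__ == "__main__":
    local_descriptors = [[1.0, 0.0], [0.0, 1.0]]
    image_vector = netvlad_pool(
        local_descriptors,
        weights=[[50.0, 0.0], [0.0, 50.0]],
        biases=[0.0, 0.0],
        centers=[[0.5, 0.0], [0.0, 0.5]],
    )
    print(len(image_vector))  # K * D entries
```

Because every step (linear scores, softmax, weighted sums, normalizations) is differentiable, the assignment parameters and the centers can in principle be learnt by backpropagation together with the rest of the network, which is what distinguishes such a layer from a fixed VLAD encoding.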

Figure 4. Our trained NetVLAD descriptor correctly recognizes the location (b) of the query photograph (a) despite the large amount of clutter (people, cars), changes in viewpoint and completely different illumination (night vs daytime).
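
The weakly supervised ranking loss mentioned in the description of [9] can be illustrated with a small sketch. It rests on an assumption about how the weak supervision works: geotags only provide *potential* positives (nearby images that may or may not show the same scene) and definite negatives (images taken far away). Under that assumption, the loss asks that the best-matching potential positive be closer to the query than every negative by a margin. The function names and the `margin` value are hypothetical; this is an illustration, not the training code of [9].

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weak_ranking_loss(query, potential_positives, negatives, margin=0.1):
    """Hinge-style ranking loss for one query under weak (geotag-only) labels.

    Only the closest potential positive is used, since the other nearby
    images may not actually depict the same scene as the query.
    """
    best_positive = min(squared_distance(query, p) for p in potential_positives)
    # Each negative that is not at least `margin` farther than the best
    # positive contributes a penalty.
    return sum(
        max(0.0, best_positive + margin - squared_distance(query, n))
        for n in negatives
    )


if __name__ == "__main__":
    loss = weak_ranking_loss(
        query=[0.0, 0.0],
        potential_positives=[[1.0, 0.0], [0.1, 0.0]],
        negatives=[[0.2, 0.0], [2.0, 0.0]],
    )
    print(loss)
```

In this toy call only the first negative violates the margin, so it alone contributes to the loss; the distant negative is already ranked correctly and adds nothing.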

#### Pairwise Quantization

Participants : Artem Babenko, Relja Arandjelović, Victor Lempitsky.

#### Learning and Calibrating Per-Location Classifiers for Visual Place Recognition

Participants : Petr Gronat, Josef Sivic, Guillaume Obozinski [ENPC / Inria SIERRA], Tomáš Pajdla [CTU in Prague].