Scientific Foundations
The aim of the Magrit project is to develop vision-based methods that advance AR technology in terms of ease of implementation, usability, reliability, and robustness, so as to widen the current application field of AR and to increase the user's freedom during applications. Our main research directions concern two crucial issues: camera tracking and scene modeling. Methods are developed to meet the expected robustness and to provide the user with a good perception of the augmented scene.
Camera calibration and registration
Keywords: Registration, viewpoint computation, tracking, augmented reality.
One of the most basic problems currently limiting Augmented Reality applications is the registration problem. The objects in the real and virtual worlds must be properly aligned with respect to each other, or the illusion that the two worlds coexist will be compromised.
As a large number of potential AR applications are interactive, real-time pose computation is required. Although registration has received a great deal of attention in the computer vision community, real-time registration is still far from solved, especially in unstructured environments. Ideally, an AR system should work in any environment, without any need to prepare the scene ahead of time, and users should be able to walk wherever they please.
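To make the registration problem concrete, the sketch below computes a camera pose from a handful of 2D/3D correspondences and reprojects a virtual point with it. This is a minimal, generic illustration using OpenCV, not the project's own algorithm; all point coordinates and intrinsics are invented for the example.

```python
import numpy as np
import cv2

# Invented 3D scene points (a known model) and their detected 2D
# projections in the current frame; real systems get these from tracking.
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                          [0, 0, 1], [1, 0, 1]], dtype=np.float64)
image_points = np.array([[320, 240], [420, 245], [415, 340], [318, 335],
                         [330, 150], [428, 155]], dtype=np.float64)

# Assumed pinhole intrinsics (focal length and principal point, in pixels).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume no lens distortion

# Registration step: recover the camera pose (rotation + translation)
# that best reprojects the model onto the image.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)

# Augmentation step: any virtual 3D point can now be drawn at the
# pixel location consistent with the real camera viewpoint.
virtual_point = np.array([[0.5, 0.5, 0.5]])
pixel, _ = cv2.projectPoints(virtual_point, rvec, tvec, K, dist)
print(ok, pixel.ravel())
```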
For several years, the Magrit project has aimed to develop online, markerless methods for camera pose computation. In particular, we have proposed a real-time camera tracking system designed for indoor scenes [1]. The main difficulty with online tracking is ensuring the robustness of the process. For off-line processes, robustness is achieved by exploiting the spatial and temporal coherence of the considered sequence through move-matching techniques. To obtain robustness for open-loop systems, we have developed a method which combines the advantages of move-matching and model-based methods [6] by using a piecewise-planar model of the environment. This methodology can be used in a wide variety of environments: indoor scenes, urban scenes, etc. We are also concerned with the development of methods for camera stabilization: statistical fluctuations in the viewpoint computations lead to unpleasant jittering or sliding effects, especially when the camera motion is small. We have shown that the use of model selection noticeably improves the visual impression and reduces drift over time.
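As a rough illustration of how a piecewise-planar model can support tracking, the sketch below follows points lying on one planar facet between consecutive frames and robustly fits the plane-induced homography. The function and its parameters are illustrative; the project's actual method [6] is more involved.

```python
import numpy as np
import cv2

def track_planar_facet(prev_gray, cur_gray, prev_pts):
    """Track points lying on one planar facet of the scene model.

    prev_pts: float32 array of shape (N, 1, 2). Returns the homography
    induced by the plane and the inlier positions in the current frame.
    """
    # Move-matching step: sparse optical flow gives tentative matches.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                  prev_pts, None)
    ok = status.ravel() == 1
    p0, p1 = prev_pts[ok], cur_pts[ok]
    # Model-based step: points on the facet must obey a homography, so
    # RANSAC discards matches inconsistent with the planar assumption.
    H, inliers = cv2.findHomography(p0, p1, cv2.RANSAC, 3.0)
    return H, p1[inliers.ravel() == 1]
```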
An important way to improve the reliability and robustness of pose algorithms is to combine the camera with another kind of sensor, so that each technology compensates for the shortcomings of the other. Each has its limitations: on the one hand, rapid head motions cause image features to undergo large displacements between frames, which can make visual tracking fail; on the other hand, the response of inertial sensors is largely independent of the user's motion, but their accuracy is poor and they are sensitive to metallic objects in the scene. We have proposed a system in which an inertial sensor cooperates with the camera-based system in order to improve the robustness of the AR system to abrupt motions of the user, especially head motions. This work helps reduce the constraints on users and the need to carefully control the environment during an AR application [1]. This research area has been continued within the ASPI project in order to build a dynamic articulatory model from various image modalities and sensor data.
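A minimal sketch of such camera/inertial cooperation, assuming a single orientation angle and a simple complementary filter (the actual system is more sophisticated): the gyroscope handles fast motion, while vision corrects the slow drift of gyro integration.

```python
def fuse_orientation(theta_prev, gyro_rate, dt, theta_vision, alpha=0.98):
    """One complementary-filter step fusing gyro and vision (illustrative).

    theta_*: orientation angles in radians; gyro_rate in rad/s.
    """
    theta_gyro = theta_prev + gyro_rate * dt  # integrate the angular rate
    if theta_vision is None:                  # visual tracking failed, e.g.
        return theta_gyro                     # during an abrupt head motion
    # Trust the gyro at high frequency, let vision correct the drift.
    return alpha * theta_gyro + (1.0 - alpha) * theta_vision
```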
Pose algorithms often require a model of the scene where the AR application is to take place. However, obtaining such a model, whether by automatic or interactive means, is a tedious task, especially for large environments. In addition, models may be described in terms of 3D features which cannot be identified in the images. Pose by recognition is thus an appealing approach, which links photometric knowledge learned on the scene to the camera pose. We are currently considering learning-based techniques, the aim of which is to allow pose computation from video sequences previously acquired on the site where the application is to be used.
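As a sketch of the recognition step, assuming a database of keyframes built off-line from such video sequences (each keyframe storing its descriptors under a hypothetical "desc" key together with its known pose), a query frame can be matched against the database before running a PnP solver on the retrieved 2D/3D correspondences:

```python
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def best_keyframe(query_img, keyframes):
    """Return the stored keyframe most similar to the query frame.

    keyframes: list of dicts with hypothetical keys "desc" (ORB
    descriptors) and "pose" (the keyframe's known camera pose).
    """
    _, q_desc = orb.detectAndCompute(query_img, None)
    # Simple retrieval: count cross-checked descriptor matches.
    return max(keyframes,
               key=lambda kf: len(matcher.match(q_desc, kf["desc"])))
```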
Finally, the registration problem must be addressed from the specific point of view of augmented reality: the success and acceptance of an AR application depend not only on the accuracy of the pose computation but also on the visual impression of the augmented scene. The search for the best compromise between accuracy and perception is therefore an important issue in this project. This research topic has been addressed both in classical AR [7] and in medical imaging, in order to choose the camera model, including intrinsic parameters, which best describes the considered camera.
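One generic way to arbitrate between camera models of different complexity is an information criterion computed on reprojection residuals. The sketch below uses an Akaike-style score and is only illustrative of the idea, not the project's actual selection rule.

```python
import numpy as np

def aic_score(residuals, n_params):
    """Akaike-style score: goodness of fit penalised by model complexity.

    residuals: reprojection errors in pixels. Lower score is better.
    """
    n = residuals.size
    return n * np.log(np.mean(residuals ** 2)) + 2 * n_params

# A richer camera model (e.g., one also refining the focal length) is
# selected only if it reduces the residuals enough to pay for its
# extra parameters, which curbs jitter due to over-fitting.
```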
Scene modeling
Keywords: Fusion, medical imaging, reconstruction.
Modeling the scene is a fundamental issue in AR for many reasons. First, pose computation algorithms often use a model of the scene, or at least some 3D knowledge of the scene. Second, effective AR systems require a model of the scene to handle occlusions and to compute light reflections between real and virtual objects. Unlike pose computation, which has to be performed sequentially, scene modeling can be considered an off-line or an on-line problem according to the application.
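For the occlusion part, a scene model essentially provides a depth map against which virtual pixels can be z-tested. A minimal sketch, assuming registered real and virtual depth buffers of the same size:

```python
import numpy as np

def composite(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel z-test: draw virtual pixels only where the virtual
    object is closer to the camera than the real scene surface."""
    in_front = virt_depth < real_depth          # (H, W) boolean mask
    out = real_rgb.copy()                       # start from the real image
    out[in_front] = virt_rgb[in_front]          # virtual object wins here
    return out
```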
In our past activities, scene modeling was mainly addressed as an off-line, possibly interactive, process, especially to build models for medical imaging from several image modalities. For the past three years, one of our research directions has been online scene reconstruction, with the aim of handling AR applications in large environments without the need to instrument the scene.
Interactive scene modeling from various image modalities is mainly considered in our medical activities. For the last 15 years, we have been working in close collaboration with the neuroradiology laboratory (CHU-University Hospital of Nancy) and GE Healthcare. As several imaging modalities are now available in a per-operative context (2D and 3D angiography, MRI, ...), our aim is to develop a multi-modality framework to support therapeutic decision-making and treatment.
We have mainly been interested in the effective use of a multimodality framework for the treatment of arteriovenous malformations (AVM) and aneurysms in interventional neuroradiology. The goal of these interventional gestures is to guide endoscopic tools towards the pathology in order to embolize the AVM or to fill the aneurysmal cavity with coils. An accurate definition of the target is of great importance for the success of the treatment. We have proposed and developed multimodality and augmented reality tools which make various image modalities (2D and 3D angiography, fluoroscopic images, MRI, ...) cooperate in order to help physicians in clinical routine. One success of this collaboration is the implementation of the concept of augmented fluoroscopy [4], which helps the surgeon guide endoscopic tools towards the pathology. Lately, in cooperation with the Alcove EPI, we have proposed new methods for implicit modeling of aneurysms, with the aim of near-real-time simulation of coil deployment in the aneurysm [3]. Multi-modality reconstruction techniques are also considered within the European ASPI project, the aim of which is to build a dynamic model of the vocal tract from various image modalities (MRI, ultrasound, video) and magnetic sensors.
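In the spirit of augmented fluoroscopy, once the 3D vasculature (e.g., from 3D angiography) is registered to the fluoroscopic view, overlaying it reduces to a projective mapping. A minimal sketch, where the 3x4 projection matrix P is assumed to come from a prior 3D/2D registration step:

```python
import numpy as np

def overlay_vessels(P, vessel_pts):
    """Project registered 3D vessel points into the fluoroscopic image.

    P: 3x4 projection matrix assumed to come from a prior 3D/2D
    registration; vessel_pts: (N, 3) points from the 3D angiography.
    """
    homog = np.hstack([vessel_pts, np.ones((len(vessel_pts), 1))])
    proj = homog @ P.T
    return proj[:, :2] / proj[:, 2:3]  # pixel coordinates of the overlay
```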
On-line reconstruction of the scene structure needed by pose or occlusion algorithms is highly desirable for the numerous AR applications for which instrumenting the scene is not conceivable. In this setting, structure and pose must be estimated sequentially over time. This process largely depends on the quality of the matching stage, which detects and matches features over the sequence. Ongoing research is thus conducted on the use of probabilistic methods to establish robust correspondences of features over time. The use of a contrario decisions is in particular under study to achieve this aim [5].
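A rough sketch of an a contrario acceptance test for a candidate match (illustrative only; see [5] for the actual criterion): the number of false alarms (NFA) estimates how many matches at least this good would occur by chance, and the match is kept only when that number is small.

```python
import numpy as np

def nfa(match_dist, background_dists, n_tests):
    """Number of false alarms for a candidate match (illustrative).

    background_dists: distances from the query descriptor to unrelated
    descriptors, used as an empirical background model; n_tests: total
    number of match hypotheses examined.
    """
    p = np.mean(background_dists <= match_dist)  # P(random match this good)
    return n_tests * p  # accept the match only if this is well below 1
```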
Most automatic techniques aim at reconstructing a sparse, and thus unstructured, set of scene points. Such models are obviously not appropriate for interaction with the scene. In addition, they are incomplete in the sense that they may omit features which are important for the accuracy of the pose recovered from 2D/3D correspondences. We have therefore investigated interactive techniques with the aim of obtaining reliable and structured models of the scene. The goal of our approach is to develop immersive and intuitive interaction techniques which allow scene modeling during the application [19].