Section: Scientific Foundations
Category-level object and scene recognition
The objective of this core part of our research is to learn and recognize, quickly and accurately, thousands of visual categories, including materials, objects, scenes, and broad classes of temporal events, such as patterns of human activity at picnics, in conversations, etc. The current paradigm in the vision community is to model/learn one object category (read: one 2D aspect) at a time. To achieve our goal, we must break away from this paradigm and develop models that account for the tremendous variability in object and scene appearance due to texture, material, viewpoint, and illumination changes within each object category, as well as for the complex and evolving relationships between scene elements during the course of normal human activities.
Learning image and object models.
Learning sparse representations of images has been the topic of much recent research. It has been used, for instance, for image restoration (e.g., Mairal et al., 2007), and it has been generalized to discriminative image understanding tasks such as texture segmentation, category-level edge selection, and image classification (Mairal et al., 2008). As discussed in Section 6.5.1, we have developed fast and scalable optimization methods for learning sparse image representations, and we have released a software package called SPAMS (SPArse Modeling Software), presented in Section 5.1. We have also unified this sparse coding framework with the so-called non-local means approach, which exploits image self-similarities, leading to state-of-the-art results for image denoising and image demosaicking. We present this work in Section 6.3.1.
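To make the sparse representation idea concrete, the following minimal sketch (not the SPAMS implementation itself, and with illustrative dictionary sizes and parameter names) computes the sparse code of a single signal by solving the standard l1-penalized least-squares problem, min over a of 0.5*||x - D a||^2 + lambda*||a||_1, with the iterative soft-thresholding algorithm (ISTA):

```python
import numpy as np

def sparse_code(x, D, lam=0.05, n_iter=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 via ISTA.

    x: signal (m,), D: dictionary with unit-norm columns (m, k).
    Returns the sparse coefficient vector a (k,).
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# Synthetic example: a signal that is exactly 2-sparse in a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # normalize atoms to unit norm
a_true = np.zeros(50)
a_true[[3, 17]] = [1.5, -2.0]
x = D @ a_true
a = sparse_code(x, D)
```

In practice, dedicated solvers such as those in SPAMS are far faster and also learn the dictionary D itself from data, rather than fixing it as above.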
Category-level object/scene recognition and segmentation
Another significant strand of our research has focused on the extremely challenging goals of category-level object/scene recognition and segmentation. Towards these goals, we have developed new models or algorithms for (i) reasoning about object relationships in an image, (ii) scene segmentation based on data-driven boundary detection, (iii) learning mid-level features for recognition and (iv) learning object/part models from weakly or ambiguously annotated images.