Section: Scientific Foundations
Object tracking with nonlinear probabilistic filtering
Tracking problems that arise in target motion analysis (TMA) and video analysis are highly nonlinear and multimodal, which precludes the use of the Kalman filter and its classic variants. A powerful approach to this class of difficult filtering problems has become increasingly successful over the last ten years. It relies on sequential Monte Carlo (SMC) approximations and on importance sampling. The resulting sample-based filters, also called particle filters, can, in theory, accommodate any kind of dynamical and observation models, and permit efficient tracking even in high-dimensional state spaces. In practice, however, a number of issues remain to be addressed in difficult tracking problems such as long-term visual tracking under drastic appearance changes, or multi-object tracking.
The detection and tracking of single or multiple targets is a problem that arises in a wide variety of contexts. Examples include sonar or radar TMA and visual tracking of objects in videos for a number of applications (e.g., visual servoing, telesurveillance, video editing, annotation and search). The most commonly used framework for tracking is that of Bayesian sequential estimation. This framework is probabilistic in nature, and thus facilitates the modeling of uncertainties due to inaccurate models, sensor errors, environmental noise, etc. The general recursions update the posterior distribution of the target state $p(x_t \mid y_{1:t})$, also known as the filtering distribution, where $y_{1:t} = (y_1, \ldots, y_t)$ denotes all the observations up to the current time step, through two stages:

prediction step:
$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

filtering step:
$$p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}$$

where the prediction step follows from marginalization, and the new filtering distribution is obtained through a direct application of Bayes' rule. The recursion requires the specification of a dynamic model describing the state evolution $p(x_t \mid x_{t-1})$, and a model for the state likelihood in the light of the current measurements $p(y_t \mid x_t)$. The recursion is initialized with some distribution for the initial state $p(x_0)$. Once the sequence of filtering distributions is known, point estimates of the state can be obtained according to any appropriate loss function, leading to, e.g., Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE) estimates.
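The two-stage recursion can be illustrated on a finite state space, where the marginalization integral of the prediction step becomes a sum. The following is only a minimal sketch under stated assumptions: the three-cell world, the transition matrix, the likelihood values, and the function names `predict` and `update` are all hypothetical and chosen for illustration.

```python
def predict(belief, transition):
    """Prediction step: marginalize the previous state out of the dynamics.

    transition[j][i] is the assumed probability of moving from cell j to cell i.
    """
    n = len(belief)
    return [sum(transition[j][i] * belief[j] for j in range(n)) for i in range(n)]


def update(predicted, likelihood):
    """Filtering step: Bayes' rule, normalized by the evidence of the new datum."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


# Hypothetical dynamics: the target stays put with probability 0.7 or moves
# one cell to the right with probability 0.3; the last cell is absorbing.
transition = [[0.7, 0.3, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]]
belief = [1.0 / 3, 1.0 / 3, 1.0 / 3]   # initial distribution over the cells
likelihood = [0.1, 0.8, 0.1]           # assumed likelihood of the current datum per cell

predicted = predict(belief, transition)
posterior = update(predicted, likelihood)

# Point estimates drawn from the filtering distribution
map_estimate = max(range(len(posterior)), key=lambda i: posterior[i])
mmse_estimate = sum(i * p for i, p in enumerate(posterior))
```

On a grid the recursion is exact, but the cost grows with the number of cells, which is why such a scheme is only practical for small state spaces.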
The tracking recursion yields closed-form expressions in only a small number of cases. The best known of these is the Kalman Filter (KF) for linear and Gaussian dynamic and likelihood models. For general nonlinear and non-Gaussian models the tracking recursion becomes analytically intractable, and approximation techniques are required. Sequential Monte Carlo (SMC) methods [46], [52], [51], otherwise known as particle filters, have gained a lot of popularity in recent years as a numerical approximation strategy to compute the tracking recursion for complex models. This is due to their efficiency, simplicity, flexibility, ease of implementation, and modeling success over a wide range of challenging applications.
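As an illustration of the closed-form case, the sketch below runs one Kalman filter step for a scalar linear-Gaussian model. The model and its parameters are assumptions made for the example: state evolution x_t = a x_{t-1} + v with v ~ N(0, q), and measurement y_t = h x_t + w with w ~ N(0, r). The function name `kalman_step` and the default values are hypothetical.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, h=1.0, r=0.5):
    """One predict/update cycle of a scalar Kalman filter.

    (mean, var) describe the Gaussian filtering distribution at the previous
    time step; the returned pair describes it after assimilating y.
    """
    # Prediction: propagate the Gaussian through the linear dynamics
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: condition the predicted Gaussian on the new measurement
    gain = var_pred * h / (h * h * var_pred + r)
    mean_new = mean_pred + gain * (y - h * mean_pred)
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new


# Usage: assimilate a few made-up measurements starting from a N(0, 1) prior
mean, var = 0.0, 1.0
for y in [0.9, 1.1, 1.0]:
    mean, var = kalman_step(mean, var, y)
```

Because both models are linear and Gaussian, the filtering distribution stays Gaussian and is fully described by the two numbers returned at each step.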
The basic idea behind particle filters is very simple. Starting with a weighted set of samples $\{w_{t-1}^{(n)}, x_{t-1}^{(n)}\}_{n=1}^{N}$ approximately distributed according to $p(x_{t-1} \mid y_{1:t-1})$, new samples are generated from a suitably designed proposal distribution, which may depend on the old state and the new measurements, i.e., $x_t^{(n)} \sim q(x_t \mid x_{t-1}^{(n)}, y_t)$, $n = 1, \ldots, N$. Importance sampling theory indicates that a consistent sample is maintained by setting the new importance weights to

$$w_t^{(n)} \propto w_{t-1}^{(n)}\, \frac{p(y_t \mid x_t^{(n)})\, p(x_t^{(n)} \mid x_{t-1}^{(n)})}{q(x_t^{(n)} \mid x_{t-1}^{(n)}, y_t)}, \qquad \sum_{n=1}^{N} w_t^{(n)} = 1,$$

where the proportionality is up to a normalizing constant. The new particle set $\{w_t^{(n)}, x_t^{(n)}\}_{n=1}^{N}$ is then approximately distributed according to $p(x_t \mid y_{1:t})$. Approximations to the desired point estimates can then be obtained by Monte Carlo techniques. From time to time it is necessary to resample the particles to avoid degeneracy of the importance weights. The resampling procedure essentially multiplies particles with high importance weights, and discards those with low importance weights.
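The propose, reweight, and resample cycle described above can be sketched as follows. This is an illustrative sketch, not the general algorithm: it assumes a scalar random-walk dynamic model and a Gaussian likelihood, and it takes the dynamic model itself as the proposal (the so-called bootstrap choice), so that the weight update reduces to multiplication by the likelihood. The resampling trigger, an effective sample size below N/2, is a common heuristic and also an assumption here. All names and parameter values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, y, rng, dyn_std=1.0, obs_std=1.0):
    """One step of a bootstrap particle filter for a scalar state."""
    n = len(particles)
    # Propose: sample each new state from the assumed random-walk dynamics
    particles = [x + rng.gauss(0.0, dyn_std) for x in particles]
    # Reweight: with the dynamics as proposal, only the likelihood remains
    weights = [w * gaussian_pdf(y, x, obs_std) for w, x in zip(weights, particles)]
    total = sum(weights)
    if total > 0.0:
        weights = [w / total for w in weights]
    else:
        # All weights underflowed: fall back to uniform weights
        weights = [1.0 / n] * n
    # Resample (multinomial) when the effective sample size drops below N/2
    effective_size = 1.0 / sum(w * w for w in weights)
    if effective_size < n / 2.0:
        particles = rng.choices(particles, weights=weights, k=n)
        weights = [1.0 / n] * n
    return particles, weights


# Usage on synthetic data generated from the same assumed models
rng = random.Random(0)
n = 500
particles = [rng.gauss(0.0, 2.0) for _ in range(n)]   # samples from the initial distribution
weights = [1.0 / n] * n
true_state = 0.0
for _ in range(30):
    true_state += rng.gauss(0.0, 1.0)
    y = true_state + rng.gauss(0.0, 1.0)
    particles, weights = particle_filter_step(particles, weights, y, rng)
    estimate = sum(w * x for w, x in zip(weights, particles))   # Monte Carlo MMSE estimate
```

A proposal that also uses the new measurement generally wastes fewer particles than the bootstrap choice, at the price of evaluating the full weight ratio rather than the likelihood alone.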
In many applications, the filtering distribution is highly nonlinear and multimodal due to the way the data relate to the hidden state through the observation model. Indeed, at the heart of these models usually lies a data association component that specifies which part, if any, of the whole current data set is “explained” by the hidden state. This association can be implicit, as in many instances of visual tracking where the state specifies a region of the image plane. Only the data associated with this region, e.g., raw color values or more elaborate descriptors, are then explained by the appearance model of the tracked entity. When measurements are the sparse outputs of some detectors, as with edgels in images or bearings in TMA, association variables are added to the state space, whose role is to specify which datum relates to which target (or clutter).
In this broad context of SMC tracking techniques, two sets of important open problems are of particular interest for Vista:

selection and online estimation of observation models with multiple data modalities: except in cases where detailed prior knowledge is available on state dynamics (e.g., in a number of TMA applications), the observation model is the most crucial modeling component. Sophisticated filtering machinery cannot compensate for a weak observation model (insufficiently discriminant and/or insufficiently complete). In most adverse situations, a combination of different data modalities is necessary. Such fusion is naturally allowed by SMC, which can accommodate any kind of data model. However, there is no general means to select the best combination of features, nor, even more importantly, to adapt online the parameters of the observation models associated with these features. The first problem is a difficult instance of discriminative learning with heterogeneous inputs. The second problem is one of online parameter estimation, with the additional difficulty that the estimation should be triggered only sparingly, at instants that must be determined automatically (adapting while the entities are momentarily invisible or simply not detected by the sensors will always cause losses of track). These problems of feature selection, online model estimation, and data fusion have started to receive a great deal of attention in the visual tracking community, but the proposed tools remain ad hoc and restricted to specific cases.

multiple-object tracking with data association: when tracking multiple objects jointly, data association rapidly poses a combinatorial problem. Indeed, the observation model takes the form of a mixture with a large number of components indexed by the set of all admissible associations (whose enumeration can be very expensive). Alternatively, the association variables can be incorporated within the state space, instead of being marginalized out. In this case, the observation model takes a simpler product form, but at the expense of a dramatic increase in the dimension of the space in which the estimation must be conducted.
In either case, strategies have to be designed to keep the complexity of the multi-object tracking procedure low. This need is especially acute when SMC techniques, already often expensive for a single object, are required. One class of approaches consists in devising efficient variants of particle filters in the high-dimensional product state space of joint target hypotheses. Efficiency can be achieved, to some extent, by designing layered proposal distributions in the compound target-association state space, or by approximately marginalizing out the association variables. Another set of approaches relies on a crude, yet very effective, approximation of the joint posterior over the product state space by a product of individual posteriors, one per object. This principle, stemming from the popular JPDAF (joint probabilistic data association filter) of the trajectography community, is amenable to SMC approximation. The respective merits of these different approaches are still partly unclear, and are likely to vary dramatically from one context to another. Thorough comparisons and continued investigation of new alternatives are still necessary.
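The combinatorial growth of the association set in multiple-object tracking can be made concrete with a small count. The sketch below assumes one common admissibility rule, which the text does not specify: each target produces at most one measurement, each measurement has at most one source, and unassigned measurements are attributed to clutter. The function name `count_associations` is hypothetical.

```python
from math import comb, factorial


def count_associations(num_targets, num_measurements):
    """Number of admissible joint associations under the assumed rule.

    For each number k of detected targets: choose which k targets are
    detected, which k measurements they produced, and how they are matched.
    """
    limit = min(num_targets, num_measurements)
    return sum(comb(num_targets, k) * comb(num_measurements, k) * factorial(k)
               for k in range(limit + 1))


# Usage: the count grows quickly with the number of targets and measurements
for size in (2, 4, 6, 8):
    print(size, count_associations(size, size))
```

Under this rule the count is already 7 for two targets and two measurements, and 34 for three of each, which illustrates why exhaustive enumeration of the mixture components quickly becomes impractical.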