Section: New Results
Tracking Focus of Attention
Participants: Nicolas Gourier, Jérôme Maisonnasse, James Crowley [contact].
Project PRIMA has developed a method for estimating, in real time, the head orientation of previously unseen subjects from images obtained under natural, unconstrained conditions. This method uses a three-stage approach in which global appearance is first used to provide a low-resolution, coarse estimate of orientation. This coarse estimate is then used as the starting point for a higher-resolution, refined estimate based on local appearance. The high-resolution estimate is then used to drive an attentional model based on models of human-to-human interaction. When applied to the Pointing'04 benchmark, this method provides an accuracy of 10° in yaw (pan) angle and 12° in pitch (tilt) angle.
Knowing the head orientation of a person provides information about visual focus of attention. The task of estimating and tracking focus of attention can serve as an important component for systems for man-machine interaction, video conferencing, lecture recording, driver monitoring, video surveillance and meeting analysis. To be useful, such applications require a method that is unobtrusive to avoid distraction. In general, this means estimating orientation of arbitrary subjects from a relatively low resolution imagette, extracted from an image taken from an unconstrained viewing angle under unconstrained illumination. This problem is more difficult than estimating face orientation from high-resolution mug-shot images.
In our system we use a robust video-rate face tracker to focus processing on face regions, although any reliable face detection process could be used. Our tracker uses pixel-level detection of skin-colored regions based on a probability density function of chrominance, and provides estimates of the first and second moments of the probability image of skin. From these, we compute an affine transformation that is used to warp the face onto a standard-size imagette, while normalising position, width, height and orientation. Experiments have shown that imagettes of size 23x30 pixels provide reasonably good input for head pose estimation.
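The normalisation step can be illustrated with a minimal sketch. It is not the PRIMA implementation: the function names, the choice of a 2-sigma extent around the centroid (`n_sigma`) and the nearest-neighbour sampling are assumptions made for illustration. The sketch computes the first and second moments of a skin-probability image, derives an affine map from imagette coordinates to image coordinates, and resamples the face region into a 23x30 imagette.

```python
import numpy as np

def skin_moments(prob):
    """Return the centroid (x, y) and 2x2 covariance of a skin-probability image."""
    h, w = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = prob.sum()
    if total <= 0:
        raise ValueError("probability image contains no skin mass")
    mx = (prob * xs).sum() / total
    my = (prob * ys).sum() / total
    dx, dy = xs - mx, ys - my
    cxx = (prob * dx * dx).sum() / total
    cyy = (prob * dy * dy).sum() / total
    cxy = (prob * dx * dy).sum() / total
    return np.array([mx, my]), np.array([[cxx, cxy], [cxy, cyy]])

def imagette_affine(mean, cov, out_w=23, out_h=30, n_sigma=2.0):
    """Build a 2x3 affine map from imagette pixel (u, v) to image point (x, y).

    Assumption: the face spans n_sigma standard deviations on each side of the
    centroid, with the major axis of the blob mapped to the imagette's height.
    """
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    major = evecs[:, 1]
    if major[1] < 0:                         # make the major axis point downward
        major = -major
    minor = np.array([major[1], -major[0]])  # perpendicular axis, no mirroring
    half_w = n_sigma * np.sqrt(max(evals[0], 1e-12))
    half_h = n_sigma * np.sqrt(max(evals[1], 1e-12))
    cu, cv = (out_w - 1) / 2.0, (out_h - 1) / 2.0
    A = np.column_stack([minor * half_w / cu, major * half_h / cv])
    t = mean - A @ np.array([cu, cv])
    return np.hstack([A, t[:, None]])

def warp_to_imagette(image, M, out_w=23, out_h=30):
    """Resample a 2-D image into an out_h x out_w imagette (nearest neighbour)."""
    vs, us = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([us.ravel(), vs.ravel(), np.ones(us.size)])
    src = M @ coords                         # row 0 holds x, row 1 holds y
    x = np.clip(np.rint(src[0]).astype(int), 0, image.shape[1] - 1)
    y = np.clip(np.rint(src[1]).astype(int), 0, image.shape[0] - 1)
    return image[y, x].reshape(out_h, out_w)
```

Because the map is built from the centroid, the eigenvalues and the eigenvectors of the covariance, the resulting imagette is normalised for position, width, height and in-plane orientation. A production tracker would more likely use bilinear interpolation and temporal filtering of the moments.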
In 2007, software based on this system was licensed to the startup company TechnoSens under the names SuiviDeCiblesCouleur and FaceStabilsationSystem. These systems work together to provide automatic video composition for hands-free video communications.
SuiviDeCiblesCouleur locates individuals in a scene for video communications. FaceStabilsationSystem renormalises the position and scale of images to provide a stabilised video stream. SuiviDeCiblesCouleur has been registered with the APP (Agence pour la Protection des Programmes) under the Inter Deposit Digital Number IDDN.FR.001.370003.000.S.P.2007.000.21000.