Section: New Results
Automatic 3D face expression tracking system
3D face expression tracking algorithm
Participants: Daria Kalinkina, André Gagalowicz.
Our 3D face tracking approach is based on analysis/synthesis collaboration and uses a textured polygonal 3D model as a tool to detect the face position and expression in each frame of the sequence by minimizing the mean-square error between the generated synthetic image of the face and the real one. Since tracking performance (precision and stability of detection) relies heavily on the accuracy of the 3D model used, we adapt our deformable generic 3D model to the person in the video before tracking starts.
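To make the objective concrete, the sketch below (Python/NumPy, our notation) computes the mean-square error that the analysis/synthesis loop minimizes; `mask` is a hypothetical boolean map selecting the pixels covered by the projected model, so that the background does not bias the error:

    import numpy as np

    def matching_error(synthetic, real, mask):
        """Mean-square error between the rendered face image and the video
        frame, computed over the pixels covered by the projected model.
        `synthetic` and `real` are HxWx3 arrays; `mask` is HxW boolean."""
        diff = synthetic[mask].astype(float) - real[mask].astype(float)
        return float(np.mean(diff ** 2))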
For the adaptation we use one or more images of the person taken from different viewpoints. Matching is performed in two steps: first, the predefined characteristic points are adapted, in parallel with per-view camera calibration; second, the contours are adapted in all the views.
The flowchart of our face tracking algorithm is presented in Figure 1. As input, the tracking algorithm takes a customized 3D face model and a sequence of images in which the face has to be tracked; as output, it provides the face pose and expression in each frame of the sequence. The algorithm itself is composed of two phases: an initialization step and an optimization step.
Initialization is performed only once for a given sequence and consists of interactively positioning the 3D model over the first frame of the sequence so that it corresponds exactly to the face image. This allows us to map the color texture of the face onto the 3D model by back-projecting the image onto the model's geometry.
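A minimal sketch of this back-projection step, assuming a 3x4 projection matrix recovered from the interactive positioning (the names and the matrix convention are ours, and the image v-axis flip is omitted):

    import numpy as np

    def backproject_texture_coords(vertices, camera, frame):
        """Assign per-vertex texture coordinates by projecting the aligned
        3D model into the first frame. `vertices` is Nx3, `camera` is a
        3x4 projection matrix, `frame` is the HxWx3 first image."""
        h, w = frame.shape[:2]
        homog = np.hstack([vertices, np.ones((len(vertices), 1))])
        proj = homog @ camera.T              # Nx3 homogeneous image points
        xy = proj[:, :2] / proj[:, 2:3]      # perspective divide -> pixels
        return xy / np.array([w, h])         # normalized texture coordinates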
The optimization block is the kernel of the system and is performed for each frame of the sequence starting from the second one. It runs as an analysis-by-synthesis iterative loop, where at each step the model is rendered with different parameters and the generated image is compared to the current frame. The matching error is then analyzed and passed back to the synthesis block as feedback. The iterative process is driven by a simulated annealing minimization algorithm and terminates when the optimal system parameters for the current frame are found.
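The loop can be sketched as follows; `render` and `matching_error` stand in for the actual synthesis pipeline and error computation, and the Gaussian proposal with a geometric cooling schedule is an illustrative assumption rather than the exact scheme used:

    import numpy as np

    def optimize_frame(params, frame, render, matching_error,
                       n_iters=1000, t0=1.0, cooling=0.995):
        """Analysis-by-synthesis loop driven by simulated annealing (sketch).
        `render(params)` synthesizes an image of the textured model and
        `matching_error(image, frame)` compares it to the current frame."""
        rng = np.random.default_rng()
        current = params.copy()
        current_err = matching_error(render(current), frame)
        best, best_err, temp = current.copy(), current_err, t0
        for _ in range(n_iters):
            candidate = current + rng.normal(scale=temp, size=current.shape)
            cand_err = matching_error(render(candidate), frame)
            # Accept downhill moves always; uphill moves with a probability
            # that shrinks as the temperature cools.
            if cand_err < current_err or rng.random() < np.exp((current_err - cand_err) / temp):
                current, current_err = candidate, cand_err
                if cand_err < best_err:
                    best, best_err = candidate.copy(), cand_err
            temp *= cooling
        return best, best_err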
We continued developing our combined facial animation system, in which MPEG-4 facial animation parameters (FAPs) are controlled by Bezier curves in the mouth and eye regions. While in the eye region these curves serve solely to correct the eyelid borders, in the mouth region they aim at reducing the number of parameters to minimize. As a consequence, we have two cycles of minimization in the mouth region: during the first cycle, FAP displacements are constrained by the curve deformations; during the second cycle, FAP positions are optimized directly to achieve better tracking precision.
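As an illustration of how the curves cut down the search space, the sketch below drives several mouth FAP positions from a single cubic Bezier curve; pinning each FAP to a fixed abscissa along the curve is our simplifying assumption, not the exact MPEG-4 parameterization:

    import numpy as np

    def bezier(control, t):
        """Evaluate a cubic Bezier curve (control: 4x2 array) at t in [0, 1]."""
        basis = np.array([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3])
        return basis @ control

    def fap_positions(control, abscissae):
        """Derive FAP positions from the lip curve: the optimizer then
        searches over 4 control points instead of one displacement per FAP."""
        return np.array([bezier(control, t) for t in abscissae])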
Acceleration of the tracking algorithm
While precise and stable, our algorithm is far from real-time: it requires around 2 minutes of computation per frame, because each image undergoes multiple minimization cycles (from 5 to 7, depending on the desired precision of tracking in the mouth region), each containing around 1000 iterations. Several measures were undertaken to increase the computational speed. First of all, we performed a deep study of the minimization algorithm, the goal being to find an optimal set of parameters for each kind of minimization (related to rigid tracking and to facial expression tracking). In particular, the range of search should be adapted to the nature of the tracked parameters; for rigid tracking, for instance, it can be set by analyzing the pose changes between two consecutive frames during fast motion. Another important issue is the stopping criterion within the simulated annealing minimization loop. It should fire early enough to prevent the algorithm from wasting time on useless iterations, and at the same time let the minimization converge to the global minimum. Our criterion stops the iteration over the temperature parameter when the error of the best solution improves by less than 1 percent over five consecutive temperature iterations. This allows us to reduce the number of iterations to 500-600 while keeping the same tracking precision.
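A sketch of this stopping criterion (the helper name and the error-history representation are ours; the 5-iteration window and 1 percent threshold are the values reported above):

    def should_stop(best_errors, window=5, tol=0.01):
        """Terminate the annealing loop once the best error improves by less
        than `tol` (1 percent) in each of the last `window` (five)
        consecutive temperature iterations."""
        if len(best_errors) <= window:
            return False
        recent = best_errors[-(window + 1):]
        improvements = [(recent[i] - recent[i + 1]) / recent[i]
                        for i in range(window)]
        return all(r < tol for r in improvements)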
All these values were determined experimentally by studying the evolution of the error inside the minimization loops for different kinds of video sequences.
A good initial guess can also speed up convergence of the minimization process. Experimentally, we found that the pose obtained by linearly extrapolating the motion of the two previous frames serves as a good prediction and is, in most cases, closer to the final solution than the tracking result of the previous frame.
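In code, this prediction amounts to continuing the inter-frame motion one step forward (`pose_prev1` and `pose_prev2` are hypothetical parameter vectors of the two previously tracked frames):

    def predict_pose(pose_prev2, pose_prev1):
        """Linear prediction of the next frame's pose from the two previous
        ones; used as the initial guess of the minimization."""
        return pose_prev1 + (pose_prev1 - pose_prev2)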
All the measures described above essentially reduce the number of iterations inside each minimization loop. However, there is another way to drastically reduce the computational time for the whole sequence: skipping the tracking process for certain frames and assigning them parameter values obtained by linear interpolation between the two neighboring tracked frames. Experimentally we found that skipping every second frame has almost no effect on the resulting quality of tracking in terms of the final per-frame error, while reducing the overall processing time by a factor of two.
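A sketch of this frame-skipping scheme, assuming a per-frame optimizer like the one above and parameter vectors that can be averaged:

    def track_with_skipping(frames, params0, track_frame):
        """Track only every second frame; fill each skipped frame by linear
        interpolation between its two tracked neighbors. `track_frame` is a
        placeholder for the per-frame optimization."""
        params = [None] * len(frames)
        params[0] = params0
        for i in range(2, len(frames), 2):        # track even-indexed frames
            params[i] = track_frame(params[i - 2], frames[i])
        for i in range(1, len(frames) - 1, 2):    # interpolate skipped frames
            params[i] = 0.5 * (params[i - 1] + params[i + 1])
        if len(frames) > 1 and params[-1] is None:  # odd tail: track directly
            params[-1] = track_frame(params[-2], frames[-1])
        return params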