Section: New Results
Scene Understanding for Activity Recognition
Participants : Slawomir Bak, Ikhlef Bechar, Bernard Boulay, François Brémond, Guillaume Charpiat, Duc Phu Chau, Etienne Corvée, Mohammed Bécha Kaâniche, Vincent Martin, Sabine Moisan, Anh-Tuan Nghiem, Guido-Tomas Pusiol, Monique Thonnat, Valery Valentin, Jose-Luis Patino Vilchis, Nadia Zouba.
Introduction
This year, Pulsar has tackled several scene understanding issues and proposed new algorithms in the three following research axes:
-
Perception: people detection and human gesture recognition;
-
Understanding: multi-sensor activity recognition and gesture recognition using learned local motion descriptors;
-
Learning: online and offline trajectory clustering.
A video-surveillance system for quasi-real-time pest detection and counting (ARC BioSerre)
Participants : Ikhlef Bechar, Vincent Martin, Sabine Moisan.
In the framework of the BioSerre project, we investigate a video-surveillance solution for the early detection of pest attacks, as part of pest management methods. Our system is to be used in a greenhouse equipped with a (WiFi) network of video cameras. This year we presented this work in June 2009 in Paris at the Salon Européen de la Recherche et de l'Innovation (SERI'09), in October 2009 in Bordeaux during the Journées des ARCs de l'INRIA, and on December 10th-11th in Sophia Antipolis at the 2009 INRA-INRIA Seminar.
On top of the classical challenges in video-surveillance (lighting changes, shadows, etc.), we have to face specific challenges:
-
The high resolution of the video frames required by the application (about 1.3 megapixels per frame, at about 2 frames every 1.5 s), which is necessary to visualize the insects of interest but constitutes a serious challenge for quasi-real-time processing;
-
The very low spatial resolution and color contrast of the harmful insects of interest in the videos;
-
The lack of powerful discriminative features in the insect species of interest, because their low spatial resolution in the videos does not allow us to see their detailed shapes.
The application is divided into different modules. A video acquisition module acquires videos from the remote cameras and stores them locally on a PC where the core of the application runs. Thanks to the trap extraction module, only the region of interest in a video (the trap area) is processed. Then, a background subtraction module maintains a statistical model of the background and detects pixels which deviate significantly from the learned background model. The detected pixels are then processed by an insect presence detection module (IPDM) to decide whether they are likely to belong to an insect or not (e.g. due to illumination changes). Each video frame being divided into many patches (to speed up the subsequent image processing), simply counting the pixels classified as insect pixels by the IPDM allows the system to vote for the patches to be processed by the next module, the insect detection module (IDM). The IDM consists of low-level image processing operations (RGB to gray-scale transformation, image convolution, image differentiation, local maxima extraction, perceptual grouping, etc.) and relies on a rough prior geometric model of the insects of interest (i.e., a salient rectangular intensity profile) to extract the patterns likely to correspond to insects of interest. The insect classification module involves some additional processing in order to classify the extracted patterns into actual insects or fake patterns generated either by noise or by illumination reflections. In order not to redo the detection of a previously detected insect, a (cheap) insect tracking module maintains a list of the insects already detected in the previous frames and updates it whenever a new insect is detected by the IDM and confirmed by the insect classification module. All these routines are repeated continuously during scheduled daytime, and the counting results are stored and analyzed in quasi-real time.
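For illustration, a minimal sketch of the per-frame hand-off between background subtraction and patch voting could look as follows (the patch size, the voting threshold and the use of OpenCV's MOG2 model are illustrative assumptions, not the exact modules of the system):

```python
import cv2
import numpy as np

# Illustrative parameters only: patch grid size and per-patch voting threshold.
PATCH_SIZE = 128
MIN_INSECT_PIXELS = 30

bg_model = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

def candidate_patches(frame):
    """Background subtraction followed by a per-patch vote (IPDM-like hand-off to the IDM)."""
    fg_mask = bg_model.apply(frame)        # pixels deviating from the learned background
    fg_mask = cv2.medianBlur(fg_mask, 5)   # crude removal of isolated noise pixels
    h, w = fg_mask.shape
    voted = []
    for y in range(0, h, PATCH_SIZE):
        for x in range(0, w, PATCH_SIZE):
            block = fg_mask[y:y + PATCH_SIZE, x:x + PATCH_SIZE]
            if np.count_nonzero(block) >= MIN_INSECT_PIXELS:
                voted.append((x, y))       # this patch is passed on to the insect detection module
    return voted
```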
We have shown the feasibility of a video-surveillance system for pest management in a greenhouse and managed to overcome most of the initially posed image processing challenges. We are currently testing and refining our algorithms. The current prototype should be deployed soon in actual greenhouse sites of our INRA partners (Avignon) for further testing and validation.
Real-time Monitoring of TORE Plasma
Participants : Vincent Martin, François Brémond, Monique Thonnat.
This project aims at developing an intelligent system for the real-time surveillance of the plasma evolving in Tore Supra or other devices. The first goal is to improve the reliability of, and to upgrade, the current real-time control system operating at Tore Supra. The ultimate goal is to integrate such a system into the future ITER imaging diagnostics. In this context, a first collaboration has recently started between the Plasma Facing Component group of CEA Cadarache and the Pulsar project-team. The goal is to detect events (expected or not) in real time, in order to control the power injection for the protection of the Plasma Facing Components (PFC). In the case of a known event, the detection must lead to the identification of this event. Otherwise, a learning process is proposed to assimilate this new type of event. In this way, the objective of the project is twofold: machine protection and thermal event understanding. The system may take multimodal data as inputs: plasma parameters (plasma density, power injected by the heating antennas, plasma position, ...), infrared and visible images, and signals coming from other sensors such as spectrometers and bolometers. Recognized events are returned as outputs with their characteristics for further physical analysis. In this application, we benefit from the large amount of data accumulated during thousands of pulses and for several devices. We rely on an ontology-based representation of the a priori domain expert knowledge of thermal events. This thermal event ontology is based on visual concepts and video event concepts useful to describe and to recognize a large variety of events occurring in thermal videos.
New results in thermal event detection
This year, we have focused on improving and assessing the thermal event detection system. As seen in Table 1, the proposed approach outperforms the previous system, which was based on the detection of threshold overruns in specified regions of interest. Improvements are visible both in terms of sensitivity (fewer false negatives than the previous system) and precision (no false positives). This work has been published in [42]. An extended journal version has been accepted and will be published next year.
Table 1. Number of detected arcing events per antenna: true positives (TP), false negatives (FN) and false positives (FP) for the previous system (CS) and the proposed approach (PA).

| Antenna | No. of pulses | Annotated arcing events | TP (CS) | TP (PA) | FN (CS) | FN (PA) | FP (CS) | FP (PA) |
| C2 | 11 | 73 | 68 | 70 | 5 | 3 | 51 | 0 |
| C3 | 7 | 17 | 11 | 13 | 6 | 4 | 7 | 0 |
| C2 + C3 | 18 | 90 | 79 | 83 | 11 | 7 | 58 | 0 |
New results in thermal event understanding
In the case of a complex thermal event, we have studied learning techniques for modeling temporal behaviors. Our case study focuses on B4C flakes on the vertical edge of the Faraday screen. They are due to the flaking of the B4C coating as a consequence of the heating caused by fast ion losses. The temperature may exceed the acceptable threshold without apparent risk of damage. The recognition of a B4C flake is not evident and relies on a fine physical analysis based on the hot spot temporal behavior. Our goal is to build statistical models from training samples composed of positive and negative examples of B4C flakes. We used Hidden Markov Models trained with the temperature of detected hot spots and the injected power as inputs. Preliminary results are convincing, and further evaluation of the proposed approach needs to be pursued. Finally, this approach may be extended to the modeling and automatic recognition of other complex thermal events.
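As an illustration of this kind of modeling (not the exact models used here), a B4C-flake classifier based on two Gaussian HMMs, one trained on positive and one on negative sequences of (hot-spot temperature, injected power) pairs, could be sketched with hmmlearn as follows; the number of hidden states is an assumption:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_hmm(sequences, n_states=3):
    """Fit one HMM on a list of (T_i, 2) arrays: [hot-spot temperature, injected power]."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=100)
    model.fit(X, lengths)
    return model

def is_b4c_flake(sequence, positive_model, negative_model):
    """Label a new hot-spot sequence as a B4C flake if the positive model explains it better."""
    return positive_model.score(sequence) > negative_model.score(sequence)
```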
New software development
We are currently developing a Plasma Imaging data Understanding Platform (PInUP) dedicated to thermal event recognition and understanding. This platform is inspired by VSUP and is composed of several modules and knowledge bases. PInUP also embeds a dedicated tool for video annotation. The goal of this tool is twofold: first, to build an annotation base of observed thermal events with precise spatiotemporal information, and second, to retrieve from the resulting base useful information on thermal events, for instance for further PFC aging analysis (see Figure 4).
This platform is going to be deployed at Tore Supra and will be used by physicists and by the persons in charge of the infrared imaging diagnostics. Concerning the real-time detection of thermal events, we are currently implementing the most costly algorithms on an FPGA to meet real-time constraints. The real-time monitoring system should be operational in April 2010 and will work in parallel with the existing system at Tore Supra.
Human detection and re-identification
Participants : Slawomir Bak, Etienne Corvée, François Brémond, Saurabh Goyal.
Human activity analysis requires the detection and tracking of people in often congested scenes captured by surveillance cameras. The common strategies used to detect objects at high frame rates rely on segmenting and grouping foreground pixels extracted from a background scene captured by static cameras. The detected objects in a 3D calibrated environment are then classified according to predefined 3D models such as persons, luggage or vehicles. However, whenever occlusion occurs, objects are no longer classified as single individuals but are associated with a group of objects. Hence, standard tracking systems fail to differentiate objects within a group and often lose their tracks.
One way to handle occlusion issues is to use multiple cameras viewing the same scene from different locations in order to cover the fields of view where occlusion occurs: e.g. when one camera fails to track a person, another camera takes over the tracking of this person. People detection in difficult scenarios can also be improved by extracting local descriptors. In the PULSAR team, we have used Histograms of Oriented Gradients (HOG) to model human appearance and people's faces. The aim is to model body parts and poses in order to better define people's shapes. Figure 5 shows results obtained by the HOG face detector and Figure 6 shows tracking results from people detected by the HOG human detector.
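For reference, OpenCV ships a generic HOG-plus-linear-SVM people detector; the snippet below is a minimal sketch of that standard detector, not of the team's specific body-part and face models:

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return (x, y, w, h) bounding boxes of people found by the default HOG+SVM detector."""
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
    return [tuple(map(int, b)) for b in boxes]
```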
We are not only adding visual signatures to better track people within independent cameras, but we are also using visual signatures to allow people tracking across different cameras. These visual signatures would also allow us to re-identify people in more complex networks of cameras where camera fields of view do not overlap, e.g. in underground stations or airports. It is thus desirable to determine whether a given person of interest has previously been observed by other cameras in such a network. This constitutes the person re-identification issue. Our first re-acquisition algorithm combines Haar-like features with dominant colours extracted from the mobile objects detected by the HOG-based human detector described above.
People's visual signatures are described by the Haar-like descriptors shown in Figure 7 and by dominant colours extracted from two regions, the upper and lower body parts, as shown in Figure 8. The Adaboost algorithm is adapted to take these visual signatures as input and to construct a model for each individual. We have tested our algorithm in a scenario with two non-overlapping cameras: 10 people from the CAVIAR database and 40 people from the TRECVid database.
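A simple way to obtain the dominant colours of the upper and lower body regions is a small k-means in colour space; the sketch below illustrates the idea (the split into halves and the number of colours per region are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colours(person_img, n_colours=3):
    """Dominant colours of the upper and lower halves of a detected person image (H x W x 3)."""
    h = person_img.shape[0]
    signature = {}
    for name, region in (("upper", person_img[: h // 2]), ("lower", person_img[h // 2:])):
        pixels = region.reshape(-1, 3).astype(np.float32)
        km = KMeans(n_clusters=n_colours, n_init=4, random_state=0).fit(pixels)
        order = np.argsort(-np.bincount(km.labels_, minlength=n_colours))  # most frequent first
        signature[name] = km.cluster_centers_[order]
    return signature
```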
Controlled Object Detection
Participants : Anh-Tuan Nghiem, François Brémond, Monique Thonnat.
Detecting mobile objects is an important task in many video analysis applications such as video surveillance, people monitoring, and video indexing for multimedia. Among the various object detection methods, the ones based on adaptive background subtraction, such as the Gaussian mixture model, the kernel density estimation method, and the codebook model, are the most popular. However, a background subtraction algorithm alone cannot easily handle various problems such as adapting to changes of the environment, removing noise, or detecting ghosts. To help background subtraction algorithms deal with these problems, we have constructed a controller for managing object detection algorithms. Being independent from any particular background subtraction algorithm, this controller has two main tasks:
-
Supervising background subtraction algorithms to update their background representation.
-
Adapting parameter values of background subtraction algorithms to be suitable for the current conditions of the scene.
To supervise background subtraction algorithms in updating their background representation, the controller employs the feedback from the classification and tracking tasks. With this feedback, the controller can ask background subtraction algorithms to apply appropriate updating strategies for different blob types. For example, if the feedback from the classification and tracking tasks identifies a noise region, the controller will ask background subtraction algorithms to update the corresponding region quickly so that this noise does not occur again in the detection results. With the updating supervision of the controller, background subtraction algorithms can handle various problems such as removing noise, keeping track of objects of interest, managing stationary objects, and removing ghosts.
To adapt the parameter values of background subtraction algorithms, the controller first needs to evaluate the foreground detection results. This evaluation is realized with the help of the feedback from the classification and tracking tasks. Based on this evaluation, the background subtraction algorithm may change its parameter values to achieve better performance.
This work has been published in [44] .
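A minimal sketch of such a controller loop is given below; the blob labels, the update policies and the quality threshold are illustrative assumptions built on a stand-in background-subtraction interface, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Blob:
    label: str     # feedback label from classification/tracking: "noise", "ghost", "person", ...
    region: tuple  # (x, y, w, h) in the current frame

class BackgroundSubtractor:
    """Stand-in interface for any background subtraction algorithm driven by the controller."""
    def __init__(self):
        self.sensitivity = 1.0
    def absorb(self, region):
        pass  # update the background model quickly inside this region
    def freeze(self, region):
        pass  # do not update the background inside this region

def control_step(bg, feedback_blobs, detection_quality):
    """One controller iteration: steer the background update, then adapt parameters."""
    for blob in feedback_blobs:
        if blob.label in ("noise", "ghost"):
            bg.absorb(blob.region)          # push noise and ghosts back into the background
        elif blob.label in ("person", "stationary object"):
            bg.freeze(blob.region)          # keep objects of interest in the foreground
    if detection_quality < 0.5:             # evaluation derived from the feedback (illustrative)
        bg.sensitivity *= 1.1               # example of a parameter adaptation
```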
Learning Shape Metrics
Participants : Anja Schnaars, Guillaume Charpiat.
The notion of shape is important in many fields of computer vision, from tracking to scene understanding. As with usual object features, it can be used as a prior, as in image segmentation, or as a source of information, as in gesture classification. When image classification or segmentation tasks require high discriminative power or precision, the shape of objects naturally appears relevant to our human minds. However, shape is a complex notion which cannot be dealt with directly like a simple parameter in R^n. Modeling shape manually is tedious, and one arising question is that of learning shapes automatically.
Shape evolutions, as well as shape matchings or image segmentation with a shape prior, involve the preliminary choice of a suitable metric in the space of shapes. Instead of choosing a particular one, we propose a framework to learn shape metrics from a set of example shapes, designed to handle sparse sets of highly varying shapes, since typical shape datasets, like human silhouettes, are intrinsically high-dimensional and non-dense. We formulate the task of finding the optimal metric on an empirical manifold of shapes as a classical minimization problem ensuring smoothness, and compute its global optimum efficiently.
To achieve this, we design a criterion to compute point-to-point matchings between shapes which deals with topological changes. Then, given a training set of shapes, we use these matchings to transport deformations observed on any shape to any other one. Finally, we estimate the metric in the tangent space of any shape, based on the transported deformations, weighted by their reliability. We performed successful experiments on difficult sets; in particular we considered the case of a girl dancing fast (Figure 9). For each shape from the training set, we estimate the most probable deformations (see Figure 10) that it can undergo. More precisely, we estimate the shape metric (deformation costs) that fits the training set best, which leads to a shape prior. We have also proposed applications.
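For illustration only (this is a simplified form, not the exact criterion of the paper), the metric in the tangent space of a shape s could be estimated from the transported deformations δ_i and their reliability weights w_i in an inverse-covariance fashion:

```latex
\Sigma(s) = \frac{\sum_i w_i\, \delta_i \delta_i^{\top}}{\sum_i w_i},
\qquad
\|\delta\|_{s}^{2} = \delta^{\top}\, \Sigma(s)^{-1}\, \delta ,
```

so that deformations frequently and reliably observed around s become cheap under the metric, while unusual ones are penalized.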
The novelty of this work is both theoretical and practical. On the theoretical side, usual approaches consist either in estimating a mean shape pattern and characteristic deformations, or in using kernels based on distances between shapes, while here the framework is based on reliable deformations and transport, and we provide a criterion on metrics to be minimized. On the practical side, usual approaches require either low shape variability in the training set or a high sampling density, and these are not affordable in practice. Our assumptions are much weaker, so we can deal with much more general datasets; for example, the framework is well suited to videos. This work was published in [35]. The links between texture and shape are also being studied, following an approach developed in [51].
Online People Trajectory Evaluation
Keywords : Online Evaluation, Object Tracking.
Participants : Duc Phu Chau, Francois Brémond, Monique Thonnat.
A tracking algorithm can provide satisfactory results in some scenes and poor results in other real-world scenes. A performance evaluation measure is necessary to quantify how reliable a tracking algorithm is in a particular scene. Many types of metrics have been proposed to address this issue, but most of them depend on ground truth data to compare tracking results against. We propose in this work a new online evaluation method that is independent from ground truth data. We want to compute the quality (i.e. coherence) of the obtained trajectories based on a set of seven features. Based on how often each feature can be computed for a mobile object, these features are divided into two groups: "one time features" and "every time features". While "one time features" can be computed only once for a mobile object (e.g. the temporal length of its trajectory, the zone where the object leaves the scene), "every time features" can be computed at each frame during the tracked duration (e.g. color, speed, direction, area and shape ratio).
For each feature, we define a local score in the interval [0, 1] to determine whether the mobile object is correctly tracked or not. The quality of a trajectory is estimated by the summation of local scores computed from the extracted features. The score decreases when the system detects a tracking error and increases otherwise.
Using the seven features, a global score in the interval [0, 1] is defined to evaluate online the quality of a tracking algorithm at each frame. When the global score is greater than 0.5, we can say that the tracking algorithm performs rather well, whereas a global score lower than 0.5 means that the tracker generally fails to track the detected objects accurately.
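As a minimal sketch (the exact local scores and their weighting are simplified here; the plain averaging scheme is an assumption), the global score can be obtained by averaging the seven local scores so that it stays in [0, 1]:

```python
def global_score(local_scores):
    """Combine per-feature local scores (each in [0, 1]) into a global score in [0, 1].

    `local_scores` maps feature names to their current local score;
    a plain average is used here, which is an illustrative choice.
    """
    if not local_scores:
        return 0.0
    return sum(local_scores.values()) / len(local_scores)

# Example: tracking at this frame is judged rather good if the score exceeds 0.5.
scores = {"color": 0.9, "speed": 0.7, "direction": 0.8, "area": 0.6,
          "shape_ratio": 0.7, "trajectory_length": 0.4, "exit_zone": 0.5}
print(global_score(scores) > 0.5)   # True
```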
We have tested our approach on video sequences of the Caretaker project (http://www-sop.inria.fr/pulsar/personnel/Francois.Bremond/topicsText/caretakerProject.html ) and the Gerhome project (http://www-sop.inria.fr/members/Francois.Bremond/topicsText/gerhomeProject.html ). In Figure 11, the person in the left image is tracked by a tracking algorithm and the right image shows the online evaluation score of this algorithm. The results of the proposed online evaluation method are consistent with the output of the offline evaluation tool using ground truth data. These experiments validate the performance of the online evaluation method. This work has been published in [37].
People Tracking Using HOG Descriptors
Keywords : Understanding, Learning.
Participants : Piotr Bilinski, Carolina Garate, Mohammed Bécha Kaâniche, François Brémond.
We propose a multiple object tracking algorithm that copes with occlusions. First, for each detected object we compute feature points using the FAST algorithm [61]. Second, for each feature point we build a descriptor based on Histograms of Oriented Gradients (HOG) [53]. Third, we track feature points using these descriptors. Object tracking is possible even if objects are partially occluded. If several objects are merged and detected as a single one, we assign the newly detected feature points inside this single blob to one of the occluded objects. We apply a probabilistic method for this task using information from the previous frames, such as object size and motion information (i.e. speed and orientation). We use multi-resolution images to decrease the processing time. Our approach has been tested on a synthetic video sequence and on the public datasets KTH and CAVIAR. The preliminary tests confirm the effectiveness of our approach. This work has been published in [33].
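A minimal sketch of the first two steps (FAST corners inside a detected object, then a HOG descriptor computed on a patch around each corner) could look as follows with OpenCV; the patch size and HOG layout are illustrative choices:

```python
import cv2
import numpy as np

PATCH = 32  # descriptor window around each feature point (assumption)
fast = cv2.FastFeatureDetector_create(threshold=20)
hog = cv2.HOGDescriptor((PATCH, PATCH), (16, 16), (8, 8), (8, 8), 9)

def point_descriptors(gray, object_mask):
    """FAST feature points inside a detected object, each described by a HOG vector."""
    points, descriptors = [], []
    for kp in fast.detect(gray, object_mask):
        x, y = int(kp.pt[0]), int(kp.pt[1])
        patch = gray[y - PATCH // 2: y + PATCH // 2, x - PATCH // 2: x + PATCH // 2]
        if patch.shape == (PATCH, PATCH):        # skip points too close to the image border
            points.append((x, y))
            descriptors.append(hog.compute(patch).ravel())
    return points, np.array(descriptors)
```

Tracking then amounts to matching these descriptors between consecutive frames.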
An extension of this tracker has been proposed for crowd analysis. The real-time recognition of crowd dynamics in public places is becoming essential to avoid crowd-related disasters and to ensure people's safety. We introduce a new approach for crowd event recognition. Our study starts from the previous HOG-based tracking method and then uses pre-defined models (i.e. crowd scenarios) to recognize crowd events. We define these scenarios using statistical analysis of the datasets used in the experiments. The approach is characterized by combining a local analysis with a global analysis for crowd behavior recognition. The local analysis is enabled by a robust tracking method, and the global analysis is done by a scenario modeling stage. This work has been published in [39].
Human Gesture Learning and Recognition
Keywords : Understanding, Learning.
Participants : Mohammed Bécha Kaâniche, François Brémond.
We aim at recognizing gestures (e.g. hand raising) and more generally short actions (e.g. falling, bending) accomplished by an individual in a video sequence. Many techniques have already been proposed for gesture recognition in specific environments (e.g. a laboratory) using the cooperation of several sensors (e.g. a camera network, an individual equipped with markers). Despite these strong hypotheses, gesture recognition is still brittle and often depends on the position of the individual relative to the cameras. We propose to relax these hypotheses in order to conceive a general algorithm enabling the recognition of the gestures of an individual acting in an unconstrained environment and observed through a limited number of cameras. The goal is to estimate the likelihood of gesture recognition as a function of the observation conditions.
We propose a gesture recognition method based on local motion learning. First, for a given individual in a scene, we track feature points over the whole body to extract the motion of the body parts. We expect the feature points to be sufficiently distributed over the body to capture fine gestures. We have chosen corner points as feature points to improve the detection stage and HOG (Histogram of Oriented Gradients) as the descriptor to increase the reliability of the tracking stage. Thus, we track the HOG descriptors in order to extract the local motion of the feature points.
In order to recognize gestures, we propose to learn and classify gestures based on the k-means clustering algorithm and the k-nearest neighbors classifier. For each video in a training dataset, we generate all local motion descriptors and annotate them with the associated gesture. Then, for each training video taken separately, the descriptors are clustered into k clusters using the k-means clustering algorithm. The parameter k is set empirically. Each cluster is associated with its corresponding gesture, so similar clusters can be labeled with different gestures. Finally, with all generated clusters as a database, the k-nearest neighbors classifier is used to classify gestures occurring in the test dataset. A video is classified according to the number of neighbors which have voted for a given gesture, which provides the likelihood of the recognition.
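A minimal sketch of this learning scheme with scikit-learn (the descriptor dimensionality, the number of clusters and the number of neighbors are assumptions):

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def build_classifier(training_videos, k_clusters=20, k_neighbors=5):
    """training_videos: list of (descriptors, gesture_label) pairs, descriptors of shape (n, d)."""
    centers, labels = [], []
    for descriptors, gesture in training_videos:
        km = KMeans(n_clusters=k_clusters, n_init=4, random_state=0).fit(descriptors)
        centers.append(km.cluster_centers_)       # clusters of one training video
        labels.extend([gesture] * k_clusters)     # each cluster inherits the video's gesture label
    knn = KNeighborsClassifier(n_neighbors=k_neighbors)
    knn.fit(np.vstack(centers), labels)
    return knn

def recognize(knn, test_descriptors):
    """Vote of the per-descriptor predictions gives the gesture and the recognition likelihood."""
    votes = Counter(knn.predict(test_descriptors))
    gesture, count = votes.most_common(1)[0]
    return gesture, count / len(test_descriptors)
```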
We demonstrate the effectiveness of our motion descriptors by recognizing the actions of the KTH and IXMAS public databases. This work has been published in [41], [23].
Monitoring Elderly Activities Using a Multi-Sensor Approach and Uncertainty Management
Participants : Nadia Zouba, Bernard Boulay, Valery Valentin, Rim Romdhame, Daniel Zullo, François Brémond, Monique Thonnat.
Monitoring Daily Activities of Elderly Living Alone at Home
Participants: Nadia Zouba, Valery Valentin, Bernard Boulay, François Brémond, Monique Thonnat
In the framework of monitoring elderly activities at home, we have proposed an approach combining heterogeneous sensor data for recognizing elderly activities at home. This approach consists in combining data provided by video cameras with data provided by environmental sensors to monitor the interaction of people with the environment.
In this work we have put a strong effort into event modeling. The result is a set of 100 models constituting our knowledge base of events for home care applications. This knowledge base can be reused in other applications in the same domain.
In this approach, we have also proposed a sensor model able to give a coherent representation of the information provided by various types of physical sensors. This sensor model accounts for the uncertainty in sensor measurements.
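As an illustration, one way to give heterogeneous sensors a coherent representation is a single measurement record that carries its own uncertainty; the field names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SensorMeasurement:
    sensor_id: str      # e.g. "kitchen_camera_1" or "fridge_contact"
    sensor_type: str    # "video", "contact", "pressure", ...
    timestamp: float    # seconds since the start of the recording
    value: float        # normalised reading (e.g. door open = 1.0)
    uncertainty: float  # confidence or standard deviation attached to the reading
```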
The approach is used to define a behavioral profile for each person and to compare these behavioral profiles. The first step in establishing the behavioral profile of an observed person is to determine his/her daily activities. This behavioral profile is defined as a set of the most frequent and characteristic (i.e. interesting) activities of the observed person. The basic goal of determining a behavioral profile is to measure variables describing persons during their daily activities, in order to capture deviations in activity and posture and thus facilitate timely intervention or provide automatic alerts in emergency cases.
In order to evaluate the whole proposed activity monitoring framework, several experiments have been performed. The main objectives of these experiments are to validate the different phases of the activity monitoring framework, to highlight interesting characteristics of the approach, and to evaluate the potential of the framework for real world applications.
The results of this approach are shown for the recognition of Activities of Daily Living (ADLs) of real elderly people living in an experimental apartment (Gerhome laboratory) equipped with video sensors and environmental sensors.
Results comparing volunteer 1 (a 64-year-old male) and volunteer 2 (an 85-year-old female), each observed during 4 hours, show the greater ADL ability of the 64-year-old adult compared to that of the 85-year-old:
-
Volunteer 1 (64 years) changed zones more often than volunteer 2 (85 years) ("entering living room": 20 vs. 13), and did so at a quicker pace (00:01:15 vs. 00:02:42), showing a greater ability to walk.
-
Volunteer 1 was more often seen "sitting on chair" (15 vs. 4), but volunteer 2 was "sitting on chair" for a longer duration (03:30:29 vs. 01:36:43), also showing a greater ability of volunteer 1 to move around the apartment.
-
Volunteer 1 used the "upper cupboard" more often than volunteer 2 (22 vs. 9), and more quickly (00:00:57 vs. 00:04:43).
-
Volunteer 1 was more able to use the stove (fewer trials for "using stove": 35 vs. 106).
-
Similarly, volunteer 1 was "bending" twice as much as volunteer 2 (30 vs. 15), and more quickly (00:00:03 vs. 00:00:12), showing greater dynamism for the younger volunteer.
More details about the proposed approach and the obtained results are described in [49] , [48] and in [29] .
In the current work, the proposed activity recognition approach was evaluated in the experimental laboratory (Gerhome) with fourteen real elderly people. The next step of this work is to test this approach in a hospital environment (see the next section) involving more people with different levels of wellness and health status (e.g. Alzheimer patients).
Monitoring Activities for Alzheimer Patients from CHU Nice
Participants: Rim Romdhame, Daniel Zullo, Nadia Zouba, Bernard Boulay, François Brémond, Monique Thonnat
We propose to develop a framework for monitoring Alzheimer patients as a continuation of the ADL monitoring framework. The basic goal is to determine the behavioral profiles of Alzheimer patients and to evaluate these profiles. With the help of doctors, a specific scenario has been established to evaluate the behaviors of Alzheimer patients.
Some experiments have been performed in a room of the CHU of Nice equipped with video cameras, where elderly people and medical volunteers have spent between 15 min and 1 hour:
-
1 Alzheimer Volunteer (80 years old; he and his relatives have signed an agreement)
-
3 Elderly Volunteers (64-85 years old)
-
5 Young Volunteers (20-30 years old)
-
1 medical staff Volunteer (25-30 years old)
The second goal of this framework is to handle the uncertainty of event recognition. Most previous approaches able to recognize events while handling uncertainty are 2D approaches, which model an activity as a set of pixel motion vectors. These 2D approaches can only recognize short and primitive events but cannot address composite events. We propose a video interpretation approach based on uncertainty handling. The main goal is to improve the techniques of automatic video data interpretation by taking into account the imprecision of the recognition. To attain our goal, we have extended the event recognition algorithm described by T. Vu by modeling the scenario recognition uncertainty and computing the precision of the 3D information characterising the mobile objects moving in the scene. We have used the 3D information to compute the spatial probability of an event. We have also computed the temporal probability of an event based on its spatial probability at the previous instant. This approach is validated on a homecare application which tracks elderly people living at home and recognizes events of interest specified by gerontologists.
This work has been published in [46] .
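A minimal sketch of this kind of temporal smoothing, where the probability of an event at time t blends the current spatial probability with the one at the previous instant (the blending weight is an assumption):

```python
def temporal_probability(p_spatial_t, p_spatial_prev, alpha=0.7):
    """Temporal probability of an event at time t from its spatial probabilities at t and t-1."""
    return alpha * p_spatial_t + (1.0 - alpha) * p_spatial_prev

# Example: an isolated drop in the spatial probability is damped by the previous instant.
print(temporal_probability(0.2, 0.9))   # 0.41 rather than 0.2
```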
Trajectory Clustering for Activity Learning
Keywords : Understanding, Learning, Data-mining.
Participants : José Luis Patino Vilchis, Guido-Tomas Pusiol, François Brémond, Monique Thonnat.
Trajectory information is a rich descriptor which can provide essential information for activity learning and understanding. Our work on the analysis of underground stations with trajectory clustering [34] has shown that the trajectory patterns associated with the clusters are indicative of specific behaviors and activities. Our new studies explore the application of trajectory clustering to two new domains (with different behavior types): 1) monitoring of elderly people at home; 2) monitoring the ground activities at an airport docking station.
1) Monitoring of elderly people at home.
In this work we propose a framework to recognize and classify loosely constrained activities with minimal supervision. The framework uses basic trajectory information as input and goes up to video interpretation. The work reduces the gap between low-level information and semantic interpretation, building an intermediate layer of Primitive Events. The proposed representation for primitive events aims at capturing small coherent units of motion over the scene with the advantage of being learnt in an unsupervised manner. We propose the modeling of an activity using Primitive Events as the main descriptors. The activity model is built in a semi-supervised way using only real tracking data.
The approach is composed of 5 steps. First, people are detected and tracked in the scene, using a region-based tracking algorithm, and their trajectories are stored in a database. Second, the topology of the scene is learnt, using the regions where the person usually stands or stops to interact with fixed objects in the scene (Slow Regions). Third, the transitions between these Slow Regions are computed by cutting the observed person's trajectory. These transitions correspond to short units of motion and can be seen as basic elements constituting more complex activities. Fourth, a coarse activity ground-truth is manually produced for a reference video corresponding to the first monitored person. Primitive Event histograms are computed and labelled by matching them against this ground-truth. Fifth, using these labeled Primitive Event histograms, the activities of a second monitored person can be automatically recognized.
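A minimal sketch of steps four and five: Primitive Event histograms of video segments are labeled by nearest-neighbour matching against the histograms of the coarsely annotated reference video (the histogram normalisation and the Euclidean distance are assumptions):

```python
import numpy as np

def pe_histogram(primitive_events, n_types):
    """Normalised histogram of primitive-event types observed in one video segment."""
    hist = np.bincount(primitive_events, minlength=n_types).astype(float)
    return hist / max(hist.sum(), 1.0)

def label_segment(segment_hist, reference_hists, reference_labels):
    """Assign the activity label of the closest reference histogram."""
    distances = [np.linalg.norm(segment_hist - ref) for ref in reference_hists]
    return reference_labels[int(np.argmin(distances))]
```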
We validate the approach by recognizing and labeling modeled activities in a home-care application (Gerhome project). The video datasets used capture the living room and kitchen of an apartment. Each video dataset contains an elderly person performing activities such as "the person is eating". These activities are learnt and discovered using different video datasets. This work has been published in [45].
2) Monitoring the ground activities at an airport docking station.
In this work we employ trajectory-based analysis for activity extraction from apron monitoring at the Toulouse airport in France. We aim at helping infrastructure managers: for everyday operation we provide environmental figures, which include the location and number of people in the monitored areas (occupancy map), as well as the activities themselves. We have thus built a system to 1) learn which monitored areas are normally occupied, and then 2) perform activity pattern discovery with interpretable semantics.
Trajectory clustering is employed mainly to discover the points of entry and exit of the mobiles appearing in the scene. Proximity relations between the resulting clusters of detected mobiles, as well as between clusters and contextual elements of the scene, are employed to build the occupancy zones and to characterise the different ongoing activities in the scene. We study the scene activity at different granularities, which describe the activity either in broad terms or with detailed information, thus managing different information levels. By including temporal information we are able to find spatio-temporal patterns of activity. Thanks to an incremental learning procedure, the system is capable of handling large amounts of data. We have applied our algorithm to five video datasets corresponding to different monitoring instances of an aircraft in the airport docking area. This corresponds to about five hours of analysed video. Figure 12 shows occupancy zones at the system-selected information levels for scene activity reporting. We elaborate activity maps with a semantic description of the discovered zones (and thus of the associated activities). From the analysed sequences, we were able to recognize activities such as 'GPU arrival', 'Loading', 'Unloading'.
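A minimal sketch of the first step, clustering the entry and exit points of the observed trajectories into zones (the clustering algorithm, its radius and the minimum support are assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def entry_exit_zones(trajectories, radius=2.0, min_tracks=5):
    """Cluster the first and last positions of trajectories into entry/exit zones.

    trajectories: list of (n_i, 2) arrays of ground-plane positions (e.g. in metres).
    Returns the zone label of each entry point and of each exit point (-1 = unclustered).
    """
    entries = np.array([t[0] for t in trajectories])
    exits = np.array([t[-1] for t in trajectories])
    labels = DBSCAN(eps=radius, min_samples=min_tracks).fit_predict(np.vstack([entries, exits]))
    n = len(trajectories)
    return labels[:n], labels[n:]
```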