Section: New Results
Autonomic grids have been considered a promising field of ML applications for TAO since 2006, through TAO's involvement in the EGEE grid (Enabling Grids for E-Science in Europe, an infrastructure project: 2001-2003, 2003-2007, 2008-2010). This involvement has resulted in the Grid Observatory project, supported by EGEE, DIGITEO and CNRS. The first goal of the Grid Observatory is to provide a publicly accessible repository of grid traces.
The second goal of the Grid Observatory is to improve the understanding of the grid and, through it, its optimisation.
Application developers need synthetic characterisations of grid activity and of grid applications in order to predict and optimise application performance.
Grid models are required for dimensioning, capacity planning, and evaluating the impact of changes in grid configuration and middleware.
Self-regulation and self-maintenance are desirable functionalities in many areas, ranging from resource allocation to real-time fault diagnosis, with green computing an increasingly urgent constraint.
An important, often underestimated factor that increases the complexity of grid modelling and policy design is that grids are based on a mutualisation paradigm. From the viewpoint of middleware architecture, a grid federates independently managed resources; it is thus not only a collection of heterogeneous, shared resources, but also a hierarchical structure. Because the resources are provided and maintained by institutions tied to specific scientific communities, sociological, administrative, and institutional constraints must be taken into account. Because the resources must serve the needs of these communities, a new concept has emerged in the grid exploitation model: Virtual Organizations (VOs), which represent groups of users with similar access rights. A further driving factor is the collective behavior of users, whose activities tend to be correlated along scientific communities. This has multiple consequences, ranging from what is accessible to observation to the hypotheses acceptable for middleware design. This observation constitutes the specificity of the grid as a target within the Autonomic Computing field.
The results can be organized along two axes: modeling the dynamics of the grid, and model-free scheduling policies.
Modeling the dynamics of the grid
The goal is to model the complex interactions between the grid middleware and the e-scientist queries.
In collaboration with Supelec, classical time-series methods have been applied to samples of the activity of the whole EGEE grid  . Both the counting process associated with job arrivals at sites and the load at sites have been explored. The results for the counting process point toward a) strong non-stationarity and b) self-similarity; preliminary work indicates that a derived process amenable to stationarity (the bursts) can be defined and parameterized. For the load, the best model found so far is fairly complex, belonging to the GARCH family.
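To make the quantities above concrete, the following sketch bins job-arrival timestamps into a counting process, computes its index of dispersion (variance-to-mean ratio, close to 1 for Poisson arrivals and larger for bursty ones), and runs the conditional-variance recursion of a GARCH(1,1) model. It is a minimal sketch under stated assumptions: the bin width, the choice of GARCH(1,1) (the text only states that the load model belongs to the GARCH family), its parameters, and all function names are illustrative, and parameter estimation (typically by maximum likelihood) is not shown.

```python
import math


def count_arrivals(timestamps, bin_width):
    """Counting process: number of job arrivals in consecutive bins of width bin_width."""
    if not timestamps:
        return []
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // bin_width) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int((t - start) // bin_width)] += 1
    return counts


def index_of_dispersion(counts):
    """Variance-to-mean ratio of the counts: ~1 for Poisson arrivals, >1 for bursty arrivals."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


def garch11_variance(residuals, omega, alpha, beta):
    """GARCH(1,1) filter: sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1}.

    Illustrative only: parameters are given, not fitted to any grid trace.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a stationary GARCH(1,1)")
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2
```

For example, `count_arrivals([0.0, 0.5, 1.2, 3.9], 1.0)` yields `[2, 1, 0, 1]`. A dispersion index well above 1 across bin widths is one simple symptom of the burstiness and self-similarity mentioned above, although it is not a formal test of either.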
The second category of results concerns extending Affinity Propagation (AP) to data streaming  ,  ,  . AP extracts the data items, or exemplars, that best represent the dataset using a message-passing method. StrAP is built in several steps. WAP (Weighted AP) extends AP to deal with duplicated items with no loss of performance. Hi-WAP (Hierarchical WAP) reduces the quadratic complexity of AP by applying AP on data subsets and then applying Weighted AP on the exemplars extracted from all subsets. Finally, StrAP extends Hierarchical WAP to online clustering by storing the outliers and occasionally updating the exemplars to deal with changes in the data distribution, which are detected with the Page-Hinkley change-point detection test. Experiments on classical benchmarks show that the order-of-magnitude gain in performance of Hi-WAP over AP does not come at the expense of a comparable increase in distortion. Experiments with the Intrusion Detection benchmark (KDD99) show that StrAP improves clustering accuracy over the state-of-the-art DenStream algorithm. Ongoing work explores the application of StrAP to streaming the EGEE gLite operational data, targeting lightweight, probe-free fault detection and diagnosis.
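Two building blocks combined in StrAP can be sketched in pure Python: the standard AP message-passing updates (responsibilities and availabilities, in Frey and Dueck's formulation) and a Page-Hinkley test for an increase in the mean of a scalar stream. This is a minimal sketch, assuming a dense similarity matrix with the preference on its diagonal and a fixed number of damped iterations instead of a convergence check. The weighting of WAP, the subset partitioning of Hi-WAP, StrAP's outlier reservoir, and the quantity StrAP actually monitors are not shown; `similarity_matrix`, `affinity_propagation`, `PageHinkley` and their parameters are hypothetical names.

```python
def similarity_matrix(points, preference):
    """Negative squared distance between 1-D points; the preference sits on the diagonal."""
    n = len(points)
    return [[preference if i == k else -float(points[i] - points[k]) ** 2
             for k in range(n)] for i in range(n)]


def affinity_propagation(S, damping=0.5, max_iter=200):
    """Standard AP message passing on a dense n x n similarity matrix S (n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Exemplars are the points whose self-evidence a(k,k) + r(k,k) is positive.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Each non-exemplar joins the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


class PageHinkley:
    """Page-Hinkley test signalling an increase in the mean of a scalar stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm threshold (lambda)
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, x):
        """Add one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n         # running mean of the stream
        self.cum += x - self.mean - self.delta         # cumulative deviation m_t
        self.min_cum = min(self.min_cum, self.cum)     # M_t = min over past m_t
        return self.cum - self.min_cum > self.threshold
```

On two well-separated groups, `affinity_propagation(similarity_matrix([0, 1, 2, 20, 21, 22], -10.0))` selects the central points 1 and 4 as exemplars. The preference controls how many exemplars emerge: lowering it makes each additional exemplar more costly. Both functions store the full n x n matrices, which is exactly the quadratic cost that Hi-WAP is designed to avoid.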
Model-free scheduling policy
The central tenet of this activity is that combining utility functions with reinforcement learning (RL) can provide a general and efficient method for dynamically allocating grid resources so as to optimize the satisfaction of both end-users and participating institutions. The flexibility of an RL-based system allows modeling the state of the grid, the jobs to be scheduled, and the high-level objectives of the various actors on the grid. RL-based scheduling can seamlessly adapt its decisions to changes in the distributions of inter-arrival times, QoS requirements, and resource availability. Moreover, it requires minimal prior knowledge about the target environment, including user requests and infrastructure.  ,  focus on a specific multi-objective setting: QoS for responsiveness on the one hand, and weighted fair-share on the other, under realistic constraints. These include work-conserving scheduling and the choice of an on-line policy (the SARSA algorithm), where the approximate value function guides the selection of the next action a. Given the relatively high dimensionality of the continuous state-action space, a non-linear continuous approximation of the value function is required, as proposed in Tesauro's work. Our experimental results, on both a synthetic workload and a real trace from EGEE, show that RL is not only a realistic alternative to empirically designed schedulers, but can also outperform them. Ongoing work deals with a) a more systematic approach to the multi-objective framework and b) a more realistic model of the workload.
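The on-line SARSA update with an approximate value function can be sketched as follows. To keep the example short, it uses an approximator that is linear in its weights over hand-picked non-linear (quadratic) features, a simpler stand-in for the neural-network approximation of Tesauro's work. The state encoding `(queue_length, free_cpus)`, the integer action, the `features` map, and the step sizes are all hypothetical; in the setting described above, the reward would be a utility combining responsiveness and fair-share, which is not modeled here.

```python
import random


def features(state, action):
    """Hypothetical feature map; quadratic terms make Q non-linear in the state.

    In practice the inputs would be normalized to keep the updates stable.
    """
    q, c = state          # assumed state: (queue_length, free_cpus)
    a = float(action)     # assumed action: index of a scheduling choice
    return [1.0, q, c, a, q * a, c * a, q * q, c * c]


def q_value(w, state, action):
    """Approximate action value Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))


def epsilon_greedy(w, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the action with the highest Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(w, state, a))


def sarsa_update(w, s, a, reward, s_next, a_next, alpha=0.01, gamma=0.9):
    """One on-policy SARSA(0) step: w <- w + alpha * td_error * grad_w Q(s, a)."""
    td_error = reward + gamma * q_value(w, s_next, a_next) - q_value(w, s, a)
    phi = features(s, a)  # gradient of Q with respect to w for this approximator
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
```

In a scheduler loop, the agent would observe the state, choose an action with `epsilon_greedy`, receive the utility-based reward, choose the next action the same way, and pass that next action to `sarsa_update`. Bootstrapping on the action actually taken, rather than on the greedy one, is what makes SARSA an on-policy method.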