Section: New Results
Diagnosis of large-scale discrete-event systems
Participants : Marie-Odile Cordier, Christine Largouet, Xavier Le Guillou, Sophie Robin, Laurence Rozé.
The problem we address is the monitoring of large, complex discrete-event systems (DES) such as telecommunication networks or web services. Two approaches are used in our research group. In the first one, the system is modeled as a discrete-event system represented by an automaton. The diagnostic task consists in determining the trajectories (sequences of states and events) compatible with the sequence of observations. From these trajectories, the possible faults can then be identified and localized. In the second approach, the model consists of a set of predefined characteristic patterns. We use temporal patterns, called chronicles, each represented by a set of temporally constrained events. The diagnostic task consists in recognizing these patterns by analyzing the flow of observed events.
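As an illustration of the first approach, the following Python sketch tracks the trajectories of a small automaton that are compatible with a sequence of observations and reports the fault sets they imply. The model, the event names and the functions `silent_closure` and `diagnose` are hypothetical and chosen for the example only; observations are assumed to be totally ordered and certain.

```python
from collections import deque

# Hypothetical toy model: each transition is (source, event, target).
TRANSITIONS = [
    ("ok", "request", "busy"),
    ("busy", "reply", "ok"),
    ("busy", "f_crash", "down"),  # unobservable fault event
    ("busy", "timeout", "ok"),
    ("down", "timeout", "ok"),
]
OBSERVABLE = {"request", "reply", "timeout"}
FAULTS = {"f_crash"}


def silent_closure(beliefs):
    """Extend (state, faults) pairs along unobservable transitions."""
    seen = set(beliefs)
    queue = deque(seen)
    while queue:
        state, faults = queue.popleft()
        for source, event, target in TRANSITIONS:
            if source != state or event in OBSERVABLE:
                continue
            new_faults = faults | {event} if event in FAULTS else faults
            reached = (target, new_faults)
            if reached not in seen:
                seen.add(reached)
                queue.append(reached)
    return seen


def diagnose(initial_state, observations):
    """Return the fault sets of all trajectories compatible with the observations."""
    beliefs = silent_closure({(initial_state, frozenset())})
    for observed in observations:
        # Keep only the trajectories that can emit the observed event next.
        stepped = {
            (target, faults)
            for state, faults in beliefs
            for source, event, target in TRANSITIONS
            if source == state and event == observed
        }
        beliefs = silent_closure(stepped)
    return {faults for _, faults in beliefs}


if __name__ == "__main__":
    # A timeout after a request is explained either by no fault or by a crash.
    print(diagnose("ok", ["request", "timeout"]))
```

Each belief is a pair made of a current state and the set of faults met so far, so the result directly lists the candidate diagnoses. An empty result means that no trajectory of the model explains the observations.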
Distributed and incremental diagnosis of discrete-event systems
One of the main difficulties of discrete-event system modeling is the intractable size of the model and the huge number of states and trajectories to be explored. To cope with this problem, we proposed a decentralized approach, which allows us to compute the diagnosis on-line without computing the global model. Given a decentralized model of the system and a flow of observations, the program computes the diagnosis by combining local diagnoses built from local models (or local diagnosers).
In real systems, the observed events generally do not correspond exactly to the emitted events. Thus, instead of only considering partially ordered observations, we proposed to represent the uncertainty on the emitted observations by an automaton, and extended the decentralized approach to cope with this new representation. To deal with on-line diagnosis, we then proposed to slice the observation flow into temporal windows, introduced the concept of automata chains to represent the successive observation slices, and proposed an algorithm that computes the diagnosis incrementally from these diagnosis slices. A revised version of a paper on this issue, written in collaboration with A. Grastien, now a researcher at ANU (Australian National University), has been submitted to AIJ (Artificial Intelligence Journal).
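The idea of slicing the observation flow and updating the diagnosis window by window can be sketched as follows. This is a deliberately simplified illustration, not the automata-chain algorithm itself: each slice is a plain list of certain, totally ordered events rather than an automaton, and the model, the window width and all function names are assumptions made for the example.

```python
# Hypothetical nondeterministic model: (state, event) -> set of possible next states.
TRANSITIONS = {
    ("idle", "send"): {"waiting"},
    ("waiting", "ack"): {"idle"},
    ("waiting", "alarm"): {"idle", "faulty"},
    ("faulty", "alarm"): {"faulty"},
}


def slice_into_windows(timed_observations, width):
    """Group (timestamp, event) pairs into consecutive temporal windows."""
    windows = {}
    for timestamp, event in sorted(timed_observations):
        windows.setdefault(int(timestamp // width), []).append(event)
    return [windows[index] for index in sorted(windows)]


def advance(states, window):
    """Update the set of candidate states with the events of one window."""
    for event in window:
        states = set().union(
            *(TRANSITIONS.get((state, event), set()) for state in states)
        )
    return states


def incremental_diagnosis(timed_observations, width, initial=("idle",)):
    """Return the candidate states after each window, reusing the previous result."""
    states = set(initial)
    history = []
    for window in slice_into_windows(timed_observations, width):
        states = advance(states, window)  # starts from the previous slice's result
        history.append(set(states))
    return history


if __name__ == "__main__":
    flow = [(0.5, "send"), (1.2, "alarm"), (2.1, "alarm")]
    print(incremental_diagnosis(flow, width=1.0))
```

The point of the sketch is that the diagnosis of a window is computed from the diagnosis of the previous window only, so earlier observations never need to be replayed.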
Distributed monitoring with chronicles
The chronicle formalism was proposed a few years ago by C. Dousson in order to monitor dynamic systems in real time. A chronicle, characterized by a set of temporally constrained events, describes a situation to monitor. Efficient algorithms for on-the-fly chronicle recognition exist, but so far only in a centralized setting. Our main contribution is an extension of the chronicle-based approach to distributed systems.
First, we extended the formalism proposed by C. Dousson by defining a distributed chronicle model. The standard chronicle description language is enriched with synchronization constraints.
Then, we proposed a decentralized monitoring architecture in which each component is equipped with a local diagnoser. A global diagnoser (also called broker) is in charge of merging the local diagnoses by checking their synchronization constraints.
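To make the notion of a chronicle concrete, here is a minimal Python sketch that represents a chronicle as a list of event types plus pairwise temporal constraints, and searches a timestamped log for instances. It is a brute-force illustration, not the on-the-fly recognition algorithm of CRS; the chronicle, the data layout and the function `recognize` are hypothetical, and each event type is assumed to appear once in the chronicle.

```python
from itertools import product

# Hypothetical chronicle: a constraint (a, b, lo, hi) means lo <= t(b) - t(a) <= hi.
CHRONICLE = {
    "events": ["request", "retry", "timeout"],
    "constraints": [
        ("request", "retry", 1, 5),
        ("retry", "timeout", 0, 10),
    ],
}


def recognize(chronicle, log):
    """Return every binding {event type: timestamp} that satisfies all constraints.

    `log` is a list of (timestamp, event type) pairs in any order.
    """
    # Candidate timestamps for each event type of the chronicle.
    occurrences = [
        [timestamp for timestamp, name in log if name == event]
        for event in chronicle["events"]
    ]
    instances = []
    for times in product(*occurrences):
        binding = dict(zip(chronicle["events"], times))
        if all(
            lo <= binding[b] - binding[a] <= hi
            for a, b, lo, hi in chronicle["constraints"]
        ):
            instances.append(binding)
    return instances


if __name__ == "__main__":
    log = [(0, "request"), (2, "retry"), (3, "request"), (20, "timeout"), (9, "timeout")]
    print(recognize(CHRONICLE, log))
```

Enumerating all combinations is exponential in the size of the chronicle, which is acceptable for an illustration but is exactly what efficient recognition engines avoid.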
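The merging step performed by the broker can be pictured with the following Python sketch. It assumes, for illustration only, that each local diagnoser reports a list of candidate diagnoses, each made of a label and the values it bound to its synchronization variables, and that a global diagnosis is a choice of one candidate per service whose shared variables agree. The data layout and the names `merge_bindings` and `merge_local_diagnoses` are hypothetical.

```python
from itertools import product


def merge_bindings(bindings):
    """Merge variable bindings; return None if two of them disagree on a variable."""
    merged = {}
    for binding in bindings:
        for variable, value in binding.items():
            if merged.setdefault(variable, value) != value:
                return None
    return merged


def merge_local_diagnoses(local_diagnoses):
    """Combine local diagnoses whose synchronization variables are consistent.

    `local_diagnoses` maps a service name to a list of (label, bindings) pairs.
    """
    services = sorted(local_diagnoses)
    global_diagnoses = []
    for combination in product(*(local_diagnoses[name] for name in services)):
        if merge_bindings(bindings for _, bindings in combination) is not None:
            global_diagnoses.append(
                {name: label for name, (label, _) in zip(services, combination)}
            )
    return global_diagnoses


if __name__ == "__main__":
    local = {
        "shop": [("normal", {"order_id": 7}), ("bad_order", {"order_id": 8})],
        "supplier": [("out_of_stock", {"order_id": 8})],
    }
    print(merge_local_diagnoses(local))
```

In this toy run the supplier's diagnosis refers to order 8, so only the shop hypothesis bound to the same order survives the merge.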
This work began in the context of the WS-DIAMOND European project, whose overall goal was to monitor and diagnose the processing of a request sent to a web service, which in turn usually requests other services to complete the task: a fault occurring in one service commonly propagates to other services. A new formalism for describing distributed chronicles has been designed. It extends the classical chronicle formalism and is well suited to describing both the normal and abnormal behaviors of each web service and the communication with the other services.
Each local diagnoser relies on a chronicle recognition engine based on the CRS application developed by C. Dousson, and on a chronicle base representing the local scenarios that are of interest with respect to the diagnostic task. For the global diagnoser, two approaches have been developed, depending on the availability of a model of interactions between services. In the first one, the global diagnoser itself relies on a chronicle recognition system, its chronicle base being built off-line from the interaction model. In the second approach, when no model of interactions is available, for instance in the case of a dynamic choreography, the global diagnoser itself has to ensure on-line the correspondence between the synchronization variables of the different services and to build the diagnosis tree.
Two platforms corresponding to the two approaches have been developed (cf. section 5.3).
A paper comparing the two approaches has been submitted to a special issue of the French RIA journal, and an English version of this paper is nearly ready for submission.
We are currently working in two directions. The first one consists in tackling the difficult problem of chronicle acquisition, starting with the automatic construction of chronicle skeletons from the workflow of the different web services.
The second one consists in connecting the diagnosis and the repair tasks by integrating distributed repair processes within our decentralized diagnosis architecture. Interleaving diagnosis and repair in the framework of intelligent networks is one of the issues we investigate in cooperation with the planning and diagnosis group of NICTA's Canberra Research Laboratory and the team GEMO of Inria Saclay. A proposal for an Inria Associate Team has been submitted. Extending the distributed chronicle recognition approach for diagnosis to include repair actions is the first step towards this objective, and the one we are currently working on.
Diagnosability and self-healability of discrete-event systems
After having addressed the problem of diagnosability of complex supervision patterns and proposed a common theoretical paradigm for the diagnosability of both discrete and continuous systems, based on the common concept of signature, we have started a new line of collaborative work with the Disco/Laas research group, within our common participation in the WS-DIAMOND project. Starting from the signature-based definition of diagnosability alone, and introducing the concept of repairability (i.e. the existence of at least one applicable repair procedure for each fault that may occur in the system), our goal was to consider diagnosis and repair capabilities jointly in complex systems, whether discrete-event or continuous. That led us to define formally the "self-healability" property of such systems: a system is said to be self-healable if and only if there exists a set of 'macro-faults' (i.e. identified situations in which several candidate faults may still not be discriminated) that can be matched to at least one repair procedure. We extended this definition to cope with conditional repairs and temporal constraints; this work has not yet been published. We intend to extend this first work, which focuses on centralized systems, to deal with distributed systems and with temporally related events (expressed as chronicles).
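One possible reading of this property can be sketched in Python. The sketch assumes, for illustration only, that the signature of a fault is the set of symptoms it produces, that faults sharing the same signature form one macro-fault, and that a repair procedure is described by the set of faults it fixes; under these assumptions, the system is considered self-healable when every macro-fault is entirely covered by at least one repair. The formal definition may be more general, and the names below are hypothetical.

```python
from collections import defaultdict


def macro_faults(signatures):
    """Group faults that cannot be discriminated, i.e. that share the same signature."""
    groups = defaultdict(set)
    for fault, signature in signatures.items():
        groups[frozenset(signature)].add(fault)
    return list(groups.values())


def is_self_healable(signatures, repairs):
    """Check that each macro-fault is covered by at least one repair procedure.

    `signatures` maps a fault to its set of symptoms.
    `repairs` maps a repair name to the set of faults it fixes.
    """
    return all(
        any(group <= fixed for fixed in repairs.values())
        for group in macro_faults(signatures)
    )


if __name__ == "__main__":
    signatures = {"f1": {"alarm_a"}, "f2": {"alarm_a"}, "f3": {"alarm_b"}}
    # f1 and f2 are indistinguishable, but one repair handles both of them.
    print(is_self_healable(signatures, {"restart": {"f1", "f2"}, "reroute": {"f3"}}))
    # Here no single repair covers the macro-fault {f1, f2}.
    print(is_self_healable(signatures, {"restart": {"f1"}, "reroute": {"f3"}}))
```

The example shows why diagnosability is not required for self-healing in this reading: two faults that cannot be told apart do not prevent repair as long as a common repair procedure applies to both.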
Scenario patterns for exploring qualitative ecosystems
This work aims at providing means of exploring complex systems (in our case, ecosystems). We propose to transform environmental questions about the future evolution of ecosystems into queries that can be submitted to a simulation model. When dealing with environmental problems, scenarios are widely used tools for evaluating the future evolution of ecosystems given policy options, potential climatic changes or the impacts of catastrophic events. Although scenarios are generally expressed in natural language, working with a model of the ecosystem requires transforming them into formalised queries that can be given as input to the model.
We propose to model the system in a distributed qualitative way and we define a high-level language to query the model. The system behavior is represented as a discrete-event system, described by a set of interacting timed automata. The ecosystem is represented as a set of interacting subsystems, and the global model is obtained by composition on shared events. This technique is particularly suited to representing large-scale systems such as ecosystems. To explore the system, we define generic patterns, associated with the most usual types of scenarios, and translate them into temporal logic formulae. The answer is computed using model-checking techniques, which are efficient for analysing large-scale systems. Five generic patterns have been defined using TCTL (Timed Computation Tree Logic): WhichStates, WhichDate, WhichStates-Si, Stability and Safety. Three of these patterns have been implemented using the model-checker UPPAAL. We tested our approach on a marine ecosystem under fishing pressure. The (simplified) model describes the tropho-dynamic interactions between fish trophic groups as well as interactions with the fishery activities and with an environmental context. The results concern the impact of a fishery growing policy on the ecosystem.
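Composition on shared events can be illustrated with the following Python sketch, which builds the synchronous product of two small automata and lists the reachable global states. It is an untimed simplification: clocks and TCTL queries are left out, the model checker is not involved, and the two subsystems (`FISH`, `FISHERY`), their events and the helper names are hypothetical.

```python
def compose(a, b):
    """Synchronous product of two automata that synchronise on their shared events.

    An automaton is a dict with an 'initial' state, a set of 'events' and
    'transitions' mapping (state, event) to the next state.
    """
    shared = a["events"] & b["events"]
    initial = (a["initial"], b["initial"])
    transitions = {}
    seen = {initial}
    frontier = [initial]
    while frontier:
        state_a, state_b = frontier.pop()
        for event in a["events"] | b["events"]:
            next_a = a["transitions"].get((state_a, event))
            next_b = b["transitions"].get((state_b, event))
            if event in shared:
                # A shared event fires only if both subsystems can take it.
                if next_a is None or next_b is None:
                    continue
                target = (next_a, next_b)
            elif event in a["events"]:
                if next_a is None:
                    continue
                target = (next_a, state_b)
            else:
                if next_b is None:
                    continue
                target = (state_a, next_b)
            transitions[((state_a, state_b), event)] = target
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return {"initial": initial, "events": a["events"] | b["events"],
            "transitions": transitions}


def reachable_states(automaton):
    """Return the initial state plus every state appearing in a transition."""
    states = {automaton["initial"]}
    for (source, _), target in automaton["transitions"].items():
        states.update((source, target))
    return states


FISH = {
    "initial": "abundant",
    "events": {"overfish", "recover"},
    "transitions": {("abundant", "overfish"): "depleted",
                    ("depleted", "recover"): "abundant"},
}
FISHERY = {
    "initial": "low_effort",
    "events": {"overfish", "expand"},
    "transitions": {("low_effort", "expand"): "high_effort",
                    ("high_effort", "overfish"): "high_effort"},
}

if __name__ == "__main__":
    print(sorted(reachable_states(compose(FISH, FISHERY))))
```

Only the global states reachable from the initial one are built, so a combination such as a depleted stock under low fishing effort never appears in this toy model.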