Section: Overall Objectives
The research objectives of the Dream team concern the monitoring of complex systems. Our aim is to enhance monitors so that they perform their task as well as possible, especially when the monitored systems are subject to failures or degraded functionality or, more generally, when their (internal or external) context evolves. We mainly consider distributed systems composed of a set of components or a set of agents (software agents, robots) that cooperate to achieve a common goal. They can be physical systems such as telecommunication networks, software systems such as web services, or even a set of collaborative robots. In all cases, they are expected to achieve a goal (or contract), possibly expressed as a set of QoS (Quality of Service) constraints. These systems may either be supervised by a human operator or be autonomous. In the first case, the operator is in charge of taking decisions, as is often the case for telecommunication or power distribution networks. The second case covers systems such as web services and embedded robotic or automotive systems, where repair/reconfiguration actions are triggered by the system itself.
The idea that has guided our research for many years is that repair/recovery actions must be based not only on symptoms (as exception handlers do, for instance) but on a diagnosis of the deep causes of failures. In this view, the diagnosis task, whose role is to analyze the symptoms in order to locate and identify their causes, becomes the cornerstone on which the decision task relies. Since the observed systems may be very large, part of the diagnosis must usually be computed locally, in a distributed way. However, obtaining a global view of the situation requires a synchronization process. Thus, in our research, we assume that an agent (called the user when it is a human agent and the broker when it is a software agent) is in charge of exploiting the diagnosis to trigger the adequate action.
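As an illustration of the synchronization step performed by the broker, the sketch below assumes a deliberately simple, hypothetical representation: each local diagnoser returns a list of candidate explanations, each a mapping from variables (local fault flags and events shared with neighboring components) to values, and the broker keeps only the combinations that agree on every shared variable. This is a generic join-based sketch, not the team's actual synchronization algorithm; the names `synchronize`, `consistent` and `merge` and the variables used in the example are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Hypothetical representation of a local diagnosis candidate:
# variable name (local fault flag or shared interface event) -> value.
Candidate = Dict[str, object]


def consistent(a: Candidate, b: Candidate) -> bool:
    """Two candidates agree if they give the same value to every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def merge(a: Candidate, b: Candidate) -> Candidate:
    """Combine two consistent candidates into a single, larger explanation."""
    merged = dict(a)
    merged.update(b)
    return merged


def synchronize(local_diagnoses: List[List[Candidate]]) -> List[Candidate]:
    """Broker step: join local candidate sets into globally consistent diagnoses."""
    global_candidates: List[Candidate] = [{}]
    for local in local_diagnoses:
        # Keep only the pairs (global so far, local) that agree on shared variables.
        global_candidates = [
            merge(g, c)
            for g, c in product(global_candidates, local)
            if consistent(g, c)
        ]
    return global_candidates
```

For instance, if component A's log admits either a healthy emission (`{"fault_A": False, "msg_AB": "sent"}`) or a faulty one (`{"fault_A": True, "msg_AB": "lost"}`), while component B's log only admits `{"fault_B": False, "msg_AB": "lost"}`, then `synchronize([a, b])` keeps a single global candidate, `{"fault_A": True, "msg_AB": "lost", "fault_B": False}`. A plain cross-product join grows quickly with the number of components, so a realistic broker would need more compact representations and incremental synchronization.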
The decision can consequently be globally consistent, even if actions are executed locally in a distributed way. These two credos shape our research directions, both from an architectural point of view (not purely distributed) and from a diagnostic point of view (decision-oriented diagnosis): the ultimate goal of the system is not merely to diagnose itself but to react, or to suggest a reaction, accurately on the basis of a diagnosis. We are especially interested in self-healing systems, i.e. systems able to repair themselves after the occurrence of a fault: such systems should always be able to provide a diagnosis sufficient to trigger an adequate repair process.
We have chosen to develop a model-based approach. The diagnosis task assumes the existence of a model of the system describing its expected behavior as well as its potential faulty behaviors. Moreover, we favor qualitative models, which provide an abstract view of the system that is often easier to understand. Qualitative model-based approaches are advocated for at least two main reasons:
they are “glass-box” approaches, which means that diagnoses and recommended actions can be explained to the user in an explicit and appropriate language,
they are flexible and therefore well suited to rapidly evolving systems, such as technological systems (for instance, telecommunication components).
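A model describing both expected and faulty behaviors can be illustrated with a small finite automaton in which faults are unobservable events. The sketch below is a minimal example of the standard belief-state tracking used for discrete-event system diagnosis, not the team's own algorithm: it follows every (state, fault set) pair compatible with the observed event sequence and returns the candidate fault sets. The model (states `ok`, `waiting`, `down`, fault `f_break`) and the function names are hypothetical. Each resulting diagnosis is an explicit, named set of faults, which is what makes such a model-based answer explainable.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model encoding: transitions[(state, event)] -> successor states.
Transitions = Dict[Tuple[str, str], Set[str]]
# A belief is a set of (current state, faults that occurred so far) pairs.
Belief = Set[Tuple[str, FrozenSet[str]]]


def unobservable_closure(belief: Belief, transitions: Transitions,
                         unobservable: Set[str], faults: Set[str]) -> Belief:
    """Extend the belief with everything reachable through unobservable events."""
    closure = set(belief)
    stack = list(belief)
    while stack:
        state, occurred = stack.pop()
        for event in unobservable:
            for nxt in transitions.get((state, event), set()):
                # Record the event in the fault set only if it is a fault.
                new_faults = (occurred | {event}) if event in faults else occurred
                item = (nxt, frozenset(new_faults))
                if item not in closure:
                    closure.add(item)
                    stack.append(item)
    return closure


def diagnose(initial: str, observations: List[str], transitions: Transitions,
             unobservable: Set[str], faults: Set[str]) -> Set[FrozenSet[str]]:
    """Return every fault set consistent with the observed event sequence."""
    belief = unobservable_closure({(initial, frozenset())}, transitions,
                                  unobservable, faults)
    for obs in observations:
        # Fire the observed event from every state currently considered possible.
        step: Belief = {(nxt, occurred)
                        for state, occurred in belief
                        for nxt in transitions.get((state, obs), set())}
        belief = unobservable_closure(step, transitions, unobservable, faults)
    return {occurred for _, occurred in belief}
```

With the transitions `ok --send--> waiting`, `waiting --ack--> ok`, `waiting --f_break--> down` and `down --timeout--> down`, observing `send` then `timeout` yields the single diagnosis `{f_break}`, observing `send` then `ack` yields only the empty fault set, and observing `send` alone leaves both possibilities open.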
Since we deal with dynamic systems in an on-line context, temporal information plays a central role in modeling systems, diagnosing situations and deciding on actions, and hence also in acquiring knowledge about the systems. The models are expressed using event-based formalisms such as discrete-event systems (mainly described by automata) or sets of chronicles (a chronicle is a temporally constrained set of events).
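Recognizing a chronicle amounts to finding occurrences of its events in the observed flow that satisfy its temporal constraints. The sketch below is a simplified, offline illustration under assumed conventions: a chronicle maps hypothetical node names to event labels, each constraint `(a, b, lo, hi)` requires `lo <= t(b) - t(a) <= hi`, and a backtracking search over a timestamped log returns one matching assignment. On-line chronicle recognition systems instead process events incrementally as they arrive; this brute-force version only shows what a match is. The names `Chronicle` and `recognize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp, event label)


@dataclass
class Chronicle:
    events: Dict[str, str]  # node name -> expected event label
    # (a, b, lo, hi) means lo <= t(b) - t(a) <= hi
    constraints: List[Tuple[str, str, float, float]] = field(default_factory=list)


def recognize(chronicle: Chronicle, log: List[Event]) -> Optional[Dict[str, float]]:
    """Return one assignment node -> timestamp satisfying all constraints, or None."""
    nodes = list(chronicle.events)

    def satisfied(assign: Dict[str, int]) -> bool:
        # Only check constraints whose two nodes are already assigned.
        for a, b, lo, hi in chronicle.constraints:
            if a in assign and b in assign:
                delay = log[assign[b]][0] - log[assign[a]][0]
                if not lo <= delay <= hi:
                    return False
        return True

    def search(i: int, assign: Dict[str, int]) -> Optional[Dict[str, float]]:
        if i == len(nodes):
            return {n: log[idx][0] for n, idx in assign.items()}
        node = nodes[i]
        used = set(assign.values())  # each log occurrence matches at most one node
        for idx, (_, label) in enumerate(log):
            if idx in used or label != chronicle.events[node]:
                continue
            assign[node] = idx
            if satisfied(assign):
                found = search(i + 1, assign)
                if found is not None:
                    return found
            del assign[node]  # backtrack
        return None

    return search(0, {})
```

With a chronicle in which `link_down` is followed by `alarm` within 0 to 5 time units, and the log `[(0.0, "boot"), (2.0, "link_down"), (9.0, "alarm"), (10.0, "link_down"), (12.5, "alarm")]`, `recognize` returns `{"down": 10.0, "alarm": 12.5}`: the first `link_down` is rejected because no `alarm` follows it closely enough.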
To sum up, the challenge we have in mind is to design smart (or lively) systems, both adaptable and dependable, that meet the demand for self-healing embedded systems. The approach we propose is to develop formal methods and efficient algorithms dedicated to the on-line monitoring and repair-oriented diagnosis of complex distributed systems using qualitative temporal models. The research questions we investigate in this context are presented below. Although they are closely related, for the sake of clarity we present them in two distinct paragraphs: the first is devoted to on-line monitoring issues and the second to design and model acquisition issues. Finally, we describe some application domains.