Section: Scientific Foundations
Participants : Benedikt Bollig, Thomas Chatain, Paul Gastin, Stefan Haar, Serge Haddad, Stefan Schwoon, Marc Zeitoun.
- Concurrency: Property of systems allowing some interacting processes to be executed in parallel.
- Diagnosis: The process of deducing, from a partial observation of a system, aspects of the internal states or events of that system; in particular, fault diagnosis aims at determining whether or not some non-observable fault event has occurred.
- Conformance Testing: Feeding dedicated input into an implemented system I and deducing, from the resulting output of I, whether I respects a formal specification S.
It is well known that, whatever the intended form of analysis or control, a global view of the system state leads to overwhelming numbers of states and transitions, thus slowing down algorithms that need to explore the state space. Worse yet, it often blurs the mechanics that are at work rather than exhibiting them. Conversely, respecting concurrency relations avoids exhaustive enumeration of interleavings. It allows us to focus on `essential' properties of non-sequential processes, which are expressible with causal precedence relations. These precedence relations are usually called causal (partial) orders. Concurrency is the explicit absence of such a precedence between actions that do not have to wait for one another. Both causal orders and concurrency are in fact essential elements of a specification. This is especially true when the specification is constructed in a distributed and modular way. Making these ordering relations explicit requires leaving the framework of state/interleaving-based semantics. Therefore, we need to develop new dedicated algorithms for tasks such as conformance testing, fault diagnosis, or control for distributed discrete systems. Existing solutions for these problems often rely on centralized sequential models, which do not scale up well.
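The combinatorial gap between the two semantics can be made concrete with a small sketch. The snippet below (illustrative only; the event names are invented) enumerates all interleavings of two independent processes and contrasts their number with the size of the single partial-order object that represents the same behavior:

```python
from math import comb

# Two concurrent processes: P1 fires a1..a3, P2 fires b1..b3, with no causal
# dependency across processes. A state/interleaving semantics must consider
# every shuffle of the two sequences; a partial-order semantics keeps a single
# object: two causal chains, mutually unordered (i.e., concurrent).

def interleavings(xs, ys):
    """Enumerate all interleavings of two independent event sequences."""
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    return ([[xs[0]] + rest for rest in interleavings(xs[1:], ys)]
            + [[ys[0]] + rest for rest in interleavings(xs, ys[1:])])

p1 = ["a1", "a2", "a3"]
p2 = ["b1", "b2", "b3"]
runs = interleavings(p1, p2)
assert len(runs) == comb(6, 3)  # C(m+n, m) sequential runs

# The partial order needs only the causal edges within each process.
causal_order = {(p[i], p[i + 1]) for p in (p1, p2) for i in range(len(p) - 1)}
print(len(runs), len(causal_order))  # 20 interleavings vs. 4 causal edges
```

The number of interleavings grows as a binomial coefficient in the process lengths, while the causal-order representation grows only linearly.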
Participants : Benedikt Bollig, Paul Gastin, Stefan Haar, Stefan Schwoon, Marc Zeitoun.
Fault diagnosis for discrete event systems is a crucial task in automatic control. Our focus is on event-oriented (as opposed to state-oriented) model-based diagnosis, asking e.g. the following questions:
- given a (potentially large) alarm pattern formed of observations, what are the possible fault scenarios in the system that explain the pattern?
- based on the observation, can we deduce whether or not a certain invisible fault has actually occurred?
Model-based diagnosis starts from a discrete event model of the observed system, or rather of its relevant aspects, such as possible fault propagations, abstracting away other dimensions. From this model, an extraction or unfolding process, guided by the observation, recursively produces the explanation candidates.
In asynchronous partial-order based diagnosis with Petri nets, one unfolds the labelled product of a Petri net model and an observed alarm pattern, the latter also given in Petri net form. This yields an acyclic net giving a partial-order representation of the behaviors compatible with the alarm pattern. A recursive online procedure filters out those runs (configurations) that exactly explain the alarm pattern. The Petri-net-based approach generalizes to dynamically evolving topologies, in dynamical systems modeled by graph grammars.
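A deliberately simplified sketch of explanation finding is given below. It uses a labelled Petri net whose transitions are either observable (carrying a label) or silent (label `None`, e.g. an invisible fault), and enumerates firing sequences whose observable projection equals the alarm pattern. Note this is a sequential (interleaving) search, not the partial-order unfolding described above; the net and all names are invented for illustration:

```python
# transition -> (preset, postset, observable label or None)
# Markings are dicts place -> token count.

def explanations(net, marking, pattern, run=(), depth=10):
    """Yield firing sequences whose visible labels spell out `pattern`."""
    if not pattern:
        yield run                                  # pattern fully explained
        return
    if depth == 0:                                 # cut off silent loops
        return
    for name, (pre, post, label) in net.items():
        if all(marking.get(p, 0) >= n for p, n in pre.items()):
            m2 = dict(marking)                     # fire the transition
            for p, n in pre.items():
                m2[p] -= n
            for p, n in post.items():
                m2[p] = m2.get(p, 0) + n
            if label is None:                      # silent step (e.g. a fault)
                yield from explanations(net, m2, pattern, run + (name,), depth - 1)
            elif label == pattern[0]:              # matches next observation
                yield from explanations(net, m2, pattern[1:], run + (name,), depth - 1)

net = {
    "ok_alarm":    ({"p0": 1}, {"p1": 1}, "alarm"),
    "fault":       ({"p0": 1}, {"pf": 1}, None),
    "fault_alarm": ({"pf": 1}, {"p1": 1}, "alarm"),
}
expl = list(explanations(net, {"p0": 1}, ("alarm",)))
print(expl)  # one run containing the invisible 'fault', one without
```

The observed pattern `("alarm",)` admits two explanations, one of which contains the invisible fault: this is exactly the ambiguity that diagnosis must resolve or report.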
Observability and Diagnosability
Diagnosis algorithms have to operate in contexts with low observability, i.e., in systems where many events are invisible to the supervisor. Checking observability and diagnosability for the supervised systems is therefore a crucial and non-trivial task in its own right. Analysis of the relational structure of occurrence nets allows us to check whether the system exhibits sufficient visibility to allow diagnosis. Developing efficient methods both for verification of diagnosability under concurrency and for the diagnosis itself, for distributed, composite and asynchronous systems, is an important field for MExICo.
Distributed computation of unfoldings allows one to factor the unfolding of the global system into smaller local unfoldings, computed by local supervisors associated with sub-networks and communicating among each other. Elements of a methodology for distributed computation of unfoldings among several supervisors, underwritten by algebraic properties of the category of Petri nets, have been developed. Generalizations, in particular to graph grammars, remain to be done.
Computing diagnosis in a distributed way is only one aspect of a much vaster topic, that of distributed diagnosis. In fact, it involves a more abstract and often indirect reasoning to conclude whether or not some given invisible fault has occurred. Combining local scenarios is in general not sufficient: the global system may have behaviors that do not reveal themselves as faulty (or, dually, non-faulty) on any local supervisor's domain. Rather, the local diagnosers have to join all information that is available to them locally, and then collectively deduce further information from the combination of their views. In particular, even the absence of fault evidence on all peers may allow them to deduce fault occurrence jointly. Automating such procedures for the supervision and management of distributed and locally monitored asynchronous systems is a mid-term goal of MExICo. We have started a cooperation on these topics with Philippe Dague from University Paris-Sud; the above participants within the team have established a working group on the subject of distributed diagnosability.
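The phenomenon that no local view is conclusive while the joint view is can be illustrated with a toy example. Below, each supervisor computes the set of global runs compatible with its own observation; the run names and the fault labelling are invented for the example:

```python
# Toy illustration of joint deduction in distributed diagnosis: each local
# supervisor knows which global runs are consistent with its local view.
# Neither view alone decides the fault, but their combination does.

FAULTY = {"r2"}                      # runs containing the invisible fault
compatible_A = {"r1", "r2"}          # runs consistent with supervisor A's view
compatible_B = {"r2", "r3"}          # runs consistent with supervisor B's view

def verdict(compatible):
    """Classify a set of candidate runs against the fault labelling."""
    if compatible <= FAULTY:
        return "fault certain"
    if compatible.isdisjoint(FAULTY):
        return "no fault"
    return "ambiguous"

print(verdict(compatible_A))                 # ambiguous
print(verdict(compatible_B))                 # ambiguous
print(verdict(compatible_A & compatible_B))  # fault certain
```

Each peer alone sees fault and non-fault candidates, yet intersecting their candidate sets leaves only faulty runs, so jointly the fault is certain.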
Participants : Benedikt Bollig, Paul Gastin, Stefan Haar.
The gap between specification and implementation is at the heart of research on formal testing. The general conformance testing problem can be defined as follows: does an implementation I conform to a given specification S? Here, both I and S are assumed to have input and output channels. The formal model S of the specification is entirely known and can be used for analysis. On the other hand, the implementation I is unknown but interacts with the environment through observable input and output channels. So the behavior of I is partially controlled by input streams, and partially observable via output streams. The testing problem consists in computing, from the knowledge of S, input streams for I such that observation of the resulting output streams from I allows one to determine whether I conforms to S as intended.
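In the simplest, centralized and sequential case, this testing loop can be sketched as follows. The specification is modeled here as a deterministic Mealy-style transition table (a choice made for the sketch), and the implementation is a black box we can only stimulate and observe; all names and the tiny protocol are invented:

```python
# (state, input) -> (next_state, expected_output): the known specification S.
SPEC = {
    ("idle", "req"):  ("busy", "ack"),
    ("busy", "done"): ("idle", "ok"),
}

def conforms(implementation, inputs, state="idle"):
    """Drive the black-box I with an input stream and compare each observed
    output against the specification's prediction."""
    for i in inputs:
        state, expected = SPEC[(state, i)]
        if implementation(i) != expected:
            return False
    return True

def make_impl():
    """A correct black-box implementation of the same machine (internal state
    hidden from the tester, only outputs are visible)."""
    table = {("idle", "req"): ("busy", "ack"), ("busy", "done"): ("idle", "ok")}
    state = ["idle"]
    def step(i):
        state[0], out = table[(state[0], i)]
        return out
    return step

print(conforms(make_impl(), ["req", "done", "req"]))  # True
```

The distributed and asynchronous variants studied in this project break precisely the assumptions this sketch relies on: a single observer and a global ordering of inputs and outputs.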
In this project, we focus on distributed or asynchronous versions of the conformance testing problem. There are two main difficulties. First, due to the distributed nature of the system, it may not be possible to have a unique global observer for the outcome of a test. Hence, we may need to use local observers which will record only partial views of the execution. Due to this, it is difficult or even impossible to reconstruct a coherent global execution. The second difficulty is the lack of global synchronization in distributed asynchronous systems. Up to now, models were described with I/O automata having a centralized control, hence inducing global synchronizations.
Since 2006, and in particular during his sabbatical stay at the University of Ottawa, Stefan Haar has been working with Guy-Vincent Jourdan and Gregor v. Bochmann of UOttawa and Claude Jard of IRISA on asynchronous testing. In the synchronous (sequential) approach, the model is described by an I/O automaton with centralized control and transitions labeled with individual input or output actions. This approach has known limitations when inputs and outputs are distributed over remote sites, a feature that is characteristic of, e.g., web computing. To account for concurrency in the system, they have developed asynchronous conformance testing for automata with transitions labeled with (finite) partial orders of I/O. Intuitively, this is a “big step” semantics where each step allows concurrency but the system is synchronized before the next big step. This is already an important improvement on the synchronous setting. The non-trivial challenge is now to cope with fully asynchronous specifications using models with decentralized control, such as Petri nets.
Message Sequence Charts (MSCs) provide models of behaviors of distributed systems with communicating processes. An important problem is to test whether an implementation conforms to a specification given, for instance, by an HMSC. In local testing, one proceeds by injecting messages to the local processes and observing the responses: for each process p, a local observer records the sequence of events at p. If each local observation is consistent with some MSC defined by the specification, the implementation passes the test. If local testing on individual processes suffices to check conformance, the given specification (an HMSC language) is called locally testable. Local testability turns out to be undecidable even for regular HMSC languages; the main difficulty lies in the existence of implied scenarios, i.e., global behaviors which are locally consistent with different specification scenarios. There are two approaches to attack the problem of local testing in light of this bottleneck. One is to allow joint observations of tuples of processes. This gives rise to the problem of k-testability, where one allows joint observations of up to k processes. We will look for structural conditions on the model or the specification ensuring k-testability. Another tactic would be to recognize that practical implementations always work with bounded buffers and impose an upper bound B on the buffer size. The set of B-bounded MSCs in the k-closure of a regular MSC language is again regular, so the B-bounded k-testability problem is decidable for all regular HMSC-definable specifications. The focus could now be on efficiently identifying the smallest k for which an HMSC specification is k-testable. Another interesting problem is to identify a minimal set of tests to validate a k-testable specification.
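The implied-scenario phenomenon behind the undecidability result can be shown on a toy example. Below, each specification MSC is abstracted as its tuple of per-process event sequences (a simplification for the sketch; the scenarios and event names are invented), and local testing accepts a global behavior iff each process's projection matches some specification scenario, not necessarily the same one for all processes:

```python
# Local testing with one observer per process: a run passes if every local
# projection matches the projection of SOME specification MSC.

PROCS = ("p", "q")

# Each specification MSC abstracted as its per-process event sequences.
M1 = {"p": ("send_a", "recv_x"), "q": ("recv_a", "send_x")}
M2 = {"p": ("send_b", "recv_y"), "q": ("recv_b", "send_y")}
SPEC = [M1, M2]

def passes_local_test(run):
    """Accept iff each process's local view matches some spec scenario."""
    return all(any(run[pr] == msc[pr] for msc in SPEC) for pr in PROCS)

# An implied scenario: process p behaves as in M1, process q as in M2.
implied = {"p": ("send_a", "recv_x"), "q": ("recv_b", "send_y")}
print(passes_local_test(implied))  # True: accepted by all local observers
print(implied in SPEC)             # False: not a specification scenario
```

The run `implied` passes every local test yet belongs to no specification scenario; joint observation of p and q together (2-testability, in the terminology above) would reject it.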
The first step that should be reached in the near future is the completion of asynchronous testing in the setting without any big-step synchronization. In parallel, work on the problems of local testing should progress sufficiently to allow us, in a mid-term perspective, to understand the relations and possible interconnections between local (i.e. distributed) and asynchronous (centralized) testing. The mid- to long-term goal (perhaps not achievable within a four-year term) is the comprehensive formalization of testing and testability in asynchronous systems with distributed architecture and test protocols.