Section: New Results
Fault tolerance in multi-agent systems
Participants: Olivier Marin [correspondent], Corentin Méhat.
Distributed agent systems stand out as a powerful tool for designing scalable software. Distributed agent software generally consists of computational entities that interact with one another towards a common goal that is beyond their individual capabilities.
Our research focuses on middleware for deploying agents in large-scale environments and on mobile networks. Our main topics of interest are fault tolerance, process replication, and dynamicity with respect to both environments and applications. Our ongoing research projects all relate to these topics.
The FRAME (Failure Resilient Agents in Mobile Environments) project, funded by LIP6 in 2006 and 2007, aims at designing a middleware for the deployment of distributed algorithms among mobile devices. The originality of our approach is twofold: (i) we view partial and total disconnection as types of failures and aim to integrate fault tolerance solutions in order to guarantee the continuity of the computation in such a context, and (ii) we provide a modeling language which is close to Pi-calculus and yet focuses on communication channels in order to represent replicated applications and introduce failures. The ongoing PhD work of Corentin Méhat is at the core of this project.
Our current work addresses the resilience of group communications among mobile devices: the exchanged messages are transparently rerouted inside a structured P2P overlay (Pastry in our case) and can thus be accessed asynchronously. This has led to a new platform design, which we are currently implementing over FreePastry in order to evaluate its performance.
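As an illustration of how messages stored inside an overlay can be read asynchronously, the following minimal Python sketch keeps each group's messages on the node whose identifier is numerically closest to the hash of the group name. It is not the FRAME platform and does not use the FreePastry API: the class OverlayMailboxes, its methods, and the 32-bit identifier space are hypothetical, and Pastry's prefix-based routing is replaced by a direct lookup over a known node list.

```python
import hashlib
from collections import defaultdict

ID_BITS = 32                 # assumption: small identifier space for readability
ID_SPACE = 1 << ID_BITS


def overlay_id(name: str) -> int:
    """Hash a node or group name into the circular identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def ring_distance(a: int, b: int) -> int:
    """Distance between two identifiers on the circular space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class OverlayMailboxes:
    """Hypothetical in-memory stand-in for a structured overlay."""

    def __init__(self, node_names):
        self.node_ids = {name: overlay_id(name) for name in node_names}
        self.store = {name: defaultdict(list) for name in node_names}

    def responsible_node(self, group: str) -> str:
        # The node numerically closest to the key holds the group's mailbox;
        # the node name breaks ties so the choice is deterministic.
        key = overlay_id(group)
        return min(self.node_ids,
                   key=lambda n: (ring_distance(self.node_ids[n], key), n))

    def publish(self, group: str, message: str) -> None:
        """Store a message in the overlay instead of sending it directly."""
        self.store[self.responsible_node(group)][group].append(message)

    def fetch(self, group: str, since: int = 0) -> list:
        """Return the messages a device missed, starting at index `since`."""
        return list(self.store[self.responsible_node(group)][group][since:])


overlay = OverlayMailboxes(["node-a", "node-b", "node-c"])
overlay.publish("team", "m1")
overlay.publish("team", "m2")          # sent while a receiver is disconnected
missed = overlay.fetch("team", since=1)  # receiver reconnects and catches up
```

Because senders and receivers only agree on the group key, a device that was disconnected can retrieve what it missed without the sender being reachable at that moment.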
The DARX (Dynamic Agent Replication eXtension) project aims at building an architecture for fault-tolerant agent computing in multi-cluster networks. The originality of our approach lies in two features: (i) an automated replication service which decides, on behalf of the application, which of its computational components are to be made dependable, to which degree, and at what point of the execution, and (ii) the hierarchical architecture of the middleware, which is intended to provide suitable support for large-scale applications. DARX is now a component of the FACOMA project, which is supported within the ANR-SETIN framework. FACOMA was originally scheduled to end in 2009, but the ANR has approved its extension until 2010.
The latest advances include building a distributed exception-handling system which can be shared by the agent application and the dynamic replication service, and integrating heuristics on the system-level servers in order to drive the load-balancing decisions related to replicating agents.
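To make the idea of an automated replication service more concrete, the sketch below derives a replication degree from a per-agent criticality score and places the replicas on the least-loaded hosts. This is only an illustrative heuristic under stated assumptions, not the policy implemented in DARX: the criticality score, the max_replicas bound, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    criticality: float  # hypothetical score in [0, 1]; higher means more critical


def replication_degree(agent: Agent, max_replicas: int = 3) -> int:
    """Map criticality linearly onto 1..max_replicas copies (assumed heuristic)."""
    c = min(max(agent.criticality, 0.0), 1.0)       # clamp to [0, 1]
    return 1 + int(c * (max_replicas - 1) + 0.5)    # round half up


def place_replicas(agent: Agent, host_load: dict, max_replicas: int = 3) -> list:
    """Pick the least-loaded hosts for the agent's replicas and update their load."""
    degree = min(replication_degree(agent, max_replicas), len(host_load))
    # Sort by load, then by host name so the result is deterministic.
    chosen = sorted(host_load, key=lambda h: (host_load[h], h))[:degree]
    for host in chosen:
        host_load[host] += 1
    return chosen


load = {"host-a": 2, "host-b": 0, "host-c": 1}
critical_hosts = place_replicas(Agent("planner", 1.0), load)  # three replicas
minor_hosts = place_replicas(Agent("logger", 0.0), load)      # single copy
```

Re-evaluating the degree whenever the criticality score changes would give the "at what point of the execution" dimension; the sketch leaves that scheduling out.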
The DDEFCON (Dependable DEployment oF Code in Open eNvironments) project addresses the safe and secure deployment of collaborative software components over large-scale networks. We seek to achieve a deployment platform that can be implemented on top of a structured peer-to-peer overlay. DDEFCON is funded as part of the LIP6 young projects initiative; it started in 2008 and has been extended for a second year. It serves as a basis for the PhD thesis of Nicolas Gibelin.
We are currently working on a DHT-based service for registering resources and allowing multi-criteria searches over these resources. We have already built two different implementations and tested them on a local cluster. The results are promising, and we are now deploying both implementations on Grid5000 for further performance evaluation.
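One common way to support multi-criteria searches on a DHT is to index each resource under one key per attribute/value pair and to intersect the result sets at query time. The Python sketch below shows that scheme under stated assumptions; it is not one of the two DDEFCON implementations, and the ResourceRegistry class, its key format, and the modulo-based node selection are hypothetical simplifications of a real DHT.

```python
import hashlib


class ResourceRegistry:
    """Hypothetical in-memory model: each dict plays the role of one DHT node."""

    def __init__(self, num_nodes: int = 4):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        # Assumption: hash modulo node count stands in for real DHT routing.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def register(self, resource_id: str, attributes: dict) -> None:
        """Index the resource under one DHT key per attribute/value pair."""
        for attr, value in attributes.items():
            key = f"{attr}={value}"
            self._node_for(key).setdefault(key, set()).add(resource_id)

    def search(self, **criteria) -> set:
        """Return the resources that match every given criterion."""
        results = None
        for attr, value in criteria.items():
            key = f"{attr}={value}"
            matches = self._node_for(key).get(key, set())
            results = set(matches) if results is None else results & matches
        return results if results is not None else set()


registry = ResourceRegistry()
registry.register("r1", {"os": "linux", "arch": "x86"})
registry.register("r2", {"os": "linux", "arch": "arm"})
linux_arm = registry.search(os="linux", arch="arm")
```

This scheme handles exact-match criteria only; range queries (for example on available memory) would need a different key layout.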