Section: Scientific Foundations
Dependability and Group Communication
Agreement Problems and Group Communication Services
To cope with dependability and security issues (namely reliability, availability, integrity, confidentiality, and privacy) when both accidental and intentional faults may occur, we promote the use of group communication services. We mainly target small distributed systems that contain fewer than a few tens of nodes. Obviously, the size of the system and the speed at which the group composition evolves are two main factors that inherently increase the cost of the proposed solutions. Yet, despite these limitations, the group concept remains a very attractive approach even in the case of large-scale dynamic networks. Indeed, many groups include only a limited number of cooperating processes (e.g. sets of replicas) or are the result of a decomposition of the whole system into several sub-systems (e.g. hierarchies, clusters, neighborhoods, or communities of interest). In a system composed of numerous heterogeneous, transient, and unfamiliar entities, the group concept is sometimes a palliative approach: it compensates for these negative factors by identifying long-lasting sub-systems of small size that unify their transient members while forcing them to know each other and to synchronize their activities.
Providing group communication services within a system is essential. Thanks to these general-purpose services [38], entities located at different nodes of a distributed network can remain tightly synchronized despite failures and the asynchrony of the underlying distributed system. The membership service tracks changes in the group composition that result from explicit join and leave operations, as well as from implicit (and unpredictable) leave operations due to failures [46]. The membership service ensures that all the processes share a consistent view of the group composition and makes it possible to synchronize the activities of the processes with regard to the successive evolutions of that composition (view installation, view synchrony, etc.). Fundamental communication abstractions, called broadcasts, are also provided. When an entity (whether it belongs to the group or not) broadcasts a message using a group reference, the message is forwarded to all the entities belonging to the current view. With a reliable broadcast, the message is received either by all the non-faulty members of the group or by none of them. The broadcast of a message can satisfy various ordering constraints: FIFO order, causal order, or total order [30]. In the case of a reliable total order broadcast (also called atomic broadcast), all the messages addressed to a group are delivered in the same order by all the group members, even if these messages have been received in different orders. All these services facilitate the task of an application designer, since they guarantee strong properties regarding the delivery of messages to the recipients and the order in which these messages are delivered. For instance, to increase the overall reliability of a system, both critical data and functionalities may be replicated on a group of nodes. Ensuring consistency within such a set of replicas becomes straightforward if group communication services are available.
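As an illustration, the total-order guarantee can be sketched with a fixed-sequencer scheme: a sequencer stamps each message with a global sequence number, and every member buffers out-of-order messages until all predecessors have been delivered. The following Python sketch is a toy model; the class names are ours and do not come from any particular toolkit, and failures of the sequencer itself are deliberately ignored.

```python
# Toy sketch of a fixed-sequencer total-order (atomic) broadcast.
# All names are illustrative; sequencer failures are not handled here.

class Sequencer:
    """Assigns a global sequence number to every broadcast message."""
    def __init__(self):
        self.next_seq = 0

    def order(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg

class Member:
    """Buffers out-of-order messages and delivers them in sequence order."""
    def __init__(self):
        self.pending = {}       # seq -> msg, received but not yet deliverable
        self.next_expected = 0
        self.delivered = []

    def receive(self, seq, msg):
        self.pending[seq] = msg
        # Deliver every message whose predecessors have all been delivered.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1

# Two members receive the same messages in different network orders,
# yet both deliver them in the sequencer's order.
seq = Sequencer()
ordered = [seq.order(m) for m in ["a", "b", "c"]]
m1, m2 = Member(), Member()
for s, m in ordered:
    m1.receive(s, m)
for s, m in reversed(ordered):
    m2.receive(s, m)
assert m1.delivered == m2.delivered == ["a", "b", "c"]
```

Despite `m2` receiving the messages in reverse network order, both members deliver the identical sequence, which is precisely the atomic broadcast property described above.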
Many group communication services can be classified as agreement problems. In our work, we design and develop a homogeneous set of services that rely on a solution to the basic consensus problem [29], [26]. More precisely, we propose to build all the group communication services on top of a generic and adaptive solution to the consensus problem, which can be customized to cope with the characteristics of the environment, as well as with the properties of the reliable distributed abstractions that have to be ensured (see the description of the Prometeus software). From an algorithmic point of view, several design choices (tunable consensus protocol parameters, consensus algorithms with multiple round participations, continual execution of consensus instances, use of clock synchronization algorithms to fix the round duration, etc.) yield an original software package whose performance differs from that obtained by other group communication projects (Ensemble - Cornell University and the Hebrew University, Appia - University of Lisbon [43], Samoa - EPFL [44], etc.). As these services are used very often, efficiency is a key issue when designing solutions to such agreement problems. Our main goal is to gain an even better understanding of these problems while considering various levels of adversity (various failure models, but also various computational models ranging from the purely synchronous one to the purely asynchronous one). The agreement protocols usually used in group communication only take into account an active adversary, which may trigger either crashes (benign faults) or arbitrary behaviors (malign faults). Passive adversaries, which merely observe the protocol's behavior, also have to be considered in order to protect the privacy of each group member.
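The construction of atomic broadcast on top of repeated consensus instances can be sketched as follows. The `consensus` function below is a deterministic placeholder (it simply picks the proposal of the lowest-numbered process); a real system would run a fault-tolerant consensus protocol, such as the one underlying Prometeus, in its place. All names are illustrative.

```python
# Sketch: total-order delivery derived from repeated consensus instances.
# The trivial "consensus" below is a stand-in for a real fault-tolerant
# consensus protocol; it decides deterministically so that the reduction
# itself can be demonstrated.

def consensus(proposals):
    """proposals: {process_id: batch}; decide one batch deterministically."""
    return proposals[min(proposals)]

def atomic_broadcast(per_process_pending, rounds):
    """Each process proposes its pending messages; everyone delivers the
    decided batch of each consensus instance in instance order."""
    delivered = {p: [] for p in per_process_pending}
    for _ in range(rounds):
        proposals = {p: tuple(msgs)
                     for p, msgs in per_process_pending.items() if msgs}
        if not proposals:
            break
        decided = consensus(proposals)
        for p in delivered:
            delivered[p].extend(decided)
            # Drop already-delivered messages from the local pending set.
            per_process_pending[p] = [m for m in per_process_pending[p]
                                      if m not in decided]
    return delivered

pending = {1: ["x", "y"], 2: ["z"]}
result = atomic_broadcast(pending, rounds=3)
# Every process delivers the same sequence of messages.
assert result[1] == result[2] == ["x", "y", "z"]
```

Because every process applies the decision of each consensus instance in the same instance order, all processes deliver identical message sequences, which is why a single customizable consensus building block suffices to implement the whole family of ordered broadcast services.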
From a software engineering point of view, the use of a componentware approach helps to implement the group concept in a modular way. Code tangling is a major concern when designing group communication services. Even if a consensus-based solution allows for a clean separation between agreement-related code and protocol-specific code, many concerns that crosscut the various protocol codes remain. Part of the difficulty lies in identifying all the hidden and tangled synchronizations that exist between the numerous protocols. We aim at promoting the "separation of concerns" principle to surpass existing toolkits in terms of adaptability. Indeed, currently existing solutions offer little flexibility, and the possible tunings usually require deep expertise. To reach this objective, we need to revisit several protocols and their interaction schemes. Conducting a performance evaluation of our proposal is also part of our activities.
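The "separation of concerns" principle can be illustrated by confining an ordering policy to its own protocol layer, so that it can be replaced without touching the rest of the stack. The following Python sketch, with invented names and in the spirit of composable protocol stacks such as Appia, isolates per-sender FIFO ordering in a single component layered below the application.

```python
# Sketch of a composable protocol stack: the FIFO ordering concern lives
# in one dedicated layer. All class names are invented for illustration.

class Layer:
    """Base layer: passes delivered messages to the layer above."""
    def __init__(self, upper=None):
        self.upper = upper

    def deliver(self, msg):
        if self.upper:
            self.upper.deliver(msg)

class FifoLayer(Layer):
    """Per-sender FIFO ordering, isolated in its own component."""
    def __init__(self, upper=None):
        super().__init__(upper)
        self.expected = {}   # sender -> next sequence number to deliver
        self.buffer = {}     # sender -> {seq: payload} awaiting delivery

    def deliver(self, msg):
        sender, seq, payload = msg
        self.buffer.setdefault(sender, {})[seq] = payload
        exp = self.expected.get(sender, 0)
        # Deliver upward every message that is now in sequence.
        while exp in self.buffer[sender]:
            super().deliver((sender, exp, self.buffer[sender].pop(exp)))
            exp += 1
        self.expected[sender] = exp

class App(Layer):
    """Topmost layer: records the payloads it receives."""
    def __init__(self):
        super().__init__()
        self.log = []

    def deliver(self, msg):
        self.log.append(msg[2])

app = App()
stack = FifoLayer(upper=app)
for msg in [("p", 1, "second"), ("p", 0, "first")]:   # out of order
    stack.deliver(msg)
assert app.log == ["first", "second"]
```

Swapping `FifoLayer` for a causal- or total-order layer would change the delivery guarantee without modifying the application, which is exactly the kind of adaptability that motivates the componentware approach.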
Group Communication Services to Secure a Web Access
To address the confidentiality, integrity, and availability of an information system, a security policy has to be defined and enforced. In the case of large public applications deployed on the Internet, prevention techniques are necessary but not sufficient. At runtime, additional mechanisms have to be used to detect any violation of the above properties and to limit its consequences. Indeed, some weaknesses and vulnerabilities often remain in the executed applications. The fact that these faults have never been identified and corrected before is partly due to both the ever-increasing complexity of information systems and the ever-decreasing time-to-market of new applications and services. Until they are discovered and eliminated, these design faults may prevent an application from behaving according to its specification. Hence, when a user activates such a bug, whether accidentally or intentionally, security rules can be violated. Intentional faults are produced by malicious attackers who try to take advantage of residual vulnerabilities of the information system. Assuming that an intrusion can succeed, we want to be able to detect it, to confine the damage, and to clean up and recover the corrupted entities.
Research on Intrusion Detection Systems (IDS) has been carried out following two distinct approaches: misuse detection and anomaly detection. Misuse detection, also called pattern-based detection, consists in recognizing attacks using a signature database which contains descriptions of already known attacks. The control focuses on the contents of the incoming requests. Unfortunately, unknown attacks or variants of known ones may succeed. Once a vulnerability is discovered, new security advisories are published; until they are taken into account, an information system remains vulnerable. Even if the database is continuously refreshed and updated (which is a tremendous task in itself), this countermeasure fails to react effectively against an attack that may spread over the Internet in just a few minutes, as some recent worms did. Anomaly detection, also called profile-based detection, consists in analyzing deviations from an expected normal behavior. The control focuses on the computing activities induced by the incoming requests. The accuracy of the detection relies on the two following assumptions. First, any intrusion should have a noticeable and unexpected impact on the activity of the system. This is almost always the case, as an attack implies an abnormal use of the system. Second, a model that characterizes all the normal behavior patterns should be available. When both assumptions are satisfied, the occurrence of an intrusion tallies with the observation of a significant deviation from the expected behavior, and vice versa. With this approach, new or unknown attacks can be detected. Of course, the definition of a model is far from trivial. If the model is too general and permissive, or too precise and restrictive, the IDS will probably make mistakes: it may, for instance, ignore real attacks (false negatives) or raise an alert although the suspicious request is actually not an attack (false positives).
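The two approaches can be contrasted with a deliberately naive sketch: misuse detection matches requests against a small signature set, while anomaly detection flags deviations from a profile learned on normal traffic. The signatures, the profile (here simply the maximum request length seen during training), and the tolerance threshold are toy assumptions of ours, not real attack data.

```python
# Toy contrast of misuse (pattern-based) vs. anomaly (profile-based)
# detection on HTTP-like request strings. Signatures and profile are
# illustrative assumptions only.
import re

SIGNATURES = [re.compile(p) for p in [r"\.\./", r"(?i)union\s+select"]]

def misuse_detect(request):
    """Pattern-based: flag a request matching a known attack signature."""
    return any(sig.search(request) for sig in SIGNATURES)

def build_profile(normal_requests):
    """Profile-based: learn the maximum length seen in normal traffic."""
    return max(len(r) for r in normal_requests)

def anomaly_detect(request, profile, tolerance=1.5):
    """Flag a request that deviates significantly from the profile."""
    return len(request) > profile * tolerance

normal = ["GET /index.html", "GET /news?id=12"]
profile = build_profile(normal)

assert misuse_detect("GET /../../etc/passwd")             # known pattern
assert not misuse_detect("GET /index.html")
assert anomaly_detect("GET /page?" + "A" * 200, profile)  # unseen attack
```

The sketch also exposes the trade-off discussed above: an oversized buffer attack is caught by the profile even though no signature exists for it, but an attack fitting within the learned length bound would be a false negative, while an unusually long legitimate request would be a false positive.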
The model representing normal behaviors is usually explicitly defined. In that case, it is first built statically and can then be improved dynamically using normal training data sets during a preliminary learning phase.
In our work, we consider a radically different approach called implicit intrusion detection. In the case of a Web server that delivers dynamic contents, we show that the use of diversified COTS (Components-Off-The-Shelf) servers makes it possible to detect intrusions. To secure web access to a set of data, we assume that the data is replicated and accessible through different systems that may have residual vulnerabilities but, hopefully, not the same ones. Consequently, an attack can succeed on a particular copy but not on all the redundant servers. By checking the values returned to the malicious attacker by the different copies, we can identify differences and detect anomalies. Of course, the difficult part is to provide replication and detection mechanisms that are safe and will not themselves become an even simpler target for the attacker. Our aim is to study how group services can be used and adapted to achieve this objective. This approach can detect even previously unknown attacks. Similar studies (leading, however, to different algorithmic solutions) have been conducted by LAAS (Delta-4 [37], [28] and DIT architectures [41]) and by the University of Texas at Austin [47].
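The comparison step can be sketched as a majority vote over the answers returned by the diversified replicas: a unanimous answer raises no alert, whereas any dissenting replica is flagged as possibly corrupted. The replica names and replies below are mocked; a real architecture would additionally have to mask legitimate non-determinism in dynamic content before comparing answers.

```python
# Sketch of implicit intrusion detection: the same request is sent to
# diversified replicas, and a disagreement among their answers signals
# a possible intrusion on a minority of them. Replies are mocked.
from collections import Counter

def compare_replies(replies):
    """Majority vote over replica answers; report dissenting replicas."""
    majority, count = Counter(replies.values()).most_common(1)[0]
    if count == len(replies):
        return majority, []                  # unanimous: no alert
    suspects = [r for r, v in replies.items() if v != majority]
    return majority, suspects                # alert: flag the dissenters

# One replica, vulnerable to this specific attack, returns a corrupted
# answer; the two others reject the malicious request.
replies = {"apache": "403 Forbidden",
           "nginx": "403 Forbidden",
           "iis": "200 /etc/passwd contents"}
answer, suspects = compare_replies(replies)
assert answer == "403 Forbidden" and suspects == ["iis"]
```

The vote both masks the intrusion (the attacker only sees the majority answer) and detects it (the dissenting replica can be isolated and recovered), which is where the group communication services described earlier come into play.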