
Section: New Results

P2P Data Management

Data management in P2P systems offers new research opportunities since traditional distributed database techniques need to scale up while supporting autonomy, heterogeneity, and dynamicity of the data sources. In the context of the Atlas Peer-to-Peer Architecture (APPA) project, the main results this year are in the management of replicated data, data privacy and testing.

Data Replication in DHTs

Participants : Reza Akbarinia, Esther Pacitti, Mounir Tlili, Patrick Valduriez.

Distributed Hash Tables (DHTs), e.g. CAN and Chord, provide an efficient solution for data location and lookup in large-scale P2P systems. One of the main characteristics of DHTs (and P2P systems) is the dynamic behavior of peers, which can join and leave the system frequently, at any time. When a peer goes offline, its data becomes unavailable. To improve data availability, most DHTs rely on data replication, storing (key, data) pairs at several peers, e.g. using several hash functions. If one peer is unavailable, its data can still be retrieved from the other peers that hold a replica. However, update management is difficult because of the dynamic behavior of peers, concurrent updates, and missed updates. One approach to update management is to stamp the updates with monotonically increasing timestamps and send the updates together with their timestamps to the replica holders. This yields a total order on updates. To deal with missed updates, we can use timestamps that are continuous, i.e. without gaps, so that a replica holder can detect missed updates by examining the timestamps of the updates it has received. Examples of applications that can take advantage of continuous timestamping are P2P collaborative text editing applications, e.g. P2P Wiki, which need to reconcile the updates made by collaborating users. The problem, however, is how to generate such timestamps in a DHT.
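The gap-detection idea can be illustrated with a small sketch. The class and method names below are ours (not part of CTRM); the sketch only shows how a replica holder exploits continuous, gap-free timestamps to apply updates in total order and to detect exactly which updates it has missed.

```python
# Hypothetical sketch: a replica holder that applies updates in total
# order and detects missed updates because timestamps are continuous.

class ReplicaHolder:
    def __init__(self):
        self.last_ts = 0    # timestamp of the last update applied
        self.pending = {}   # out-of-order updates, keyed by timestamp
        self.state = []     # applied updates, in total order

    def receive(self, ts, update):
        """Apply an update if it is the next expected timestamp;
        otherwise buffer it -- the gap signals missed updates."""
        if ts <= self.last_ts:
            return          # duplicate, already applied
        self.pending[ts] = update
        # Apply every consecutive timestamp we now hold.
        while self.last_ts + 1 in self.pending:
            self.last_ts += 1
            self.state.append(self.pending.pop(self.last_ts))

    def missing(self):
        """Timestamps we know exist but have not yet received."""
        if not self.pending:
            return []
        return [t for t in range(self.last_ts + 1, max(self.pending))
                if t not in self.pending]

r = ReplicaHolder()
r.receive(1, "a"); r.receive(3, "c")   # update 2 was missed
assert r.missing() == [2]
r.receive(2, "b")                      # gap filled: 2 and 3 both apply
assert r.state == ["a", "b", "c"]
```

With merely monotonic (non-continuous) timestamps, the holder of updates 1 and 3 could not tell whether update 2 ever existed; continuity is what makes the gap observable.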

In a recently submitted paper, we proposed an efficient solution to this problem: a service called Continuous Timestamp based Replication Management (CTRM) that deals with the efficient storage, retrieval and updating of replicas in DHTs. To perform updates on replicas, we developed a new protocol for CTRM that stamps updates with timestamps in a distributed fashion. One of the main features of our protocol is that the updates' timestamps are not only monotonically increasing but also continuous. We take into account the peer failures that may happen during the execution of the protocol and show that our protocol works correctly despite these failures.

The CTRM service is inspired by the P2P-LTR service (P2P Logging and Timestamping for Reconciliation), which we proposed in 2008. The objective of P2P-LTR is to perform distributed reconciliation over DHTs. It extends the Key-based Timestamping Service proposed in 2007 to support decentralized timestamping. As collaborating peers perform updates, the updates are timestamped and stored at a set of peers chosen using a set of hash functions. During reconciliation, these updates are retrieved in total order to enforce eventual consistency despite churn and failures. P2P-LTR was proposed in the context of the XWiki Concerto and Grid4All projects. We completed its implementation this year.
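The log-then-reconcile pattern described above can be sketched as follows. This is an illustration of the general idea, not P2P-LTR's actual protocol: the DHT is simulated with in-process dictionaries, and the multiple hash functions for choosing replica holders are derived from SHA-1 with a salt.

```python
# Illustrative sketch of P2P-LTR-style update logging and reconciliation
# (simplified; peers and the DHT are simulated with Python dicts).

import hashlib

NUM_PEERS = 8
peers = [dict() for _ in range(NUM_PEERS)]   # each peer's local log store

def holders(key, replicas=2):
    """Choose replica holders with 'replicas' independent hash functions,
    simulated here by salting SHA-1 with an index."""
    return [int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16)
            % NUM_PEERS for i in range(replicas)]

def log_update(ts, update):
    """Timestamp an update and store it at every chosen replica holder."""
    for p in holders(ts):
        peers[p][ts] = update

def reconcile():
    """Retrieve the logged updates from all peers and replay them in
    total (timestamp) order, so every collaborating peer converges."""
    merged = {}
    for p in peers:
        merged.update(p)        # replicas agree on each (ts, update)
    return [merged[ts] for ts in sorted(merged)]

log_update(2, "edit-B")
log_update(1, "edit-A")
log_update(3, "edit-C")
assert reconcile() == ["edit-A", "edit-B", "edit-C"]
```

Storing each timestamped update at several holders is what lets reconciliation proceed even when some of the chosen peers have left the system.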

Data Privacy

Participants : Mohamed Jawad, Patrick Valduriez.

Online peer-to-peer (P2P) communities, such as professional communities (e.g., medical or research communities), are becoming popular due to the increasing need for data sharing. P2P environments offer valuable characteristics but limited guarantees when sharing sensitive or confidential data. They can be considered hostile because data can be accessed by everyone (including potentially untrusted peers) and used for anything (e.g., for marketing or for activities against the owner's preferences or ethics).

Hippocratic databases provide mechanisms for enforcing purpose-based disclosure control within a centralized datastore. This is achieved by using privacy metadata, i.e. privacy policies and privacy authorizations stored in tables. A privacy policy defines, for each attribute, tuple or table, the usage purpose(s), the potential users and the retention period, while a privacy authorization defines the purposes for which each user is authorized. In the context of P2P systems, decentralized control makes purpose-based privacy hard to enforce.
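To make the metadata concrete, here is a minimal sketch of a purpose-based access check. The schema and names are our own toy example (not PriServ's actual tables): a policy maps each attribute to its allowed purposes, users and retention period, and an authorization maps each user to the purposes granted to them.

```python
# Toy Hippocratic-style privacy metadata and a purpose-based access check.

privacy_policy = {
    # attribute: (allowed purposes, allowed users, retention in days)
    "diagnosis": ({"treatment"}, {"doctor"}, 365),
    "address":   ({"billing", "treatment"}, {"doctor", "clerk"}, 90),
}

privacy_authorizations = {
    "doctor": {"treatment"},
    "clerk":  {"billing"},
}

def may_access(user, attribute, purpose):
    """Grant access only if the policy lists this user and purpose for
    the attribute AND the user is authorized for that purpose."""
    purposes, users, _retention = privacy_policy[attribute]
    return (purpose in purposes and user in users
            and purpose in privacy_authorizations.get(user, set()))

assert may_access("doctor", "diagnosis", "treatment")
assert not may_access("clerk", "diagnosis", "billing")
```

In a centralized datastore, one engine evaluates this check on every query; the difficulty in a P2P system is that no single peer sees all accesses, so the check must somehow be enforced at each data holder.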

In addition to purpose-based data privacy, preventing data misuse requires trusting participants. Trust management systems deal with unknown participants by assessing their reputation. Reputation techniques verify the trustworthiness of peers by assigning them trust levels. A trust level is an estimate of the probability that a peer will not cheat.
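A trust level of this kind can be computed in many ways; the following sketch uses one common, simple choice (a smoothed success ratio over observed transaction outcomes), which is our illustration and not the specific technique used in PriServ.

```python
# Toy trust level: estimated probability that a peer will not cheat,
# computed from observed transaction outcomes (1 = honest, 0 = cheated).

def trust_level(outcomes, prior_good=1, prior_bad=1):
    """Smoothed estimate (good + 1) / (good + bad + 2), so a peer with
    no history starts at 0.5 rather than at 0 or 1."""
    good = sum(outcomes)
    bad = len(outcomes) - good
    return (good + prior_good) / (good + bad + prior_good + prior_bad)

assert trust_level([]) == 0.5           # unknown peer: neutral trust
assert trust_level([1, 1, 1, 0]) == 4/6 # mostly honest history
```

A data owner can then refuse to serve requesters whose trust level falls below a threshold, complementing the purpose-based check above.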

In the context of P2P systems, few solutions for data privacy have been proposed, and they address only a small part of the general problem, e.g. anonymity of uploaders/downloaders, linkability (correlation between uploaders and downloaders), content deniability, data encryption and authenticity. However, the major problem of privacy violation due to data disclosure to malicious peers, which misuse the data, is not addressed.

In [35] , we proposed a P2P data privacy model which combines the Hippocratic principles and trust. We proposed the algorithms of PriServ, a DHT-based P2P privacy service which supports this model and prevents data privacy violation. We also proposed three algorithms for trust level searching in PriServ. Our performance evaluation shows that PriServ introduces a small overhead.

In [34] , [33] , we extended PriServ's functionality. To improve availability, we give owners the choice of storing their data locally or distributing them over the system. Because distribution depends on the DHT, owners may see their private data stored at untrusted peers. To overcome this problem, data is encrypted before distribution, and the decryption keys are stored and replicated by the owners. We also proposed a component-based architecture for PriServ. Simulation results support our approach, and a prototype of PriServ is under development, to be tested on the Grid5000 platform.
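The encrypt-before-distribute idea can be sketched as below. For illustration only, the sketch uses a hash-based XOR stream cipher so it stays self-contained; a real deployment would use a standard cipher such as AES, and we make no claim that this is PriServ's actual scheme.

```python
# Sketch: the owner encrypts data before handing it to the DHT and keeps
# the key locally, so untrusted replica holders see only ciphertext.
# (Toy XOR stream cipher for illustration -- not production cryptography.)

import hashlib
import secrets

def keystream(key, n):
    """Derive n pseudo-random bytes from the key (SHA-256 in counter mode)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key, data):
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

decrypt = encrypt  # a XOR stream cipher is its own inverse

owner_key = secrets.token_bytes(32)          # kept (and replicated) by the owner
ciphertext = encrypt(owner_key, b"patient record")
# Only 'ciphertext' is stored in the DHT; the key never leaves the owner.
assert decrypt(owner_key, ciphertext) == b"patient record"
```

The design trade-off is visible here: distribution via the DHT buys availability, while keeping the keys under the owner's control preserves confidentiality against the untrusted peers that happen to hold the replicas.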

Testing P2P Systems

Participants : Eduardo Almeida, Gerson Sunyé, Patrick Valduriez.

Traditional architectures for testing, based on the Conformance Testing Methodology and Framework (CTMF), are not fully adapted to testing large-scale distributed applications. Indeed, in this architecture, each node is tested by a Lower Tester (LT) and the LTs are controlled by a centralized unit, the Lower Tester Control Function (LTCF). The LTCF establishes synchronization among Lower Testers, ensuring for instance that a retrieve operation is only executed after the corresponding data has been inserted. Since the LTCF is centralized, it becomes a bottleneck as the number of nodes scales up. To address this problem, we proposed a distributed architecture for testing large-scale distributed, data-intensive applications.
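The ordering constraint that the LTCF enforces (retrieve only after insert) can be sketched with two testers synchronizing on a named event. The class and method names are our own, hypothetical illustration, not PeerUnit's API; the point is only to show the kind of coordination a test controller must provide.

```python
# Hypothetical sketch: two Lower Testers synchronized by a coordinator
# so that a retrieve runs only after the insert it depends on.

import threading

class TestCoordinator:
    """Synchronizes testers on named events (e.g. 'inserted')."""
    def __init__(self):
        self.events = {}
        self.lock = threading.Lock()

    def _event(self, name):
        with self.lock:
            return self.events.setdefault(name, threading.Event())

    def signal(self, name):
        self._event(name).set()

    def wait_for(self, name, timeout=5):
        return self._event(name).wait(timeout)

coord = TestCoordinator()
store, results = {}, []

def inserter():                         # Lower Tester 1
    store["k"] = "v"                    # perform the insert step
    coord.signal("inserted")            # then release dependent testers

def retriever():                        # Lower Tester 2
    assert coord.wait_for("inserted")   # block until the insert is done
    results.append(store["k"])

t2 = threading.Thread(target=retriever); t2.start()
t1 = threading.Thread(target=inserter);  t1.start()
t1.join(); t2.join()
assert results == ["v"]
```

With a single centralized coordinator, every such signal/wait pair funnels through one process, which is precisely the bottleneck the distributed architecture removes by letting testers coordinate among themselves.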

This architecture was implemented in the PeerUnit software prototype. PeerUnit is based on the CTMF and implements both a centralized and a distributed architecture for controlling Lower Testers [20] , [14] . The distributed architecture showed satisfactory performance when controlling more than a thousand nodes. We also proposed an incremental methodology to deal with three aspects of P2P testing: functionality, volatility and scalability. The idea is to first cover functionality on a small system and then incrementally address the scalability and volatility aspects. The methodology was validated by testing two popular open-source P2P systems, FreePastry and OpenChord.

