Inria / Raweb 2003
Project: PARIS

Section: Scientific Foundations

Distributed data management

Past research on distributed data management has led to three main approaches. Currently, the most widely used approach to data management for distributed grid computation relies on explicit data transfers between clients and computing servers. For example, the Globus [69] platform provides data access mechanisms (GASS, Global Access to Secondary Storage) based on the GridFTP protocol. Other explicit approaches, such as IBP (Internet Backplane Protocol), provide a large-scale data storage system consisting of a set of buffers distributed over the Internet. The user can "rent" these storage areas and use them as temporary buffers for efficient data transfers across a wide-area network. Transfer management nevertheless remains the user's responsibility. Moreover, IBP does not handle storage nodes dynamically joining or leaving the system, and it provides no consistency guarantee for multiple copies of the same data.

In contrast, Distributed Shared Memory (DSM) systems provide transparent data sharing through a single address space accessible to physically distributed machines. A variety of consistency models and protocols have been defined in this context. These systems offer transparent access to data: all nodes can read and write data in a uniform way, using a unique identifier or a virtual address. It is the responsibility of the DSM system to locate, transfer, and replicate data, and to guarantee its consistency according to some semantics. Nevertheless, existing DSM systems have generally shown satisfactory efficiency only on small-scale configurations, typically a few tens of nodes.
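To make the division of labour in a DSM concrete, here is a minimal single-process sketch in Python. It is an illustration only, not the design of any particular DSM system: the class `ToyDSM`, its method names, the "home copy" layout, and the write-invalidate policy are all assumptions chosen for brevity. Callers only name a data identifier; locating, replicating, and invalidating copies happens inside the runtime.

```python
class ToyDSM:
    """Hypothetical in-process stand-in for a DSM runtime.

    One authoritative ("home") copy per identifier, plus per-node cached
    replicas kept coherent with a simple write-invalidate policy.
    """

    def __init__(self, node_ids):
        self.home = {}                            # data_id -> authoritative value
        self.caches = {n: {} for n in node_ids}   # node -> {data_id: replica}
        self.transfers = 0                        # simulated network fetches

    def read(self, node, data_id):
        cache = self.caches[node]
        if data_id not in cache:
            # Cache miss: locate the home copy and replicate it locally.
            cache[data_id] = self.home[data_id]
            self.transfers += 1
        return cache[data_id]

    def write(self, node, data_id, value):
        self.home[data_id] = value
        # Invalidate every other node's replica so that no stale value is read.
        for other, cache in self.caches.items():
            if other != node:
                cache.pop(data_id, None)
        self.caches[node][data_id] = value


dsm = ToyDSM(["n1", "n2", "n3"])
dsm.write("n1", "x", 1)
first = dsm.read("n2", "x")    # miss: replica fetched from the home copy
second = dsm.read("n2", "x")   # hit: served from the local replica
dsm.write("n3", "x", 2)        # invalidates the replicas on n1 and n2
third = dsm.read("n2", "x")    # miss again, sees the new value
```

No caller ever issues a transfer: the same `read` and `write` calls work from any node. The cost is that every write touches every replica holder, which gives some intuition for why such protocols become expensive as the number of nodes grows.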

More recently, the peer-to-peer (P2P) model has proven to be an efficient approach for large-scale data sharing [85]. It is complementary to the client-server model: relations between machines are symmetrical, and each node can act as a client in one transaction and as a server in another. This paradigm was popularized by Napster, Gnutella and KaZaA. Such systems have proven able to manage very large configurations (millions of nodes) with very high volatility. However, most P2P systems focus on sharing immutable files: the shared data are read-only and can be replicated freely. Some mechanisms for sharing mutable data in a P2P environment have recently been proposed by systems such as OceanStore and Ivy, but with restricted use (no multiple writers, no conflict resolution).
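The symmetry of roles and the ease of replicating read-only data can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the protocol of Napster, Gnutella, or KaZaA: the `Peer` class, its methods, and the use of a SHA-256 content hash as the file key are all hypothetical choices made for illustration.

```python
import hashlib


class Peer:
    """Hypothetical peer that can act as both client and server."""

    def __init__(self, name):
        self.name = name
        self.store = {}       # content hash -> bytes (never modified once stored)
        self.neighbors = []   # peers this node can contact directly

    def publish(self, data: bytes) -> str:
        # Assumption: files are keyed by the hash of their content.
        key = hashlib.sha256(data).hexdigest()
        self.store[key] = data
        return key

    def serve(self, key):
        """Server role: answer a request from another peer."""
        return self.store.get(key)

    def fetch(self, key):
        """Client role: ask direct neighbors, then keep a local replica."""
        if key in self.store:
            return self.store[key]
        for peer in self.neighbors:
            data = peer.serve(key)
            # Immutable data can be verified against its key and copied freely.
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                self.store[key] = data
                return data
        return None


a, b, c = Peer("a"), Peer("b"), Peer("c")
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b]

key = a.publish(b"report")
from_a = b.fetch(key)   # b is a client of a
from_b = c.fetch(key)   # b now acts as a server for c
```

Because the data never changes, a replica cannot become stale, so no coherence protocol is needed. Supporting mutable data would require exactly the consistency machinery that this sketch omits.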