Section: Scientific Foundations

Distributed data management

Past research on distributed data management led to three main approaches. Currently, the most widely used approach to data management for distributed grid computation relies on explicit data transfers between clients and computing servers. As an example, the Globus [68] platform provides data access mechanisms (such as data catalogs) based on the GridFTP protocol. Other explicit approaches (e.g., IBP) provide a large-scale data storage system consisting of a set of buffers distributed over the Internet. The user can "rent" these storage areas for efficient data transfers.

In contrast, Distributed Shared Memory (DSM) systems provide transparent data sharing via a single virtual address space accessible to physically distributed machines. It is the responsibility of the DSM system to locate, transfer, and replicate data, and to guarantee its consistency according to some semantics. Within this context, a variety of consistency models and protocols have been defined. Nevertheless, existing DSM systems have generally shown satisfactory efficiency only on small-scale configurations of up to a few tens of nodes.
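To make the DSM responsibilities listed above more concrete, here is a minimal, single-process sketch of one possible consistency protocol (write-invalidate with a central home node). It is an illustration only and does not describe any particular DSM system: the class names `HomeNode` and `Node` and all of their methods are hypothetical, and real systems add networking, concurrency control and failure handling.

```python
class HomeNode:
    """Hypothetical home node: holds the authoritative copy of each datum
    and remembers which nodes currently cache a replica of it."""

    def __init__(self):
        self.store = {}    # address -> authoritative value
        self.copyset = {}  # address -> set of nodes holding a replica

    def fetch(self, node, addr):
        # Locate and transfer: record the new replica, return the value.
        self.copyset.setdefault(addr, set()).add(node)
        return self.store.get(addr)

    def write(self, writer, addr, value):
        # Consistency: invalidate every other replica before updating.
        for node in self.copyset.get(addr, set()) - {writer}:
            node.invalidate(addr)
        self.store[addr] = value
        self.copyset[addr] = {writer}


class Node:
    """Hypothetical computing node with a local cache of shared data."""

    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.cache = {}

    def read(self, addr):
        # Transparent access: a cache miss triggers a fetch from the home node.
        if addr not in self.cache:
            self.cache[addr] = self.home.fetch(self, addr)
        return self.cache[addr]

    def write(self, addr, value):
        self.home.write(self, addr, value)
        self.cache[addr] = value

    def invalidate(self, addr):
        self.cache.pop(addr, None)


home = HomeNode()
a, b = Node("a", home), Node("b", home)
a.write("x", 1)
print(b.read("x"))  # b caches a replica of x
a.write("x", 2)     # b's replica is invalidated
print(b.read("x"))  # b fetches the new value
```

In this sketch every write goes through a single home node, which keeps the protocol simple but makes that node a bottleneck. This is one way to picture why such designs have tended to work well only on small configurations.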

More recently, the peer-to-peer (P2P) approach has proven efficient for large-scale sharing of resources (data or computing resources) [90]. The peer-to-peer communication model relies on a symmetric relationship between peers, which may act as both clients and servers. Such systems have proven able to manage very large and dynamic configurations (millions of peers). However, several challenges remain. More specifically, as far as data sharing is concerned, most P2P systems focus on sharing read-only data, which does not require consistency management. Some approaches, such as OceanStore and Ivy, deal with mutable data in a P2P setting, but only under restricted usage. Today, one major challenge in large-scale distributed data management is to define appropriate models and protocols that guarantee both the consistency of replicated data and fault tolerance in large-scale, dynamic environments.

