Team KerData

Section: Scientific Foundations

Transparent, distributed data sharing

The management of massive data blocks naturally requires data fragmentation and distributed storage. Grid infrastructures, typically built by aggregating distributed resources that may belong to different administrative domains, have been developed in recent years with the goal of providing an appropriate solution. Most existing approaches to grid data management rely heavily on explicit data localization and on explicit transfers of large amounts of data across the distributed architecture: GridFTP [28], Reptor [48], Optor [48], LDR [18], Chirp [17], IBP [31], NeST [32], etc. Managing huge amounts of data in such an explicit way at a very large scale makes the design of grid applications much more complex. One key issue to be addressed is therefore transparency with respect to data localization and data movements. Such transparency is highly desirable, as it frees the user from the need to handle data localization and transfers.

Some approaches to grid data management already acknowledge the importance of a transparent data access model and integrate this idea at the early stages of their design. Grid file systems, for instance, provide a familiar, file-oriented API that gives transparent access to physically distributed data through globally unique, logical file paths. Applications simply open and access such files as if they were stored on a local file system. A very large distributed storage space is thus made available, without modification, to existing applications that already use file storage. This approach has been taken by a few projects such as GFarm [59], GridNFS [43], LegionFS [63], etc.
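The idea of a logical file path can be illustrated with a minimal, single-process sketch. It is not the API of GFarm, GridNFS, LegionFS or any other grid file system: the class ToyGridNamespace, its methods publish and open, the site names and the example path are all hypothetical and exist only for illustration. The sketch assumes that a catalog maps each logical path to one or more physical replicas, and that the application only ever sees the logical path.

```python
import io


class ToyGridNamespace:
    """Hypothetical catalog mapping logical paths to physical replicas.

    Sites are simulated with in-memory dictionaries; a real grid file
    system would contact remote storage servers instead.
    """

    def __init__(self):
        self._sites = {}    # site name -> {physical key -> bytes}
        self._catalog = {}  # logical path -> list of (site, physical key)

    def publish(self, logical_path, site, data):
        """Store `data` at `site` and register it under `logical_path`."""
        storage = self._sites.setdefault(site, {})
        key = f"{site}:{len(storage)}"
        storage[key] = data
        self._catalog.setdefault(logical_path, []).append((site, key))

    def open(self, logical_path):
        """Return a read-only file-like object for `logical_path`.

        The caller never names a site: the replica is chosen here
        (this sketch simply takes the first one registered).
        """
        replicas = self._catalog.get(logical_path)
        if not replicas:
            raise FileNotFoundError(logical_path)
        site, key = replicas[0]
        return io.BytesIO(self._sites[site][key])


# Usage: the application reads through the logical path only.
namespace = ToyGridNamespace()
namespace.publish("/grid/experiment/run1.dat", "site-a", b"hello")
with namespace.open("/grid/experiment/run1.dat") as f:
    content = f.read()
```

The point of the sketch is the shape of the interface: data location is an internal decision of the namespace, so replicas could be moved or added without changing application code.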

The transparent data access model is also advocated by the concept of grid data-sharing service [29], illustrated by the JuxMem platform [30]. Such a service provides grid applications with the abstraction of a globally shared memory, in which data can be easily stored and accessed through global identifiers. To meet this goal, the design of JuxMem leverages the strengths of several building blocks: consistency protocols inspired by Distributed Shared Memory (DSM) systems; algorithms for fault-tolerant distributed systems; protocols for scalability and volatility support from peer-to-peer (P2P) systems. Note that such a system is fundamentally different from traditional DSM systems such as TreadMarks. First, it targets a much larger scale through hierarchical consistency protocols suited to the efficient exploitation of grids made of a federation of clusters. Second, it addresses from the very beginning the problem of resource volatility due to failures or to the lack of resource availability. Compared to the grid file system approach, this approach improves access efficiency by relying entirely on main memory storage. Besides the fact that a main memory access is more efficient than a disk access, the system can leverage locality-optimization schemes developed for DSM consistency protocols.
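The abstraction of a globally shared memory accessed through global identifiers can be sketched as follows. This is a toy, single-process illustration and not the JuxMem API: the class ToyDataSharingService and its methods allocate, read and write are hypothetical names. A single lock stands in for the consistency protocol, and the sketch omits replication, fault tolerance and hierarchical consistency, which are precisely what a real data-sharing service provides.

```python
import threading
import uuid


class ToyDataSharingService:
    """Hypothetical in-memory store addressed by global identifiers."""

    def __init__(self):
        self._store = {}               # global id -> bytearray
        self._lock = threading.Lock()  # stand-in for a consistency protocol

    def allocate(self, initial):
        """Store a block in main memory and return its global identifier."""
        data_id = uuid.uuid4().hex
        with self._lock:
            self._store[data_id] = bytearray(initial)
        return data_id

    def read(self, data_id):
        """Return a copy of the block; the caller never names a location."""
        with self._lock:
            return bytes(self._store[data_id])

    def write(self, data_id, offset, payload):
        """Overwrite part of the block in place, within its allocated size."""
        with self._lock:
            block = self._store[data_id]
            if offset < 0 or offset + len(payload) > len(block):
                raise ValueError("write outside the allocated block")
            block[offset:offset + len(payload)] = payload


# Usage: any client holding the identifier can access the data.
service = ToyDataSharingService()
block_id = service.allocate(b"\x00" * 4)
service.write(block_id, 0, b"ab")
snapshot = service.read(block_id)
```

As with logical file paths, the identifier is the only handle the application needs; where the block resides and how copies are kept consistent remain the responsibility of the service.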

