Overall Objectives
Our research addresses distributed data management at challenging scales: grids, clouds, petascale architectures, desktop grids, and similar infrastructures. We target data-oriented high-performance applications that need to handle massive unstructured data in the form of BLOBs (binary large objects, on the order of terabytes) stored on a large number of nodes (thousands to tens of thousands) and accessed under heavy concurrency by a large number of clients (thousands to tens of thousands at a time) with a relatively fine access grain (on the order of megabytes). Examples of such applications are:
- Grid and cloud data-mining applications handling massive data distributed at a large scale.
- Advanced data storage and management on cloud infrastructures.
- Distributed storage for Petaflop computing applications.
- Data storage for desktop grid applications with high write throughput requirements.
- Distributed data sharing and storage for extremely large databases.
Our current research follows three main directions.
Multiversion BLOB management
We are currently designing, implementing, and experimentally validating BlobSeer (http://blobseer.gforge.inria.fr/), a generic data management platform for large-scale distributed infrastructures. It aims to address the challenges mentioned above (huge data and highly concurrent, fine-grain access) while supporting versioning and decentralized metadata management.
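To make the idea of versioned, fine-grain access concrete, here is a minimal single-process sketch. It is not BlobSeer's API or design: the class `VersionedBlob`, its methods, and the tiny `CHUNK_SIZE` are hypothetical names chosen for illustration. It only shows one plausible way a write can publish a new version that shares untouched chunks with the previous one, while older versions stay readable.

```python
from typing import List, Tuple

CHUNK_SIZE = 4  # hypothetical chunk size in bytes, kept tiny for illustration


class VersionedBlob:
    """Toy multiversion BLOB (illustrative only, not BlobSeer's interface)."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        # Each version is (immutable tuple of chunks, logical size in bytes).
        # Version 0 is the empty BLOB.
        self._versions: List[Tuple[Tuple[bytes, ...], int]] = [((), 0)]

    def write(self, offset: int, data: bytes) -> int:
        """Apply a write on top of the latest version; return the new version id."""
        if not data:
            raise ValueError("data must be non-empty")
        cs = self.chunk_size
        chunks, size = self._versions[-1]
        new_size = max(size, offset + len(data))
        n_chunks = -(-new_size // cs)  # ceiling division
        # Copy chunk references only; untouched chunks are shared, not duplicated.
        new_chunks = list(chunks) + [bytes(cs)] * (n_chunks - len(chunks))
        first = offset // cs
        last = (offset + len(data) - 1) // cs
        for i in range(first, last + 1):
            buf = bytearray(new_chunks[i])
            start = max(offset, i * cs)
            end = min(offset + len(data), (i + 1) * cs)
            buf[start - i * cs:end - i * cs] = data[start - offset:end - offset]
            new_chunks[i] = bytes(buf)  # only rewritten chunks get new storage
        self._versions.append((tuple(new_chunks), new_size))
        return len(self._versions) - 1

    def read(self, version: int, offset: int, length: int) -> bytes:
        """Read a byte range from any published version."""
        cs = self.chunk_size
        chunks, size = self._versions[version]
        end = min(offset + length, size)
        if offset >= end:
            return b""
        first = offset // cs
        whole = b"".join(chunks[first:-(-end // cs)])
        base = first * cs
        return whole[offset - base:end - base]

    def shared_chunks(self, version_a: int, version_b: int) -> int:
        """Count chunk positions where two versions reference the same object."""
        a, b = self._versions[version_a][0], self._versions[version_b][0]
        return sum(1 for x, y in zip(a, b) if x is y)
```

In this sketch, writing 8 bytes and then overwriting 2 of them yields two versions: the first remains readable unchanged, and the second shares the chunk that the overwrite did not touch. A real system would distribute chunks and metadata across nodes and handle concurrent writers, which this toy model does not attempt.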
Scalable BLOB-based distributed file systems
We are exploring how the file system approach can support scalable data management to address the needs of two classes of applications:
- data mining over massive data using the Map-Reduce paradigm;
- numerical applications for Petaflop architectures.
The goal is to evaluate the benefits of building global file systems using object-based distributed storage as proposed by BlobSeer, which targets efficient, decentralized management of huge data under heavy concurrency.
Monitoring, fault-tolerance and self-steering
We aim to propose a self-adaptive BLOB management system. To this end, we are equipping BlobSeer with software sensors that couple it with the MonALISA generic monitoring system (http://monalisa.caltech.edu/). MonALISA offers a distributed, modular architecture that can adapt to the very large scale of BlobSeer and to its high rate of client interactions, and it lets the user visualize a large number of behavioral parameters conveniently. Moreover, a feedback loop can be built from MonALISA to BlobSeer, so that BlobSeer dynamically reconfigures itself (e.g., in reaction to failures or to variations in resource availability) based on MonALISA's observation of its global behavior. This is the path toward a self-steering BlobSeer.
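The feedback loop described above can be sketched, under strong simplifying assumptions, as three steps: sensors report samples, a monitor keeps the latest observation per node, and a decision function turns the global snapshot into reconfiguration actions. Nothing here uses the MonALISA or BlobSeer interfaces: `Sample`, `Monitor`, `decide`, the thresholds, and the action strings are all hypothetical and serve only to illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One sensor reading (hypothetical format)."""
    provider: str     # name of a storage node
    timestamp: float  # time of the reading, in seconds
    load: float       # fraction of capacity in use, 0.0 to 1.0


class Monitor:
    """Keeps the most recent sample reported by each provider."""

    def __init__(self) -> None:
        self._latest: Dict[str, Sample] = {}

    def report(self, sample: Sample) -> None:
        self._latest[sample.provider] = sample

    def snapshot(self) -> Dict[str, Sample]:
        return dict(self._latest)


def decide(snapshot: Dict[str, Sample], now: float,
           timeout: float = 10.0, high_load: float = 0.8) -> List[str]:
    """Map the observed global state to reconfiguration actions.

    Assumed policy: a provider silent for longer than `timeout` is treated
    as failed; a mean load above `high_load` among live providers triggers
    a request for more storage.
    """
    actions: List[str] = []
    alive: List[Sample] = []
    for name in sorted(snapshot):
        sample = snapshot[name]
        if now - sample.timestamp > timeout:
            actions.append(f"re-replicate data held by {name}")
        else:
            alive.append(sample)
    if alive and sum(s.load for s in alive) / len(alive) > high_load:
        actions.append("add a storage provider")
    return actions
```

A steering component would call `decide` periodically on the monitor's snapshot and apply the returned actions. The policy here is deliberately naive; the point is only the shape of the loop from observation to reconfiguration.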