Section: New Results
Introspective BlobSeer
Participants : Alexandra Carpen-Amarie, Jing (Tylor) Cai, Alexandru Costan, Gabriel Antoniu, Luc Bougé.
The cloud computing model is an emerging paradigm for dynamically provisioning processing time and storage space from a cloud of computational resources. The most important layer in the cloud-computing stack is the Infrastructure-as-a-Service (IaaS), which provides fully-configurable virtual machines or virtual storage. In the context of the emerging cloud infrastructures, one of the most critical challenges concerns data management. Our work focuses on building an autonomic, efficient and secure storage service for IaaS clouds, designed to meet the needs of data-intensive distributed applications by leveraging BlobSeer, the large-scale distributed data-sharing platform developed in our team.
The first step towards an autonomic data-sharing system was to equip the BlobSeer platform with introspection capabilities. This feature plays a crucial role in helping the users to overcome the issues raised by managing the behavior of their systems at large scales. Our work addressed the challenges raised by the introduction of introspection into such a data-management system. These challenges come from the fact that introspection is often limited to low-level tools for monitoring the physical nodes, whereas enabling an autonomic behavior for our system requires the analysis of both general and specific data-storage parameters, such as physical data distribution or data access patterns.
We proposed a 3-layered architecture [13] built on top of BlobSeer: 1) an instrumentation layer that extracts the low-level, raw data from the different components of BlobSeer; 2) a monitoring layer that deals with collecting and storing the monitoring data from the instrumentation layer; and 3) an introspective layer that processes the gathered data into higher-level information describing the state and the behavior of the system. The data extracted by the introspective layer can be further fed to a self-adaptive engine, able to improve the performance and to optimize the resource usage in BlobSeer.
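The flow of data through the three layers can be illustrated with a minimal Python sketch. It is not the BlobSeer implementation: the event schema (`RawEvent` and its fields), the `MonitoringStore` class and the `introspect` function are hypothetical names chosen for illustration, and the two aggregates computed (bytes written per provider, access counts per blob) merely stand in for the kind of higher-level information mentioned above, such as physical data distribution and data access patterns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    """Low-level record emitted by the instrumentation layer (hypothetical schema)."""
    provider: str  # storage node that handled the operation
    blob_id: int   # blob that was accessed
    op: str        # "read" or "write"
    nbytes: int    # payload size of the operation


class MonitoringStore:
    """Monitoring layer: collects and stores the raw events."""

    def __init__(self):
        self._events = []

    def collect(self, event):
        self._events.append(event)

    def events(self):
        return list(self._events)


def introspect(store):
    """Introspective layer: aggregate raw events into higher-level information."""
    bytes_per_provider = defaultdict(int)
    accesses_per_blob = defaultdict(lambda: {"read": 0, "write": 0})
    for ev in store.events():
        if ev.op == "write":
            # Written bytes approximate how data is spread over providers.
            bytes_per_provider[ev.provider] += ev.nbytes
        accesses_per_blob[ev.blob_id][ev.op] += 1
    return dict(bytes_per_provider), dict(accesses_per_blob)


# Instrumentation layer stand-in: a few hand-written events.
store = MonitoringStore()
for ev in [
    RawEvent("p1", 7, "write", 64),
    RawEvent("p2", 7, "write", 64),
    RawEvent("p1", 7, "read", 64),
]:
    store.collect(ev)

distribution, access_patterns = introspect(store)
```

In this sketch, `distribution` and `access_patterns` are the kind of summaries that a self-adaptive engine could consume in place of the raw events.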
The monitoring layer was implemented as an extension [11] of MonALISA, a general-purpose, large-scale monitoring framework. The proposed architecture was evaluated on the Grid'5000 testbed, using more than 100 nodes for the experiments. The experiments validate the output of the introspective layer through graphical representations of the high-level data it extracts [12].
We are now investigating several directions that will lead to the integration of the BlobSeer platform within an IaaS cloud, as a storage service. One direction, which builds upon the instrumentation capabilities developed so far, is the design and integration of the self-adaptation layer which will enable an autonomic behavior.
The second direction is related to the security issues raised by the design of BlobSeer, which need to be addressed when exposing BlobSeer as a service for sharing data belonging to different users. We focused on detecting illegal actions performed by malicious clients, relying on the previously designed introspection architecture. The same framework can be further extended to enforce restrictions on client actions, in order to cope with clients breaking access policies or with abnormal client activity. Another important direction we are currently exploring is the integration of BlobSeer with an existing cloud infrastructure, such as the Nimbus cloud environment from Argonne National Lab.
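As an illustration of how monitoring data could support the detection of abnormal client activity, the following Python sketch flags clients that issue more writes in an observation window than a fixed limit allows. The detection rules actually used are not described here, so the rule itself, the `(client, operation)` event format, the function name `flag_suspicious_clients` and the threshold are all assumptions made for the example.

```python
from collections import Counter


def flag_suspicious_clients(events, max_writes_per_window):
    """Return the sorted ids of clients whose write count exceeds the limit.

    `events` is an iterable of (client_id, operation) pairs observed during
    one monitoring window (hypothetical format).
    """
    writes = Counter(client for client, op in events if op == "write")
    return sorted(c for c, n in writes.items() if n > max_writes_per_window)


# One observation window with hand-written events.
window = [
    ("alice", "write"),
    ("bob", "write"),
    ("bob", "write"),
    ("bob", "write"),
    ("alice", "read"),
]
suspects = flag_suspicious_clients(window, max_writes_per_window=2)
```

A real policy would likely combine several indicators; a single counter is used here only to keep the idea visible.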