Section: Scientific Foundations
Towards scalable, BLOB-based distributed file systems
Recent research [40] points to an ongoing move from a block-based interface to an object-based interface in storage architectures, with the goal of enabling scalable, self-managed storage networks. This is achieved by moving low-level functionality such as space management to the storage devices or storage servers, which are accessed through a standard object interface. This move has a direct impact on the design of today's distributed file systems: an object-based file system stores data as objects rather than as unstructured data blocks. According to [40], this move may eliminate nearly 90% of the management workload, which has been the major obstacle limiting file systems' scalability and performance.
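The difference between the two interfaces can be illustrated with a minimal in-memory sketch. All class and method names below (`BlockDevice`, `BlockFileSystem`, `ObjectDevice`) are hypothetical and chosen for illustration only; they do not correspond to the standard OSD command set or to any existing system. The point is only where space management lives: in the file system for the block interface, inside the device for the object interface.

```python
class BlockDevice:
    """Block interface: the device only knows fixed-size, numbered blocks."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks

    def write_block(self, index, data):
        # Pad short writes so every stored block has the same size.
        self.blocks[index] = data.ljust(self.block_size, b"\0")

    def read_block(self, index):
        return self.blocks[index]


class BlockFileSystem:
    """With a block device, the file system itself performs space management."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))  # free-block list kept by the FS
        self.files = {}  # name -> (block indices, length in bytes)

    def write(self, name, data):
        size = self.device.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        indices = [self.free.pop(0) for _ in chunks]  # allocation done by the FS
        for index, chunk in zip(indices, chunks):
            self.device.write_block(index, chunk)
        self.files[name] = (indices, len(data))

    def read(self, name):
        indices, length = self.files[name]
        raw = b"".join(self.device.read_block(i) for i in indices)
        return raw[:length]  # drop the padding of the last block


class ObjectDevice:
    """Object interface: allocation is hidden inside the device."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def create(self, data):
        # The device picks the identifier and the placement; the caller
        # never sees blocks or free lists.
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = bytes(data)
        return oid

    def read(self, oid):
        return self._objects[oid]
```

In this sketch the client of `ObjectDevice` keeps only an object identifier per object, whereas `BlockFileSystem` has to maintain the free list and the per-file block map itself.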
Two approaches exploit this idea. In the first approach, data objects are stored and manipulated directly by a new type of storage device called an object-based storage device (OSD). This approach requires an evolution of the hardware so that high-level object operations can be delegated to the storage device. The standard OSD interface was defined by the Storage Networking Industry Association (SNIA) OSD working group. The protocol is layered over SCSI and defines a new set of SCSI commands. Recently, a second generation of the command set, Object-Based Storage Devices - 2 (OSD-2), has been defined. Distributed file systems that take the OSD approach assume that such devices will become available in the near future and currently rely on a software module that simulates their behavior. Examples of parallel/distributed file systems following this approach are Lustre [57] and Ceph [62]. Recently, research efforts [38] have explored the feasibility and possible benefits of integrating OSDs into parallel file systems such as PVFS [34].
The second approach does not rely on the presence of OSDs, but still tries to benefit from an object-based approach to improve performance and scalability: files are structured as sets of objects that are stored on storage servers. Google File System [41] and HDFS (Hadoop Distributed File System [23]) illustrate this approach.
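As a rough illustration of a file structured as a set of objects spread over storage servers, consider the following sketch. It is not a description of how Google File System or HDFS place data: the round-robin placement, the in-memory `catalog` dictionary, and the names `StorageServer`, `store_file`, and `load_file` are all assumptions made for the example.

```python
class StorageServer:
    """A storage server holding objects in memory, addressed by key."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


def store_file(name, data, servers, object_size, catalog):
    """Split `data` into objects and spread them over `servers`.

    `catalog` plays the role of the file metadata: it maps a file name
    to the ordered list of (server index, object key) pairs.
    """
    locations = []
    for n, offset in enumerate(range(0, len(data), object_size)):
        server_index = n % len(servers)  # round-robin placement (assumption)
        key = (name, n)
        servers[server_index].put(key, data[offset:offset + object_size])
        locations.append((server_index, key))
    catalog[name] = locations


def load_file(name, servers, catalog):
    """Reassemble a file by fetching its objects in order."""
    return b"".join(servers[s].get(key) for s, key in catalog[name])
```

A possible usage, with three servers and a deliberately tiny object size:

```python
servers = [StorageServer() for _ in range(3)]
catalog = {}
store_file("f", b"abcdefghij", servers, object_size=4, catalog=catalog)
restored = load_file("f", servers, catalog)
```

Because each object lives on a single server, objects of the same file can be read from different servers independently, which is one way such a structure can help scalability.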