Section: New Results
Participants : Diana Moise, Bogdan Nicolae, Gabriel Antoniu, Luc Bougé, Matthieu Dorier.
We focused on improving the MapReduce framework by enhancing it with features provided by a data-storage service such as BlobSeer. We started by analyzing a key component of MapReduce frameworks: the storage layer. To enable massively parallel data processing over a large number of nodes, the storage layer must meet a series of specific requirements that standard distributed file systems do not address in their design. First, since data is stored in huge files, the computation has to process small parts of these files concurrently; the storage layer is therefore expected to provide efficient fine-grain access to the files. Second, the storage layer must sustain a high throughput despite heavy concurrent access to the same file, as thousands of clients may access data simultaneously. Third, versioning becomes an important feature in this context: not only does it enable rolling back undesired changes, it also allows branching a dataset into two datasets that can evolve independently. Finally, the storage layer must expose an interface that enables the application to be data-location aware. The scheduler uses this information to place computation tasks close to the data, thus reducing network traffic and contributing to a better global data throughput.
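The data-location requirement can be illustrated with a minimal sketch of locality-aware task placement. This is not Hadoop's scheduler: the function `place_tasks`, its arguments and the greedy policy are assumptions made for illustration only. The sketch takes a map from blocks to the nodes that store them and prefers a node holding the block whenever it has a free slot.

```python
def place_tasks(block_locations, free_slots):
    """Greedy, locality-first placement (hypothetical sketch).

    block_locations: dict mapping block id -> list of nodes storing that block
    free_slots:      dict mapping node name -> number of free task slots
    Returns a dict mapping block id -> (chosen node, True if the task is data-local).
    """
    slots = dict(free_slots)  # work on a copy, leave the caller's dict untouched
    placement = {}
    for block_id, holders in block_locations.items():
        # Prefer a node that already stores the block and still has a free slot.
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = max(local, key=lambda n: slots[n]), True
        else:
            # Fall back to any node with a free slot; the block then crosses the network.
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slot left")
            node, is_local = max(candidates, key=lambda n: slots[n]), False
        slots[node] -= 1
        placement[block_id] = (node, is_local)
    return placement
```

Every task placed with `is_local` set to `True` reads its input from the node it runs on, which is the reduction in network traffic described above.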
As BlobSeer already provides these features, the next step was to add a layer on top of BlobSeer that enables it to be used as a file system. We called this additional layer the BlobSeer File System (BSFS). It consists of a centralized namespace manager, which is responsible for maintaining the file system namespace and for mapping files to BLOBs. We also implemented a caching mechanism for read/write operations, as MapReduce applications usually process data in small records (4 KB). This mechanism prefetches a whole block when the requested data is not already cached, and delays committing writes until a whole block has been filled in the cache. To make the MapReduce scheduler data-location aware, we extended BlobSeer with a new primitive that exposes the allocation of blocks to providers.
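The caching behaviour described above can be sketched as follows. This is a minimal illustration, not the BSFS implementation: the class names, the `read_block`/`write_block` backend interface and the append-only write path are all assumptions. A read miss fetches a whole block, and writes are buffered until a full block can be committed.

```python
class InMemoryBackend:
    """Hypothetical stand-in for the storage service; counts block transfers."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read_block(self, index):
        self.reads += 1
        return self.blocks.get(index, b"")

    def write_block(self, index, data):
        self.writes += 1
        self.blocks[index] = data


class BlockCache:
    """Client-side cache that exchanges only whole blocks with the backend."""

    def __init__(self, backend, block_size):
        self.backend = backend
        self.block_size = block_size
        self.read_index = None        # index of the block held in the read cache
        self.read_data = b""
        self.write_buf = bytearray()  # appended bytes not yet committed
        self.next_block = 0           # index of the next block to commit

    def read(self, offset, size):
        out = bytearray()
        while size > 0:
            index, start = divmod(offset, self.block_size)
            if index != self.read_index:
                # Cache miss: prefetch the whole block, not just the record.
                self.read_data = self.backend.read_block(index)
                self.read_index = index
            chunk = self.read_data[start:start + size]
            if not chunk:
                break  # reached the end of the stored data
            out += chunk
            offset += len(chunk)
            size -= len(chunk)
        return bytes(out)

    def append(self, data):
        self.write_buf += data
        # Commit only once a whole block has been filled.
        while len(self.write_buf) >= self.block_size:
            block = bytes(self.write_buf[:self.block_size])
            self.backend.write_block(self.next_block, block)
            del self.write_buf[:self.block_size]
            self.next_block += 1

    def flush(self):
        # Commit a trailing partial block, for example when the file is closed.
        if self.write_buf:
            self.backend.write_block(self.next_block, bytes(self.write_buf))
            self.write_buf.clear()
            self.next_block += 1
```

With small records and large blocks, most `read` and `append` calls are served from memory, so the number of requests reaching the storage service is roughly the number of blocks touched rather than the number of records.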
To evaluate the benefits of using BlobSeer as the storage backend for MapReduce applications, we used Hadoop, Yahoo!'s implementation of the MapReduce framework. We substituted the original data storage layer of Hadoop, the Hadoop Distributed File System (HDFS), with our BlobSeer-based file system (BSFS). To measure the impact of our approach, we performed experiments with both synthetic microbenchmarks and real MapReduce applications. The experiments were conducted on the Grid'5000 testbed, using up to 270 nodes. We focused on scenarios that exhibit highly concurrent accesses to shared files. The results showed that Hadoop achieved a significantly higher sustained throughput with BSFS than with its default storage layer. These results will be presented at the 2010 IPDPS Conference.