Section: New Results
Cloud, cluster and grid computing
Participants : Hien N'Guyen Van, Fabien Hermenier, Adrien Lèbre, Jean-Marc Menaud.
Large-scale distributed systems such as grids or clusters have become increasingly popular in both academic and industrial contexts. The new cloud computing approach, where computing resources are provisioned on demand, notably to handle peak loads, instead of being statically allocated, should reinforce this trend. In this section, we present problems that we have addressed concerning the management of computing resources (Entropy) and data resources (kDFS) in large-scale, high-performance distributed systems.
Virtualization technologies have recently gained a lot of interest in grid computing as they allow flexible resource management. However, the most common way to exploit grids relies on dedicated services such as resource management systems (RMSs), which allocate resources statically for a bounded amount of time. Those approaches are known to be insufficient to achieve high utilization of clusters or grids. Finer-grained resource management requires job preemption, migration and dynamic allocation of resources. Yet, because such mechanisms are complex to develop, advanced scheduling strategies have rarely been used in available systems.
This year, we have continued to analyze and experiment with how the latest VM capabilities can improve job management.
The main activities have been conducted around the Entropy framework. By encapsulating each component of a job into its own VM (vjobs), we have extended the former proposal to combine live migration and suspend/resume capabilities: live migration adapts the assignment of VMs to their current requirements, while suspend/resume operations provide preemption. Thus one can implement fine-grained scheduling policies by applying a cluster-wide context switch through the manipulation of the VMs. Thanks to this new software abstraction, developers can implement sophisticated job scheduling algorithms without handling the issues related to the manipulation of the VMs. They can focus solely on the algorithm that selects the jobs to run, while the cluster-wide context switch system performs the actions needed to switch between VM configurations. The Entropy system has been partially redesigned to handle the cluster-wide context switch in a generic way.
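To make the notion of a cluster-wide context switch more concrete, the following minimal sketch derives the VM actions needed to move from one configuration to another. The state names, the `Config` type and the action strings are hypothetical and chosen for illustration only; they do not reflect Entropy's actual data model or API. The sketch also ignores the ordering constraints between actions (for example, freeing resources on a node before migrating a VM to it) that a real system would have to handle.

```python
from typing import Dict, List, Optional, Tuple

# A configuration maps each VM name to a (state, node) pair.
# States and action vocabulary are illustrative assumptions,
# not Entropy's actual data model.
Config = Dict[str, Tuple[str, Optional[str]]]


def context_switch_actions(current: Config, target: Config) -> List[str]:
    """List the VM actions needed to go from `current` to `target`."""
    actions = []
    for vm in sorted(target):
        new_state, new_node = target[vm]
        # A VM unknown to the current configuration is assumed to be waiting.
        old_state, old_node = current.get(vm, ("waiting", None))
        if old_state == "running" and new_state == "running":
            if old_node != new_node:
                # Live migration adapts the assignment of a running VM.
                actions.append(f"migrate {vm} {old_node}->{new_node}")
        elif old_state == "running" and new_state == "sleeping":
            # Suspending a VM provides preemption.
            actions.append(f"suspend {vm} on {old_node}")
        elif old_state == "sleeping" and new_state == "running":
            actions.append(f"resume {vm} on {new_node}")
        elif old_state == "waiting" and new_state == "running":
            actions.append(f"run {vm} on {new_node}")
        elif old_state == "running" and new_state == "terminated":
            actions.append(f"stop {vm} on {old_node}")
    return actions


current = {
    "vm1": ("running", "n1"),
    "vm2": ("running", "n1"),
    "vm3": ("sleeping", None),
}
target = {
    "vm1": ("running", "n2"),
    "vm2": ("sleeping", None),
    "vm3": ("running", "n1"),
}
plan = context_switch_actions(current, target)
print(plan)
```

In this sketch, a scheduling policy only has to produce the `target` configuration; the translation into migrate, suspend and resume operations is delegated to the context switch routine.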
Moreover, Entropy has been extended to decouple the provisioning of resources from the dynamic placement of virtual machines. This resource manager aims to optimize a global utility function that integrates both the degree of SLA fulfilment and the operating costs. Results obtained through simulations validate our approach.
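As an illustration of how such a utility function might trade SLA fulfilment against operating costs, consider the sketch below. The weighted-sum form, the response-time SLA metric, the `cost_weight` parameter and all names are assumptions made for this example; the text above does not specify the function actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class App:
    target_response_ms: float    # SLA target (hypothetical metric)
    observed_response_ms: float  # measured or predicted value


def sla_fulfilment(app: App) -> float:
    """Return a value in (0, 1]; 1.0 means the SLA target is met."""
    if app.observed_response_ms <= app.target_response_ms:
        return 1.0
    return app.target_response_ms / app.observed_response_ms


def global_utility(apps: List[App], active_nodes: int, total_nodes: int,
                   cost_weight: float = 0.3) -> float:
    """Weighted trade-off between mean SLA fulfilment and node usage.

    The weighted-sum form is an assumption made for illustration.
    """
    sla = sum(sla_fulfilment(a) for a in apps) / len(apps)
    cost = active_nodes / total_nodes  # fraction of powered-on nodes
    return (1 - cost_weight) * sla - cost_weight * cost


# Two candidate allocations for the same pair of applications:
# few nodes with one SLA violated, or more nodes with both SLAs met.
frugal = global_utility([App(100, 80), App(100, 200)], active_nodes=4, total_nodes=10)
generous = global_utility([App(100, 80), App(100, 90)], active_nodes=8, total_nodes=10)
print(frugal, generous)
```

With these hypothetical numbers the second allocation scores higher despite using twice as many nodes; a larger `cost_weight` would reverse the preference.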
In cooperation with the Paris project-team from INRIA Rennes-Bretagne Atlantique, we have addressed two resource-management issues using virtualization.
First, we have addressed the best-effort issue in grids. To improve resource usage, most resource management systems for grids provide a best-effort mode in which the lowest-priority jobs can be executed when resources are idle. This mode does not provide any service guarantee, and jobs may be killed at any time by the RMS when the nodes they use are subject to higher-priority reservations. This behavior can waste a large amount of computation time, or at least requires users to checkpoint their jobs themselves. To tackle this issue, we proposed Saline, a generic and non-intrusive framework to manage best-effort jobs at the grid level through the use of virtual machines (VMs). In Saline, each best-effort job is transparently submitted inside VMs so that the computation can be relocated elsewhere in the grid each time its resources are taken away by higher-priority jobs. Such an approach improves the total execution time of best-effort requests and significantly reduces the software development effort, since it relieves users of the burden of implementing a specific checkpointing framework for each best-effort program.
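The benefit of relocating a best-effort job instead of killing it can be illustrated with a toy cost model. This is not a model of Saline itself: the function names, the fixed per-relocation overhead and the numbers are all hypothetical, and the model ignores time spent waiting for idle resources.

```python
from typing import List


def compute_time_with_kill(work: float, runs_before_preemption: List[float]) -> float:
    """Compute time consumed when each preemption kills the job.

    Every interrupted run is lost and the job restarts from scratch,
    so the final successful run still costs the full `work`.
    """
    return sum(runs_before_preemption) + work


def compute_time_with_relocation(work: float, preemptions: int,
                                 relocation_overhead: float) -> float:
    """Compute time consumed when the VM is suspended and relocated.

    Progress is preserved; each preemption only adds a fixed overhead
    (a simplifying assumption).
    """
    return work + preemptions * relocation_overhead


work = 100.0                # compute time the job needs, in minutes
interrupted = [30.0, 50.0]  # time run before each of two preemptions

killed = compute_time_with_kill(work, interrupted)
relocated = compute_time_with_relocation(work, len(interrupted), relocation_overhead=5.0)
print(killed, relocated)
```

Under these assumptions the kill-and-restart mode consumes 180 minutes of compute time for 100 minutes of useful work, whereas relocation consumes 110 minutes.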
Second, we have worked on clarifying the different "virtualization" solutions available today, each of which provides particular functionalities. Goldberg proposed to classify virtualization techniques into two models (Type-I and Type-II), which do not allow for the classification of recent virtualization technologies. We have proposed an extension of the Goldberg model that takes recent "virtualization" mechanisms into account. This proposal enables the formalization of the following terms: virtualization, emulation, abstraction, partitioning, and identity. We show that a single virtualization solution is generally composed of several layers of virtualization capabilities, depending on the granularity of the analysis. In this manner, the suggested model allows us to classify virtualization technologies according to their performance, similarity and portability.
kDFS: toward an integrated cluster file system
kDFS aims at providing an integrated cluster file system for High Performance Computing. kDFS is a distributed file system that plugs in under the VFS and relies only on the KDDM component of the Kerrighed Single System Image (SSI). The KDDM features are used to build a cooperative cache for both data and metadata using all the memory available in the cluster. The innovative aspect lies in designing and implementing this symmetric file system in relation to the other available mechanisms: most cluster management systems are designed independently, without considering the benefits of strong cooperation between services. In this project, in addition to providing the common functionalities of a distributed file system, we analyze how kDFS could exploit, cooperate with and complement the cluster services themselves to improve usage and global performance.
In the context of Pierre Riteau's Master's internship, completed in 2009, we have focused on the reliable execution of applications that use file systems for data storage in a distributed environment. An efficient and portable file versioning framework was designed and implemented in the distributed file system kDFS. This framework can be used to snapshot file data when a process's volatile state is checkpointed, which makes it possible to restart a process with its files in a coherent state. A replication model synchronized with the checkpoint mechanisms has also been proposed. It provides stable storage in a distributed architecture. This synchronization has enabled us to reduce network and disk I/O compared to a synchronous replication mechanism such as RAID1.
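The idea of snapshotting file data together with a process checkpoint can be sketched with a small in-memory model. The `VersionedFile` class and its methods are hypothetical and unrelated to the kDFS implementation; in particular, this sketch copies the whole file at each snapshot, whereas an efficient framework would presumably avoid duplicating unchanged data.

```python
class VersionedFile:
    """In-memory file whose contents can be snapshotted per checkpoint."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.versions = {}  # checkpoint id -> immutable copy of the data

    def write(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        if end > len(self.data):
            # Grow the file with zero bytes up to the end of the write.
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload

    def snapshot(self, checkpoint_id: int) -> None:
        """Record the file contents alongside a process checkpoint."""
        self.versions[checkpoint_id] = bytes(self.data)

    def rollback(self, checkpoint_id: int) -> None:
        """Restore the contents saved with the given checkpoint."""
        self.data = bytearray(self.versions[checkpoint_id])


f = VersionedFile()
f.write(0, b"step1")
f.snapshot(checkpoint_id=1)   # taken when the process is checkpointed
f.write(0, b"STEP2-partial")  # the process keeps writing, then fails
f.rollback(checkpoint_id=1)   # restart: the file matches the process state
print(bytes(f.data))
```

After the rollback, the restarted process sees the file exactly as it was when its volatile state was saved, which is the coherence property described above.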
We are currently implementing a data striping policy that relies on application access patterns and thus avoids RAID alignment issues. These activities are also carried out jointly with the Paris project-team. Further details are available at http://www.kerrighed.org/wiki/index.php/KernelDevelKdFS .