Keywords : Cluster operating system, single system image (SSI), distributed shared memory (DSM), process migration, global scheduling, distributed file system, co-operative caching, high availability, checkpointing.
Christine Morin, Christine.Morin@irisa.fr
Registered at APP, under Reference IDDN.FR.001.480003.006.S.A.2000.000.10600 .
GNU General Public License version 2. Kerrighed is a registered trademark.
Kerrighed is a Single System Image (SSI) operating system for high-performance computing on clusters. It provides the user with the illusion that a cluster is a virtual SMP machine.
In Kerrighed, all resources (processes, memory segments, files, data streams) are globally and dynamically managed to achieve the SSI properties. Global resource management makes the distribution of resources transparent across the cluster nodes and lets demanding applications take advantage of the hardware resources of the whole cluster. Dynamic resource management makes cluster reconfigurations (node addition or eviction) transparent to applications and provides high availability in the event of node failures. In addition, Kerrighed provides a checkpointing mechanism so that applications need not be restarted from the beginning when a node fails.
Kerrighed preserves the interface of a standard, single-node operating system, which is familiar to programmers. Legacy sequential or parallel applications written for this standard operating system can be executed without modification on top of Kerrighed, and further optimized if needed.
Kerrighed is not an entirely new operating system developed from scratch. On the contrary, it has been designed and implemented as an extension to an existing standard operating system. Kerrighed only addresses the distributed nature of the cluster, while the native operating system running on each node remains responsible for the management of local physical resources. Our current prototype is based on Linux, which is extended using the standard module mechanism. The Linux kernel itself has only been slightly modified.
A public mailing list (email@example.com) and a technical forum are available to support Kerrighed users.
- Current status:
Kerrighed (version V1.0.2) comprises 70,000 lines of code (mostly in C) and represents more than 200 person-months of effort. It provides a customizable, cluster-wide process scheduler, a cluster-wide Unix process interface, high-performance stream migration allowing migration of MPI processes, process checkpointing, and an efficient distributed file system. It also offers complete Pthread support, so that legacy OpenMP and multithreaded applications can be executed on a cluster without any recompilation. Kerrighed SSI features are customizable.
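To make the Pthread claim concrete, the following is a minimal sketch of the kind of ordinary multithreaded code that, according to the description above, is expected to run on a cluster without recompilation. It uses only the standard POSIX threads API and contains nothing specific to Kerrighed. The names `parallel_sum`, `sum_slice`, `struct slice` and `MAX_THREADS` are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16  /* illustrative upper bound, not a Kerrighed limit */

/* Work descriptor for one thread: sum values[begin..end). */
struct slice {
    const long *values;
    size_t begin;
    size_t end;
    long partial;
};

/* Thread body: accumulate the assigned slice into its own partial sum. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->values[i];
    s->partial = acc;
    return NULL;
}

/*
 * Sum n values using nthreads POSIX threads.
 * Returns 0 and stores the total in *out on success,
 * or -1 if a thread could not be created.
 */
int parallel_sum(const long *values, size_t n, int nthreads, long *out)
{
    pthread_t tids[MAX_THREADS];
    struct slice slices[MAX_THREADS];
    int started = 0;
    int status = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int t = 0; t < nthreads; t++) {
        /* Split the index range into nthreads contiguous slices. */
        slices[t].values = values;
        slices[t].begin = n * (size_t)t / (size_t)nthreads;
        slices[t].end = n * (size_t)(t + 1) / (size_t)nthreads;
        slices[t].partial = 0;
        if (pthread_create(&tids[t], NULL, sum_slice, &slices[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, then combine partial sums. */
    long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += slices[t].partial;
    }
    if (status == 0)
        *out = total;
    return status;
}
```

Each thread writes only to its own `partial` field, so no mutex is needed. On a single-node Linux system the threads share one physical memory; how they would be placed across cluster nodes is a matter for the operating system and is not visible in the source code.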
A live CD of Kerrighed based on Knoppix is also available. It eases Kerrighed installation for demonstrations or for evaluation by users who are not familiar with the Linux installation process.
In 2006, Kerrighed was ported to Linux 2.6.11. The code was significantly improved during this port, resulting in more compact software. Moreover, Kerrighed is also distributed through SSI-OSCAR, an official spin-off package of OSCAR. Since November 2006, SSI-OSCAR packages based on the development version of Kerrighed and OSCAR 5.0 have been available for the Linux distributions supported by OSCAR (e.g., Fedora Core 5, Red Hat Enterprise Linux 4). A port to the Debian Linux distribution has also been carried out.
Demonstrations of Kerrighed were presented in 2006 at Linux Expo (Paris, February 2006; Pascal Gallard, Renaud Lottiaux, Jean Parpaillon and Christine Morin) and at the Supercomputing 2006 conference (Tampa, Florida, November 2006; Jean Parpaillon). Jean Parpaillon also presented Kerrighed at the Paris Capitale du Libre event in Paris in June 2006.