Project : paris
Keywords: cluster operating system, single system image, global and dynamic resource management, distributed shared memory, process migration, global scheduling, cooperative caching, remote paging, high availability, backward error recovery.
Registered at APP, under Ref. IDDN.FR.001.480003.005.S.A.2000.000.10600
GNU General Public License version 2. Kerrighed is a registered trademark.
Kerrighed (formerly known as Gobelins) is a Single System Image (SSI) operating system for high-performance computing on clusters. It provides the user with the illusion that a cluster is a virtual SMP machine.
In Kerrighed, all resources (processes, memory segments, files, data streams) are globally and dynamically managed to achieve all the SSI properties. Global resource management enables transparent distribution of resources throughout the cluster nodes and lets demanding applications take advantage of the hardware resources of the whole cluster. Dynamic resource management makes cluster reconfigurations (node addition or eviction) transparent to applications and provides high availability in the event of node failures. In addition, Kerrighed provides a checkpointing mechanism so that applications do not have to be restarted from the beginning when a node fails.
To avoid redundant mechanisms and conflicting decisions among the different distributed resource management services, and to reduce the software complexity of those services, Kerrighed resource management services are built in a unified and integrated way.
Kerrighed preserves the interface of a standard single node operating system, which is familiar to programmers. Legacy sequential or parallel applications running on this standard operating system may be executed without modification on top of Kerrighed and further optimized if needed.
Kerrighed is not an entirely new operating system developed from scratch. On the contrary, it has been designed and implemented as an extension to an existing standard operating system. Kerrighed only addresses the distributed nature of the cluster, while the native operating system running on each node remains responsible for managing local physical resources. Our current prototype is based on Linux, which is extended using the standard module mechanism. The Linux kernel itself has only been slightly modified.
A public mailing list (email@example.com) is available to support users of Kerrighed.
- Current status:
Kerrighed comprises 80,000 lines of code (mostly in C) and represents 140 person-months of effort. Development started in late 1999. The current stable release is Version V0.72 (October 2003). It provides complete Pthread support, which allows legacy OpenMP and multithreaded applications to run on a cluster without any recompilation.
Kerrighed currently includes 6 Linux modules and a limited patch to Linux Kernel 2.2.13. A port to Linux Kernel 2.4.20 is currently in progress, and a new version of Kerrighed will be released in early December.
Several demonstrations of Kerrighed have been presented this year: at Linux Expo (Paris, February 2003, Louis Rilling and Christine Morin), the IPDPS Conference (Nice, April 2003, Geoffroy Vallée and Louis Rilling), Edf R&D Printemps de la recherche (Clamart, May 2003, Renaud Lottiaux) and the Euro-Par Conference (Klagenfurt, Austria, August 2003, Pascal Gallard and Gaël Utard). Kerrighed is currently being evaluated by Cap Gemini Ernst and Young, ONERA CERNT, and Dga CELAR in the framework of the COCA contract, as well as by Edf. More than 100 external downloads of Kerrighed have been recorded since November 2002.