Section: Software
Marcel
Participants: Olivier Aumage, Ludovic Courtès, Nathalie Furmento, Samuel Thibault.
Marcel is the thread library of the PM2 software suite. Marcel features a two-level thread scheduler (also called an N:M scheduler) that schedules user-level threads on top of a set of kernel threads (usually one kernel thread per logical processor). Such a model achieves the performance of a user-level thread package while still being able to exploit multiprocessor machines. The architecture of Marcel was carefully designed to support a large number of threads and to efficiently exploit hierarchical architectures (e.g. multicore chips, NUMA machines).
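The N:M idea can be illustrated with a small toy model. This is only a sketch under loud assumptions: Python generators stand in for user-level threads, per-processor queues stand in for kernel threads, and the simulation runs sequentially. The names `user_thread` and `run_n_on_m` are hypothetical and are not part of Marcel.

```python
from collections import deque


def user_thread(name, steps, log):
    """A user-level thread modeled as a generator (hypothetical helper)."""
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative yield point: control returns to the scheduler


def run_n_on_m(threads, m):
    """Multiplex N user-level threads over m queues, one per modeled kernel thread."""
    queues = [deque() for _ in range(m)]
    for i, t in enumerate(threads):
        queues[i % m].append(t)  # static round-robin assignment
    while any(queues):
        # One pass over the queues models one time slice on each kernel thread.
        for q in queues:
            if not q:
                continue
            t = q.popleft()
            try:
                next(t)      # run the thread until its next yield point
                q.append(t)  # still alive: back to the end of its queue
            except StopIteration:
                pass         # the thread has finished


log = []
run_n_on_m([user_thread(n, 2, log) for n in "abc"], m=2)
```

Switching between generators never involves the operating system, which is the property that makes user-level threads cheap; having several queues is what lets the real design use several processors at once.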
At the core of the Marcel architecture is the BubbleSched scheduling framework (http://runtime.bordeaux.inria.fr/bubblesched/). Computing platforms are becoming increasingly hierarchical. In response to this trend, BubbleSched provides the application programmer with high-level constructs, called bubbles, to express the affinity between the various activities of the application: the application groups related threads (threads that work on the same data, for instance) into nested bubbles, which thus form a tree describing the hierarchical activity structure of the application. Thanks to the hwloc hardware discovery library (see Section 5.2), BubbleSched then provides the scheduler programmer with a hierarchical tree of runqueues that represents the detected platform, together with a toolkit of basic operations for dynamically mapping the activity structure of the application (the threads, as modeled by the bubbles) onto the hierarchical platform according to a suitably tailored scheduling strategy [12]. This makes it possible to exploit cache effects as much as possible, to favor bandwidth, to favor load balancing, or to pursue whatever goal fits the application best. Various scheduling algorithms are provided and can be combined to meet different application needs (e.g. cache affinity vs. memory affinity [26]).
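The mapping of a bubble tree onto a tree of runqueues can be sketched with a toy model. Everything below is an assumption made for illustration: the classes `Thread`, `Bubble`, and `Level`, and the greedy `spread` strategy, are hypothetical and do not reproduce the BubbleSched API or any of its actual schedulers.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Thread:
    name: str


@dataclass
class Bubble:
    name: str
    children: List[Union["Bubble", Thread]] = field(default_factory=list)


@dataclass
class Level:
    """One node of the machine tree (machine, NUMA node, core), with its runqueue."""
    name: str
    children: List["Level"] = field(default_factory=list)
    runqueue: list = field(default_factory=list)


def threads_of(entity):
    """Names of all threads contained in a thread or a (nested) bubble."""
    if isinstance(entity, Thread):
        return [entity.name]
    return [t for child in entity.children for t in threads_of(child)]


def spread(entities, level):
    """Toy strategy: keep each bubble together as deep in the machine as possible."""
    entities = list(entities)
    if not level.children:
        level.runqueue.extend(entities)
        return
    # Burst the largest bubble while there is less to distribute than sub-levels.
    while len(entities) < len(level.children):
        bubbles = [e for e in entities if isinstance(e, Bubble)]
        if not bubbles:
            break
        biggest = max(bubbles, key=lambda b: len(threads_of(b)))
        entities.remove(biggest)
        entities.extend(biggest.children)
    # Greedy balancing: largest entity first, onto the least loaded sub-level.
    shares = [[] for _ in level.children]
    loads = [0] * len(level.children)
    for e in sorted(entities, key=lambda e: len(threads_of(e)), reverse=True):
        i = loads.index(min(loads))
        shares[i].append(e)
        loads[i] += len(threads_of(e))
    for child, share in zip(level.children, shares):
        spread(share, child)


def placement(level):
    """Map each thread name to the name of the level whose runqueue holds it."""
    where = {t: level.name for e in level.runqueue for t in threads_of(e)}
    for child in level.children:
        where.update(placement(child))
    return where


machine = Level("machine", [
    Level("node0", [Level("core0"), Level("core1")]),
    Level("node1", [Level("core2"), Level("core3")]),
])
app = Bubble("app", [
    Bubble("A", [Thread("a0"), Thread("a1")]),
    Bubble("B", [Thread("b0"), Thread("b1")]),
])
spread([app], machine)
where = placement(machine)
```

In this example the threads of bubble `A` land on the two cores of `node0` and those of bubble `B` on the cores of `node1`, so threads declared as related stay under the same NUMA node.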
Marcel has a dedicated memory management module called MaMI. It allows developers to manage memory with regard to NUMA nodes. In addition to the usual memory allocation policies, such as binding or interleaving, it offers two memory migration strategies. The first is synchronous and moves data to a given node at the application's request. The second is based on a Next-Touch policy, in which marked data is migrated to the node of the next thread that accesses it. MaMI also provides the application with hints about the actual cost of reading, writing, or migrating distant memory buffers. Moreover, MaMI gathers statistics on how much memory is available on each node.
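The difference between the allocation policies and the two migration strategies can be made concrete with a toy bookkeeping model. This is a sketch under stated assumptions: the class `NumaMemory` and all of its methods are hypothetical names, not the MaMI API, and no real memory is moved.

```python
class NumaMemory:
    """Toy model that records which NUMA node holds each page."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.location = {}       # page id -> node id
        self.next_touch = set()  # pages waiting for their next access
        self.migrations = 0

    def allocate_bound(self, pages, node):
        """Binding policy: every page goes to the same node."""
        for page in pages:
            self.location[page] = node

    def allocate_interleaved(self, pages):
        """Interleaving policy: pages are spread round-robin over the nodes."""
        for i, page in enumerate(pages):
            self.location[page] = i % self.num_nodes

    def migrate(self, pages, node):
        """Synchronous migration: move the pages now, at the caller's request."""
        for page in pages:
            if self.location[page] != node:
                self.location[page] = node
                self.migrations += 1

    def mark_next_touch(self, pages):
        """Defer the decision: each page follows whoever accesses it next."""
        self.next_touch.update(pages)

    def touch(self, page, node):
        """A thread running on `node` accesses `page`."""
        if page in self.next_touch:
            self.next_touch.discard(page)
            self.migrate([page], node)
        return self.location[page]


mem = NumaMemory(num_nodes=2)
mem.allocate_bound(range(4), node=0)
mem.mark_next_touch(range(4))
mem.touch(2, node=1)  # pages 2 and 3 follow the thread running on node 1
mem.touch(3, node=1)
mem.touch(2, node=0)  # the mark is consumed, so page 2 stays on node 1
```

With next-touch, only the pages that are actually accessed from another node get migrated, whereas the synchronous call moves a whole range up front.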
A trace of the scheduling events can be recorded and used after execution to generate an animated movie that replays the execution: how bubbles and threads were created, how they were distributed over the machine, how they were eventually scheduled on processors, etc. End users can hence easily try and tune various bubble schedulers for their applications and select the most suitable one.
At very fine grains of parallelism, the inherent cost of parallelism management per grain is no longer negligible compared to the grains themselves. Marcel thus provides a seed construct, which can be seen as a precursor of a thread. Creating a thread seed does not reserve any resource apart from the information about the task to be run. Only when the time comes to actually run the seed does Marcel attempt to reuse the resources and the context of another, dying thread. When this succeeds, management costs are significantly reduced; when it fails, the penalty remains low.
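The saving brought by seeds can be sketched with a toy model that simply counts how many thread contexts get created. The names `Seed`, `Context`, and `Worker` are hypothetical and this is not Marcel's implementation: Python performs no real context switch here, and `Context` is only a stand-in for the costly per-thread resources.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Seed:
    """Only the description of the task: no stack, no thread descriptor."""
    func: Callable[..., Any]
    args: Tuple[Any, ...] = ()


class Context:
    """Stand-in for the resources of a full thread."""

    def __init__(self):
        self.stack = bytearray(4096)  # pretend stack


class Worker:
    def __init__(self):
        self.contexts_created = 0
        self.dying = None  # context of a thread that has just finished

    def _obtain_context(self):
        if self.dying is not None:
            # Reuse path: take over the context of the dying thread.
            ctx, self.dying = self.dying, None
            return ctx
        # Fallback path: pay for a fresh context, as for a regular thread.
        self.contexts_created += 1
        return Context()

    def run(self, seed):
        ctx = self._obtain_context()
        result = seed.func(*seed.args)
        self.dying = ctx  # the task is over: its context becomes reusable
        return result


worker = Worker()
results = [worker.run(Seed(pow, (n, 2))) for n in range(5)]
```

In this run, five seeds are executed but only one context is ever created; creating one full thread per task would have required five.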
In addition to a set of original extensions, Marcel provides a POSIX-compliant interface. Unmodified applications and parallel programming environments can thus take advantage of Marcel simply by being recompiled (API compatibility), and already-compiled binaries can even run on top of it through the Linux NPTL ABI compatibility layer.
Marcel consists of 83,000 lines of code. This library is developed and maintained by Samuel Thibault and Olivier Aumage. The software is freely available under the terms of the GNU General Public License version 2 at the following URL: http://runtime.bordeaux.inria.fr/marcel/.