Section: New Results
Structuring of Applications for Scalability
Large Scale Experiments
The merge of the two libraries ParCeL and SSCRAP into the new software suite parXXL was completed this year. The suite consists of several toolboxes: par::cpp (interfaces for the C++ language), par::sys (interfaces to POSIX systems), par::mem (tools for managing memory), par::step (management of supersteps), par::cell (management of cellular networks) and par::cellnet (definition of default network types).
The integration of the formerly separate libraries allows us to validate the whole suite on a wide range of fine-grained applications and problems. A report on the design and the first benchmarks of the integrated code can be found in  .
Now that the communication layer of parXXL can handle large numbers of POSIX threads (shared memory) or distributed processes (MPI), we were able to run large scale experiments on mainframes and clusters. These have proved the scalability of our approach as a whole, including engineering, modeling and algorithmic aspects: the algorithms that are implemented and tested show a speedup that is very close to the best possible theoretical values, and these speedups are reproducible on a large variety of platforms, see  .
Models and Algorithms for Coarse Grained Computation
We continued the design of algorithms in the coarse grained setting as given by the model PRO  . In particular, we aimed to design algorithms that take advantage of a structure commonly encountered in massive graphs, namely that they usually have bounded arboricity. For such graphs we gave algorithms for computing probability vectors that can be used for the clustering of communities, see  .
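To make the bounded-arboricity assumption concrete, the following minimal sketch computes the degeneracy d of a graph, a closely related parameter: the arboricity a satisfies a ≤ d ≤ 2a − 1, so one is bounded exactly when the other is. This is an illustration only and not one of the algorithms referred to above; the function names and the adjacency-dictionary representation are assumptions made for the example.

```python
def degeneracy(adj):
    """Return the degeneracy of an undirected simple graph.

    adj maps each vertex to the set of its neighbours (a hypothetical
    representation chosen for this sketch).
    """
    # Work on a copy so that the caller's graph is left untouched.
    remaining = {v: set(ns) for v, ns in adj.items()}
    best = 0
    while remaining:
        # Peel off a vertex of minimum remaining degree (quadratic, for clarity).
        v = min(remaining, key=lambda u: len(remaining[u]))
        best = max(best, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return best


def complete_graph(n):
    """Adjacency dictionary of the complete graph on vertices 0..n-1."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


def path_graph(n):
    """Adjacency dictionary of the path 0 - 1 - ... - (n-1)."""
    return {v: {u for u in (v - 1, v + 1) if 0 <= u < n} for v in range(n)}
```

For the complete graph on four vertices the sketch returns 3, which is consistent with its arboricity of 2; for a path it returns 1.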
For testing and benchmarking, the generation of large random input data with known probability distributions is crucial. In  , we show how to uniformly distribute data at random in two related settings: coarse grained parallelism and external memory. In contrast to previously known work for parallel setups, our method fulfills the three criteria of uniformity, work-optimality and balance among the processors simultaneously. To guarantee uniformity, we investigate the matrix of communication requests between the processors. We show that its distribution is a generalization of the multivariate hypergeometric distribution, and we give algorithms to sample it efficiently in the two settings.
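As an illustration of the object being sampled, the sketch below draws a communication matrix for a uniform random permutation in the most naive way, by shuffling one label per target slot. It takes time linear in the total number of items and is therefore only a reference for the distribution, not the efficient sampling algorithm of the cited work; the function name and its parameters are assumptions made for this example.

```python
import random


def sample_communication_matrix(source_sizes, target_sizes, rng=random):
    """Sample the matrix whose entry [i][j] counts the items that
    processor i sends to processor j under a uniform random permutation.

    source_sizes[i] is the number of items held by processor i,
    target_sizes[j] the number of items processor j must receive.
    """
    if sum(source_sizes) != sum(target_sizes):
        raise ValueError("source and target sizes must have the same total")
    # One label per target slot: label j appears target_sizes[j] times.
    slots = [j for j, size in enumerate(target_sizes) for _ in range(size)]
    # A uniform shuffle of the slots assigns the items uniformly at random.
    rng.shuffle(slots)
    matrix = [[0] * len(target_sizes) for _ in source_sizes]
    pos = 0
    for i, size in enumerate(source_sizes):
        # The next `size` slots are the destinations of processor i's items.
        for j in slots[pos:pos + size]:
            matrix[i][j] += 1
        pos += size
    return matrix
```

By construction, the row sums of the result equal the source sizes and the column sums equal the target sizes. Conditioned on the rows already drawn, each row follows a multivariate hypergeometric distribution over the remaining target capacities.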
Overlapping Computations and Communications with I/O
In  , we noticed that the performance of our pipeline algorithm was impacted by asynchronous communications that introduced gaps between I/O operations. To address this issue, we studied how to adapt this kind of algorithm, namely wavefront algorithms, to shared memory platforms.
Using the parXXL library, we were able to propose an architecture-independent out-of-core implementation of a well-known hydrodynamics kernel, see  . In addition to this implementation, we proposed an optimized data layout that reduces the I/O impact on each iteration of the algorithm by an order of magnitude, at the cost of an initial rewriting of the data. This work is currently under submission, see  .
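The general idea of overlapping I/O with computation in an out-of-core setting can be sketched as follows: a reader thread prefetches fixed-size blocks from disk into a bounded queue while the main thread processes the previous block. This is a generic illustration in Python, not the parXXL interface, and the hydrodynamics kernel is replaced by a plain sum; all names and the block size are assumptions made for the example.

```python
import os
import queue
import struct
import tempfile
import threading


def prefetch_blocks(path, block_bytes, out):
    """Reader thread: load fixed-size blocks and hand them to the consumer."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_bytes)
            if not block:
                break
            out.put(block)  # blocks while the queue is full
    out.put(None)  # sentinel: no more data


def out_of_core_sum(path, block_items=1024):
    """Sum 64-bit floats stored in a file while holding only a small,
    bounded number of blocks in memory at any time."""
    item = struct.calcsize("d")
    blocks = queue.Queue(maxsize=2)  # the bound limits the memory footprint
    reader = threading.Thread(
        target=prefetch_blocks, args=(path, block_items * item, blocks)
    )
    reader.start()
    total = 0.0
    # Computation on one block overlaps with the reading of the next ones.
    while (block := blocks.get()) is not None:
        count = len(block) // item
        total += sum(struct.unpack(f"{count}d", block[:count * item]))
    reader.join()
    return total


if __name__ == "__main__":
    # Small demonstration on a temporary file of 10000 values.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(struct.pack("10000d", *map(float, range(10000))))
    print(out_of_core_sum(tmp.name, block_items=256))
    os.remove(tmp.name)
```

The bounded queue is the design choice that matters here: it lets the reader run ahead of the computation by a fixed number of blocks without ever loading the whole data set.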