Team MOAIS


Section: New Results

Parallel algorithms, complexity and scheduling


Our work on scheduling mainly concerns multi-objective optimization and job scheduling on grid resources. We have exhibited techniques to find good trade-offs between criteria, and achieved two main results. First, we characterized a multi-user scheduling problem and gave an algorithm achieving a constant-factor approximation of the Pareto curve; this result builds on our previous game-theoretic work. Second, we extended the spectrum of multi-criteria results to include either numerous criteria or new and radically different criteria, such as reliability or memory consumption versus execution time.
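To illustrate the Pareto-curve viewpoint, the sketch below extracts the non-dominated trade-offs from a set of candidate schedules. The function and the sample data are purely illustrative, not the published approximation algorithm:

```python
def pareto_front(points):
    """Return the non-dominated points when both criteria are minimized.

    A point p is dominated if some other point is at least as good on
    both criteria and strictly better on at least one.
    """
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# Hypothetical schedules as (makespan, second criterion) pairs.
schedules = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(schedules)
```

A constant-approximation algorithm returns a small set of schedules such that every point of this front is dominated, up to a constant factor on each criterion, by some returned schedule.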

Regarding work-stealing and greedy scheduling, new analyses have been developed that enable several variants of work-stealing with provable performance guarantees. An important motivation was to relax the classical restrictions. We have considered the following extensions: tasks with bounded fan-out; a push greedy scheduling strategy coupled with standard work-stealing (poll); and criteria other than depth (e.g., critical height and locality).
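For intuition, the classical deque discipline these variants start from, where the owner works at one end of its deque and thieves steal from the other, can be sketched as a toy round-based simulation. The class and loop below are illustrative only, not Kaapi's implementation:

```python
import random
from collections import deque

class Worker:
    def __init__(self):
        self.tasks = deque()  # owner uses the bottom, thieves the top

    def push(self, task):
        self.tasks.append(task)                               # owner: push bottom

    def pop(self):
        return self.tasks.pop() if self.tasks else None       # owner: pop bottom

    def steal(self):
        return self.tasks.popleft() if self.tasks else None   # thief: steal top

def run(workers, rng):
    """Execute unit-time tasks until all deques are empty; return steps.

    Each step, every worker runs one task: its own if available,
    otherwise one stolen from a randomly chosen victim.
    """
    steps = 0
    while any(w.tasks for w in workers):
        steps += 1
        for w in workers:
            task = w.pop()
            if task is None:
                task = rng.choice(workers).steal()
            # a real runtime would execute `task` here
    return steps
```

With 16 unit tasks initially on one worker and 4 workers in total, the schedule length always lies between the perfectly balanced 4 steps and the fully sequential 16 steps, depending on how lucky the random steals are.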

Adaptive algorithm

New results, both theoretical and experimental, have been obtained on the bi-criteria work/depth threshold needed to reach asymptotically optimal running time on distributed architectures with processors of heterogeneous frequencies.

Provable work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique have been achieved. Built on top of Kaapi's work-stealing engine, the KaSTL library provides adaptive implementations of the C++ STL. In previous approaches, each processor typically uses a deque to store its ready tasks locally, and a processor that runs out of work steals a ready task from the deque of a randomly selected processor. KaSTL instead reduces the overhead of task creation through an original implementation of work-stealing that replaces the per-processor deques with a distributed list.
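The deque-free idea can be illustrated with a sequential toy model (hypothetical names; KaSTL's actual mechanism operates on C++ iterator ranges inside Kaapi): the owner works through its range in small chunks, and a steal request splits off the upper half of what remains instead of taking a pre-created task.

```python
def adaptive_map(data, f, steal_requests, chunk=2):
    """Apply f over data sequentially in small chunks.

    Each simulated steal request splits off the upper half of the
    remaining range for the 'thief' (processed here by the same loop),
    so no task is created until a steal actually happens.
    """
    results = [None] * len(data)
    ranges = [(0, len(data))]          # ranges still to process
    while ranges:
        lo, hi = ranges.pop()
        while lo < hi:
            if steal_requests and hi - lo > chunk:
                mid = (lo + hi) // 2   # thief takes the upper half
                steal_requests.pop()
                ranges.append((mid, hi))
                hi = mid
            upper = min(lo + chunk, hi)
            for i in range(lo, upper):
                results[i] = f(data[i])
            lo = upper
    return results
```

Every index is processed exactly once whatever the steal pattern, which is the invariant a work-stealing runtime must preserve when splitting ranges on the fly.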

Lastly, building on our results on the adaptive coupling of GPU and CPU parallelism for interactive 3D modeling, another perspective is to provide fine-grain adaptive parallelization of part of the SOFA library.

Safe distributed computation

In the fail-silent model, an efficient coordinated checkpointing mechanism for the dataflow that describes the computation at coarse grain has been developed and integrated into Kaapi (Xavier Besseron's PhD thesis). It extends our IEEE TDSC paper on iterative applications by exploiting the knowledge of dependencies among processors to speed up restart after a failure.

With respect to malicious faults (Byzantine errors), a probabilistic certification platform that includes hardware crypto-processors has been designed within the ANR SAFESCALE project. A complementary work, jointly developed with the Paris team-project, uses a macro-dataflow representation of the program execution and work-stealing scheduling to dynamically adapt the execution to sabotage while keeping a reasonable slowdown. Unlike static adaptation or adaptation at the source-code level, a dynamic adaptation at the middleware level is proposed, enforcing separation of concerns and programming transparency. We are also extending algorithm-based fault-tolerance schemes for probabilistic certification to a more general context: we have developed a Byzantine fault-tolerant interpolation algorithm suited to modular computations over integers or polynomials. A first prototype has been developed and evaluated on the computation of the determinant of a matrix with integer coefficients distributed over GRID'5000, in order to simulate a global computing environment.
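The probabilistic flavor of certification can be conveyed with a standard sampling bound (a generic sketch under a with-replacement assumption, not the project's actual certification test): re-executing k uniformly chosen tasks misses a forged fraction q of the results with probability (1 - q)^k, so k can be sized from a target confidence.

```python
import math

def checks_needed(forged_fraction, epsilon):
    """Smallest k such that k uniform re-executions (with replacement)
    miss a computation in which a fraction `forged_fraction` of tasks
    is forged with probability at most `epsilon`:
    (1 - forged_fraction)**k <= epsilon."""
    return math.ceil(math.log(epsilon) / math.log(1.0 - forged_fraction))
```

For example, catching an attacker who forges at least 1% of the tasks with 99.9% confidence requires re-executing a few hundred tasks, independently of the total number of tasks.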

Finally, considering cryptographic primitives, we have proposed a new way to bound the probability of occurrence of an n-round differential in the context of differential cryptanalysis. This new model allows us to claim proof of resistance against impossible differential cryptanalysis, as initially defined by Biham in 1999. The work is applied to CS-Cipher, for which, under some non-trivial hypotheses, provable security against impossible differential cryptanalysis is obtained.

Cache-Oblivious Mesh Layout

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into consideration to design cache-efficient algorithms. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve on previous approaches, but they lack theoretical performance guarantees. We developed an O(N log N) algorithm, called FastCOL, to compute a CO layout for unstructured but well-shaped meshes. We proved that a coherent traversal of an N-size mesh in dimension d induces fewer than N/B + O(N/M^{1/d}) cache misses, where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and consumes significantly less memory than the best known CO algorithm (OpenCCL). The FastCOL layout performs comparably to the OpenCCL one for classical visualization access patterns, and better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache-oblivious approaches lead to significant performance increases on recent GPU architectures. This work will be published in IEEE TVCG, 2010.
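The recursive-bisection intuition behind such layouts can be sketched on a point set (a simplified illustration, not the published FastCOL algorithm, which works on the mesh and its BSP tree): splitting along the longest axis at every scale keeps geometric neighbors close in memory at every granularity, which is exactly what a cache of unknown size needs.

```python
def co_layout(points):
    """Order points by recursive bisection along the longest axis.

    Each half is laid out contiguously, so nearby points end up nearby
    in the ordering at every scale of the recursion.
    """
    if len(points) <= 1:
        return list(points)
    dims = len(points[0])
    # split along the axis with the largest extent
    axis = max(range(dims),
               key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return co_layout(pts[:mid]) + co_layout(pts[mid:])
```

On a 4x4 grid, for instance, the first half of the resulting ordering is entirely one side of the first bisection plane, and the property recurses within each half.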

