Team ScAlApplix


Section: Scientific Foundations

Keywords : high-performance computing, parallel sparse linear algebra, fast multipole methods.

Algorithms and high-performance solvers

High-performance direct solvers for distributed clusters

Solving large sparse systems Ax = b of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. Consequently, many parallel techniques for sparse matrix factorization have been studied and implemented.

We began this research by parallelizing an industrial code for structural mechanics: a 2D and 3D finite element code, nonlinear in time, that solves plasticity (or thermo-plasticity) problems, possibly coupled with large displacements. Since the matrices of these systems are very ill-conditioned, classical iterative methods are not viable. To obtain an industrial software tool that is robust and versatile, high-performance sparse direct solvers are therefore mandatory, and parallelism is necessary for reasons of both memory capacity and acceptable solution time. Moreover, in order to solve 3D problems with more than 10 million unknowns efficiently, which is now a reachable challenge with new SMP supercomputers, we must achieve good time scalability and control the memory overhead.

In the ScAlApplix project, we first focused on the block partitioning and scheduling problem for high-performance parallel sparse LDL^T or LL^T factorization without dynamic pivoting, for large sparse symmetric positive definite systems. Our strategy is also suitable for non-symmetric sparse matrices with a symmetric pattern, and for general distributed heterogeneous architectures whose computation and communication performance is predictable in advance.
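As a minimal illustration of the factorization underlying such solvers, the sketch below performs a dense, sequential LL^T (Cholesky) decomposition and triangular solves. This is only a toy: the solvers discussed here operate on sparse matrices with block partitioning and parallel scheduling, none of which is shown. The key property it does illustrate is that symmetric positive definite systems need no dynamic pivoting, which is what allows the computation to be mapped statically onto the machine.

```python
import math

def cholesky(A):
    """LL^T factorization of a symmetric positive definite matrix.
    Returns lower-triangular L such that A = L L^T. No pivoting is
    needed for SPD systems, so the elimination order is known in
    advance (a prerequisite for static block mapping)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_llt(A, b):
    """Solve Ax = b by forward substitution (L y = b) followed by
    backward substitution (L^T x = y)."""
    n = len(b)
    L = cholesky(A)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Small SPD test system (symmetric, diagonally dominant).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = solve_llt(A, b)
```

In a real sparse solver, the same arithmetic is organized into dense blocks determined by the matrix ordering, so that level-3 BLAS kernels can be used inside each block.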

Research on high-performance sparse direct solvers is carried out in collaboration with P. Amestoy (ENSEEIHT – IRIT) and J.-Y. L'Excellent (INRIA Rhône-Alpes), and has led to software developments (see sections 5.4, 5.5 and 5.8) and to industrial contracts with CEA (Commissariat à l'Energie Atomique).

High-performance iterative solvers

In addition to the project's activities on direct solvers, we also study robust preconditioning algorithms for iterative methods. The goal of these studies is to overcome the huge memory consumption inherent in direct solvers, in order to solve 3D problems of very large size (several million unknowns). Our studies focus on building generic parallel preconditioners based on ILU factorizations. Classical ILU preconditioners rely on scalar algorithms that exploit the CPU poorly and are difficult to parallelize. Our work aims at finding unknown orderings and partitionings that lead to a dense block structure in the incomplete factors. Based on this block pattern, efficient parallel blockwise algorithms can then be devised to build robust preconditioners that also exploit the full capabilities of modern high-performance computers.
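To show where a preconditioner enters an iterative method, the sketch below runs a preconditioned conjugate gradient iteration with a Jacobi (diagonal) preconditioner. The diagonal preconditioner is a deliberately simple stand-in for the blockwise ILU preconditioners discussed above: the structure of the iteration is the same, only the application of M^{-1} differs.

```python
def pcg(A, b, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradient for a dense SPD matrix A.
    The preconditioner M = diag(A) is applied as z = M^{-1} r; a
    blockwise ILU preconditioner would replace only that step with
    sparse triangular solves on the incomplete factors."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    x = [0.0] * n
    r = b[:]                                  # r = b - A x, with x = 0
    z = [r[i] / A[i][i] for i in range(n)]    # z = M^{-1} r (Jacobi)
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x

# Same small SPD system as a smoke test.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

The memory advantage over a direct solver comes from the fact that only the (incomplete) factors and a few vectors are stored, at the price of convergence that depends on the preconditioner's quality.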

We study two approaches:

Fast Multipole Methods

In many of the scientific computing applications considered today as computational challenges, such as biological systems, astrophysics, or electromagnetism, the introduction of hierarchical methods based on an octree structure has dramatically reduced the amount of computation needed to simulate these systems for a given error tolerance.

Among these methods, the Fast Multipole Method (FMM) computes the interactions in, for example, a molecular dynamics system of N particles in O(N) time, against O(N^2) for a direct approach. Extending these methods and implementing them efficiently on current parallel architectures is still a critical issue. Moreover, the use of periodic boundary conditions, of duplications of the system in 2 out of 3 space dimensions, as well as the use of higher-order approximations for integral equations, also remain open questions.
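For reference, the O(N^2) direct computation that the FMM accelerates is sketched below for 1/r (Coulomb-like) potentials. The FMM itself is not shown; it obtains the same result to a prescribed accuracy by grouping far-away particles into octree cells and replacing their individual contributions with truncated multipole expansions.

```python
def direct_potentials(positions, charges):
    """Direct O(N^2) evaluation of pairwise 1/r potentials: for each
    particle i, phi[i] = sum over j != i of q_j / |x_i - x_j|.
    This all-pairs loop is the baseline the FMM reduces to O(N)."""
    n = len(positions)
    phi = [0.0] * n
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, zj = positions[j]
            r = ((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2) ** 0.5
            phi[i] += charges[j] / r
    return phi

# Two unit charges one unit apart: each sees a potential of 1.0.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
charges = [1.0, 1.0]
phi = direct_potentials(positions, charges)  # -> [1.0, 1.0]
```

At a few thousand particles the quadratic cost already dominates a simulation step, which is why hierarchical methods are indispensable for the multi-million-atom systems targeted here.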

In order to treat biological systems of up to several million atoms, these methods must be integrated into the QC++ platform (see section 5.7). They can be used in all three (quantum, molecular and continuum) models: for atom-atom interactions in quantum or molecular mechanics, for atom-surface interactions in the coupling between the continuum and the other models, and for fast matrix-vector products in the iterative solution of the linear system arising from the integral formulation of the continuum method. Moreover, the significant experience gained with the Scotch and PaStiX projects (see sections 5.8 and 5.5) will be useful in developing efficient implementations of FMM methods on parallel clusters of SMP nodes.
