## Section: Software

`PaStiX`

Participant: Pierre Ramet [corresponding member].

This work is supported by the French “Commissariat à l'Energie Atomique CEA/CESTA” in the context of structural mechanics and electromagnetism applications.

`PaStiX` (http://pastix.gforge.inria.fr)
(Parallel Sparse matriX package) is a scientific library that provides
a high performance parallel solver for very large sparse linear
systems based on block direct and block ILU(k) iterative methods.
Numerical algorithms are implemented in single or double precision
(real or complex): LLt (Cholesky), LDLt (Crout) and LU with static
pivoting (for nonsymmetric matrices with a symmetric pattern). The
latter version is now used in `FluidBox` (see Section 5.2). The
`PaStiX` library is released under the INRIA CeCILL licence.
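
To make the LDLt variant concrete, the following is a minimal dense sketch in plain Python of a factorization A = L D Lt without pivoting, followed by the three solve steps. It is an illustration only and not the `PaStiX` implementation, which works on sparse supernodal blocks with BLAS 3 kernels; the function names `ldlt` and `solve_ldlt` are hypothetical.

```python
def ldlt(a):
    """Factor a symmetric matrix as A = L * D * L^T without pivoting.

    `a` is a dense list of rows. Returns (L, d), where L is unit lower
    triangular and d holds the diagonal of D. Assumes no zero pivot occurs.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: remove contributions of the previous columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def solve_ldlt(L, d, b):
    """Solve A x = b given the factors returned by ldlt."""
    n = len(b)
    # Forward substitution: L y = b.
    y = list(b)
    for i in range(n):
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    # Diagonal scaling: D z = y.
    z = [y[i] / d[i] for i in range(n)]
    # Backward substitution: L^T x = z.
    x = list(z)
    for i in reversed(range(n)):
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldlt(A)
x = solve_ldlt(L, d, [14.0, 21.0, 26.0])
```

Because no pivoting is performed during the numerical phase, the structure of L is known before any floating point operation takes place, which is what makes a fully static schedule possible.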

The `PaStiX` library uses the graph partitioning and sparse matrix
block ordering package `Scotch` (see Section 5.5).
`PaStiX` is based on an efficient static scheduling and memory
manager, in order to solve 3D problems with more than 50 million
unknowns. The mapping and scheduling algorithm handles a combination
of 1D and 2D block distributions. This algorithm computes an efficient
static scheduling of the block computations for our supernodal
parallel solver which uses a local aggregation of contribution
blocks. This can be done by taking into account very precisely the
computational costs of the BLAS 3 primitives, the communication costs
and the cost of local aggregations. We also improved this static
computation and communication scheduling algorithm to anticipate the
sending of partially aggregated blocks, in order to free memory
dynamically. By doing this, we dramatically reduce the
aggregated memory overhead while keeping good performance.
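
The idea of a schedule computed before the factorization from a cost model can be sketched with a simple greedy heuristic. The code below is not the `PaStiX` mapping algorithm: it assumes hypothetical per-task compute costs, a single fixed communication cost between distinct processors, and tasks listed in topological order, and it places each task on the processor where it would finish earliest.

```python
def static_schedule(costs, deps, n_procs, comm_cost):
    """Greedy static mapping of tasks onto processors.

    costs:     dict task -> estimated compute cost (insertion order must be
               a topological order of the dependency graph)
    deps:      dict task -> list of predecessor tasks
    comm_cost: cost added when a predecessor ran on another processor
    Returns a dict task -> (processor, start_time, finish_time).
    """
    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    placement = {}
    for task, cost in costs.items():
        best = None
        for p in range(n_procs):
            # The task is ready once all predecessor results have arrived.
            ready = 0.0
            for pred in deps.get(task, ()):
                pred_proc, _, pred_finish = placement[pred]
                delay = comm_cost if pred_proc != p else 0.0
                ready = max(ready, pred_finish + delay)
            start = max(ready, proc_free[p])
            finish = start + cost
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placement[task] = best
        proc_free[best[0]] = best[2]
    return placement


# Hypothetical example: task "c" needs the results of "a" and "b".
costs = {"a": 2, "b": 3, "c": 1}
deps = {"c": ["a", "b"]}
plan = static_schedule(costs, deps, n_procs=2, comm_cost=1)
makespan = max(finish for _, _, finish in plan.values())
```

In this toy model the quality of the schedule depends entirely on how accurate the cost estimates are, which is why the computation, communication and aggregation costs need to be modelled precisely.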

Another important point is that our approach is suitable for any heterogeneous parallel/distributed architecture whose performance is predictable, such as clusters of SMP nodes. In particular, we now offer a high performance version with a low memory overhead for SMP node architectures, which fully exploits shared memory through a hybrid MPI-thread implementation.

Direct methods are numerically robust, but very large three-dimensional problems may lead to systems that require a huge amount of memory despite any memory optimization. One approach under study is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and, in particular, should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods.
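
For reference, the level-of-fill rule behind ILU(k) can be sketched in its classical scalar form: original nonzeros have level 0, a fill entry (i, j) created through pivot p gets level lev(i, p) + lev(p, j) + 1, and only entries of level at most k are kept. The code below is a minimal symbolic sketch of that scalar rule, not the blockwise variant described above; the function name `iluk_levels` and the example pattern are hypothetical.

```python
def iluk_levels(pattern, n, k):
    """Symbolic ILU(k): return the kept entries and their fill levels.

    pattern: iterable of (row, col) positions of the original nonzeros,
             assumed to include the diagonal
    n:       matrix order
    k:       maximum level of fill to keep
    Returns a dict (row, col) -> level.
    """
    rows = [{} for _ in range(n)]
    for i, j in pattern:
        rows[i][j] = 0  # original entries have level 0
    for i in range(1, n):
        for p in range(i):
            if p not in rows[i]:
                continue  # nothing to eliminate in column p of row i
            lev_ip = rows[i][p]
            for j, lev_pj in rows[p].items():
                if j <= p:
                    continue  # only the upper part of the pivot row matters
                new_level = lev_ip + lev_pj + 1
                if new_level <= k and new_level < rows[i].get(j, k + 1):
                    rows[i][j] = new_level
    return {(i, j): lev for i, row in enumerate(rows) for j, lev in row.items()}


# Hypothetical 4x4 symmetric pattern: diagonal plus (0,1), (0,2), (1,3).
pattern = [(i, i) for i in range(4)]
pattern += [(0, 1), (1, 0), (0, 2), (2, 0), (1, 3), (3, 1)]
sizes = {k: len(iluk_levels(pattern, 4, k)) for k in (0, 1, 2)}
```

Raising k lets the pattern grow towards that of the complete factorization, so k acts as the knob that trades memory for preconditioner accuracy.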