## Section: New Results

### Parallel Sparse Direct Solvers and Combinatorial Scientific Computing

Participants: Maurice Brémond, Indranil Chowdhury, Guillaume Joslin, Jean-Yves L'Excellent, Bora Uçar.

#### Some Experiments and Issues to Exploit Multicore Parallelism in a Distributed-Memory Parallel Sparse Direct Solver

Mumps (see Section 5.2) is a parallel sparse direct solver that uses message passing (MPI) for parallelism. In this work we have investigated how thread parallelism can help take advantage of recent multicore architectures. The work consists of testing multithreaded BLAS libraries and inserting OpenMP directives in the routines that profiling revealed to be costly, with the objective of avoiding any deep restructuring or rewriting of the code. In INRIA report RR-7411 (October 2010), we reported on various aspects of this work, presented some of the benefits and difficulties, and showed that 4 to 8 threads per MPI process is generally a good compromise for performance, while increasing the number of threads is always beneficial in terms of memory usage. We also discussed several issues that appear to be critical with a mixed MPI-OpenMP approach in a multicore environment. In the future we plan to pursue this work on larger numbers of cores.

#### Design, Implementation, and Analysis of Maximum Transversal Algorithms

We have investigated seven maximum transversal algorithms and report on their careful implementations. The algorithms are analyzed and design choices are discussed. To the best of our knowledge, this is the most comprehensive comparison of maximum transversal algorithms based on augmenting paths. Previous papers with the same objective either do not cover all the algorithms discussed here or use non-uniform implementations from different researchers. We use a common base to implement all of the algorithms and compare their relative performance on a wide range of graphs and matrices. We systematize, develop and use several ideas for enhancing performance. One of these ideas improves the performance of one of the existing algorithms in most cases, sometimes significantly. The improvement is large enough that we include this variant as an eighth algorithm in the comparisons.
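To make the notion of an augmenting-path algorithm concrete, the following is a minimal sketch of the simplest textbook variant (depth-first search from each unmatched row). It is not one of the implementations compared in this work; the function name `maximum_transversal` and the row-list input format are assumptions made for illustration only.

```python
def maximum_transversal(rows):
    """Illustrative DFS-based augmenting-path matching (hypothetical helper).

    rows[i] lists the column indices holding a nonzero in row i.
    Returns (size, col_match), where col_match[j] is the row matched to
    column j, or -1 if column j is unmatched.
    """
    ncols = 1 + max((j for r in rows for j in r), default=-1)
    col_match = [-1] * ncols

    def augment(i, visited):
        # Try to match row i, re-routing previously matched rows if needed.
        for j in rows[i]:
            if j in visited:
                continue
            visited.add(j)
            if col_match[j] == -1 or augment(col_match[j], visited):
                col_match[j] = i
                return True
        return False

    size = 0
    for i in range(len(rows)):
        # One search per row; each success grows the matching by one.
        if augment(i, set()):
            size += 1
    return size, col_match
```

Calling `maximum_transversal([[0, 1], [0], [1, 2]])` finds a transversal of size 3, while a structurally singular pattern such as `[[0], [0]]` yields size 1. The recursion is kept for brevity; a production code would use an explicit stack.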

#### On computing inverse entries of a sparse matrix in an out-of-core environment

The inverse of an irreducible sparse matrix is structurally full, so that it is impractical to think of computing or storing it. However, there are several applications where a subset of the entries of the inverse is required. Given a factorization of the sparse matrix held in out-of-core storage, we show how to compute such a subset efficiently, by accessing only parts of the factors. When there are many inverse entries to compute, we need to guarantee that the overall computation scheme has reasonable memory requirements, while minimizing the cost of loading the factors. This leads to a partitioning problem that we prove is NP-complete. We also show that we cannot get a close approximation to the optimal solution in polynomial time. We thus need to develop heuristic algorithms, and we propose: (i) a lower bound on the cost of an optimum solution; (ii) an exact algorithm for a particular case; (iii) two other heuristics for a more general case; and (iv) hypergraph partitioning models for the most general setting. We illustrate the performance of our algorithms in practice using the Mumps software package on a set of real-life problems as well as some standard test matrices. We show that our techniques can improve the execution time by a factor of 50.
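As background, the standard identity that relates a single inverse entry to two triangular solves can be written as follows. This is a sketch assuming a factorization $A = LU$ with permutations and pivoting omitted; the symbols $e_i$, $e_j$ (canonical unit vectors) and $y$ are introduced here for illustration.

```latex
\[
  a^{-1}_{ij} \;=\; e_i^{T} A^{-1} e_j
  \;=\; \bigl( U^{-1} y \bigr)_i ,
  \qquad \text{where } L\,y = e_j .
\]
```

Because the right-hand side $e_j$ is sparse and only the $i$-th component of the solution is needed, each of the two solves touches only part of $L$ and $U$, which is consistent with the statement above that only parts of the factors have to be accessed.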

#### The minimum degree ordering with dynamical constraints

We propose a modification of the minimum degree ordering algorithm in which some variables are constrained to be ordered only after some other variables have been ordered. The constrained variables are specified initially, and their constraints are removed during the course of the algorithm. This is close to the constrained minimum degree ordering algorithm. The difference is that our algorithm removes some of the constraints as it proceeds, whereas the constraints are static in current constrained ordering algorithms. Such an algorithm can have different applications; we target the ordering problem for saddle point matrices.
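The idea of constraints that are lifted as the elimination proceeds can be sketched as below. This is only one possible reading, not the algorithm proposed in this work: the `prerequisites` map, the rule that a variable becomes eligible once all of its prerequisites have been ordered, and the tie-breaking by node label are all assumptions made for illustration.

```python
def constrained_minimum_degree(adj, prerequisites):
    """Illustrative minimum degree ordering with dynamic constraints.

    adj: dict mapping each node to the set of its neighbours (symmetric).
    prerequisites: hypothetical dict mapping a node to the set of nodes
        that must be ordered before it.
    Returns the elimination order as a list of nodes.
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    pending = {v: set(prerequisites.get(v, ())) for v in graph}
    order = []
    while graph:
        # Only nodes whose constraints have all been lifted may be chosen.
        eligible = [v for v in graph if not pending[v]]
        if not eligible:
            raise ValueError("constraints cannot be satisfied")
        v = min(eligible, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for u in nbrs:
            # Eliminating v turns its neighbours into a clique (fill-in).
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
        for u in graph:
            # Ordering v lifts the constraints that were waiting on it.
            pending[u].discard(v)
        order.append(v)
    return order
```

On the path graph `{0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}`, the unconstrained call returns `[0, 1, 2, 3]`, whereas requiring node 1 to precede node 0 (`prerequisites={0: {1}}`) returns `[3, 2, 1, 0]`.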

#### On finding dense submatrices of a sparse matrix

We consider a family of problems exemplified by the following one:
Given an $m \times n$ matrix $A$ and an integer $k \le \min\{m, n\}$, find a set of row indices $R = \{r_1, \ldots, r_k\}$ and a set of column indices $C = \{c_1, \ldots, c_k\}$ such that the number of nonzeros in the submatrix indexed by $R$ and $C$, i.e., $A(R, C)$ in Matlab notation, is maximized.
This is equivalent to finding a $k \times k$ submatrix $S$ of $A$ with entries $S_{ij} = A_{r_i, c_j}$ such that it contains the maximum number of nonzeros among all $k \times k$ submatrices of $A$.
We show that this problem is NP-complete, and then propose and analyze heuristic approaches to the problem.
Problems of this nature arise in a family of hybrid solvers for sparse linear systems.
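Since the problem is NP-complete, practical methods are heuristic. As an illustration of what such a heuristic can look like, the sketch below uses generic greedy peeling: it repeatedly discards the row or column with the fewest nonzeros in the current submatrix until a k×k submatrix remains. This is a common textbook-style strategy chosen for illustration, not the heuristics analyzed in this work; the function name and the coordinate-list input are assumptions.

```python
def greedy_dense_submatrix(nonzeros, m, n, k):
    """Illustrative greedy peeling heuristic (hypothetical helper).

    nonzeros: iterable of (row, column) positions of the nonzero entries.
    m, n: matrix dimensions; k: target size, with k <= min(m, n).
    Returns (rows, cols, count): the kept index lists and the number of
    nonzeros in the resulting k-by-k submatrix.
    """
    rows, cols = set(range(m)), set(range(n))
    live = set(nonzeros)
    while len(rows) > k or len(cols) > k:
        # Recount nonzeros per surviving row and column (simple, not fast).
        row_count = {i: 0 for i in rows}
        col_count = {j: 0 for j in cols}
        for i, j in live:
            row_count[i] += 1
            col_count[j] += 1
        candidates = []
        if len(rows) > k:
            i = min(rows, key=lambda r: (row_count[r], r))
            candidates.append((row_count[i], 0, i))
        if len(cols) > k:
            j = min(cols, key=lambda c: (col_count[c], c))
            candidates.append((col_count[j], 1, j))
        # Drop whichever removable row or column is currently sparsest.
        _, axis, idx = min(candidates)
        if axis == 0:
            rows.discard(idx)
            live = {(i, j) for (i, j) in live if i != idx}
        else:
            cols.discard(idx)
            live = {(i, j) for (i, j) in live if j != idx}
    return sorted(rows), sorted(cols), len(live)
```

For a 3×3 matrix with nonzeros at `(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)` and `k = 2`, the sketch keeps rows `[0, 1]` and columns `[0, 1]`, a submatrix with 4 nonzeros. Being greedy, it carries no optimality guarantee.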