BACCHUS is a joint team of INRIA Bordeaux - Sud-Ouest,
LaBRI (Laboratoire Bordelais de Recherche en Informatique –
CNRS UMR 5800, University of Bordeaux and IPB) and IMB
(Institut Mathématique de Bordeaux – CNRS UMR 5251,
University of Bordeaux). BACCHUS was created on
January 1, 2009 (
http://

The purpose of the
`BACCHUS` project is to analyze and efficiently solve
scientific computation problems that arise in complex
research and industrial applications and that involve
scaling. By scaling we mean that the applications considered
require an enormous computational power, of the order of tens
or hundreds of teraflops, and that they handle huge amounts
of data. Solving these kinds of problems requires a
multidisciplinary approach involving both applied mathematics
and computer science.

Our main focus is on fluid problems, and especially the
simulation of
*physical wave propagation problems* including fluid
mechanics, inert and reactive flows, multimaterial and
multiphase flows, acoustics, etc.
`BACCHUS` intends to solve these problems by bringing
contributions to all steps of the development chain that goes
from the design of new high-performance, more robust and more
precise numerical schemes, to the creation and implementation
of optimized parallel algorithms and high-performance
codes.

By taking into account architectural and performance concerns from the early stages of design and implementation, the high-performance software which will implement our numerical schemes will be able to run efficiently on most of today's major parallel computing platforms (UMA and NUMA machines, large networks of SMP nodes, production GRIDs).

The
`Scotch` software, which has been fully 64-bit since
revision
`5.1.10`, has been able to partition graphs of more than
2 billion vertices on 2048 processors. Runs have been
performed on up to 30,000 processing elements.

`RealfluiDS` has been upgraded to hybrid meshes in
3D. We have been able to run an M6 wing mesh provided by
ONERA with
5.5 × 10^{6} vertices with Q1 elements (256 procs)
and a supersonic business jet configuration with the P2
third-order version (128 procs,
8 × 10^{5} degrees of freedom).

C. Dobrzynski, M. Ricchiuto and R.
Abgrall, jointly with the MC2 project team,
participated in the organisation of CANUM 2010 (June
2010, Carcans Maubuisson). See
http://

A large number of industrial problems involve fluid
mechanics. They may involve the coupling of one or more
physical models. An example is provided by aeroelastic
problems, which have been studied in detail by other INRIA
teams. Another example is given by flows in pipelines where
the fluid (a mixture of air–water–gas) does not have
well-known physical properties. One may also consider
problems in aeroacoustics, which are becoming more and more
important in everyday life. On some occasions, one needs
specific numerical tools to take into account,
*e.g.*, a fluid's exotic equation of state, or because the
amount of required computational resources becomes huge, as
in unsteady flows. Another situation where specific tools are
needed is when one is interested in very specific physical
quantities, such as
the lift and drag of an airfoil, a situation where
commercial tools can only provide a very crude answer.

It is a fact that there are many commercial codes. They
allow users to simulate a lot of different flow types. The
quality of the results is however far from optimal in many
cases. Moreover, the numerical technology implemented in
these codes is often not the most recent. To give a few
examples, consider the noise generated by wake vortices in
supersonic flows (external aerodynamics/aeroacoustics), or
the direct simulation of a 3D compressible mixing layer in a
complex geometry (as in combustion chambers). To our
knowledge, due to the very different temporal and physical
scales that need to be captured, a direct simulation of these
phenomena is not within the reach of the most recent technologies
because the required computational resources are currently
unavailable.
*We need to invent specific algorithms for this
purpose.*

In order to efficiently simulate these complex physical
problems, we are working on some fundamental aspects of the
numerical analysis of nonlinear hyperbolic problems.
*Our goal is to develop schemes that can adapt to modern
computer architectures*.

More precisely,
*we are working on a class of numerical schemes*, known
in the literature as Residual Distribution schemes,
*specifically tailored to unstructured and hybrid
meshes*. They have the most compact stencil possible that
is compatible with the expected order of accuracy. This
*accuracy is at least of second order, and it can go up to
fourth order in practical applications.* Since the stencil
is compact, the implementation on parallel machines becomes
simple. These schemes are very flexible in nature, which is
so far one of their most important advantages over other
techniques. This feature has allowed us to adapt the schemes
to the requirements of different physical situations (
*e.g.* different formulations allow either an efficient
explicit time advancement for problems involving small
time scales, or a fully implicit space-time variant which is
unconditionally stable and can handle stiff problems
where only the large time scales are relevant). This
flexibility has also enabled us to devise a variant using the
same data structure as the popular Discontinuous Galerkin
schemes, which are also part of our scientific focus.
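
As a purely illustrative sketch (not the team's schemes or code), the residual distribution idea can be shown on the 1D linear advection equation u_t + a u_x = 0 on a periodic grid: each element computes its residual and splits it between its two nodes with weights that sum to one. The function name `rd_step` and the first-order upwind weights are assumptions chosen for brevity.

```python
def rd_step(u, a, dx, dt):
    """One explicit residual-distribution step for u_t + a*u_x = 0 (periodic grid)."""
    n = len(u)
    update = [0.0] * n
    for e in range(n):                      # element e joins nodes e and e+1
        left, right = e, (e + 1) % n
        phi = a * (u[right] - u[left])      # element residual: integral of a*u_x
        # Upwind distribution weights (assumed here); they sum to one,
        # which is what makes the scheme conservative.
        beta_left, beta_right = (0.0, 1.0) if a > 0 else (1.0, 0.0)
        update[left] += beta_left * phi
        update[right] += beta_right * phi
    # Each node only receives contributions from the elements that contain it.
    return [ui - dt / dx * r for ui, r in zip(u, update)]


if __name__ == "__main__":
    n, a = 50, 1.0
    dx = 1.0 / n
    dt = 0.5 * dx / abs(a)                  # CFL number 0.5
    u = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]
    for _ in range(40):
        u = rd_step(u, a, dx, dt)
    print("total mass:", sum(u) * dx)
```

Because a node is updated only from its neighbouring elements, the stencil stays compact, which is the property that keeps a parallel implementation simple. Higher-order variants would change the distribution weights, not this overall structure.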

The compactness of the second order version of the schemes enables us to use efficiently the high performance parallel linear algebra tools developed by the team. However, the high order versions of these schemes, which are under development, require modifications to these tools taking into account the nature of the data structure used to reach higher orders of accuracy. This leads to new scientific problems at the border between numerical analysis and computer science. In parallel to these fundamental aspects, we also work on adapting more classical numerical tools to complex physical problems such as those encountered in interface flows, turbulent or multiphase flows, material science.

Within a few years, we expect to be able to deal with physical problems out of today's reach, such as aeroacoustics, unsteady aerodynamics, and compressible MHD (in relation with the ITER project). This will be achieved by means of a multi-disciplinary effort involving our research on compact distribution schemes, the parallel advances in algebraic solvers and partitioners, and the strong interactions with specialists in computer science and scientific computing.

Another topic of interest is the quantification of uncertainties in nonlinear problems. In many applications, the physical model is not known accurately. A typical example is that of turbulent flows where, for a given turbulence model which depends on many coefficients, the coefficients themselves are not known accurately. A similar situation occurs for real gas or multiphase flows, where the form of the equation of state suffers from uncertainties. The dependency of the model with respect to these uncertainties can be studied by uncertainty propagation techniques such as the polynomial chaos techniques developed in recent years. Different implementations exist, depending on whether the method is intrusive or not. The accuracy of these methods is still a matter of research, as is their ability to handle as large a number of uncertainties as possible and their versatility with respect to the structure of the random variable pdfs.
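
To make the non-intrusive variant concrete, here is a minimal sketch, assuming a single standard normal uncertain parameter and a degree-2 Hermite expansion. The model is called only as a black box at quadrature nodes. All names (`pce_coefficients`, `mean_and_variance`) and the toy model are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal variable xi;
# it integrates polynomials of degree <= 5 exactly.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

# Probabilists' Hermite polynomials He_0, He_1, He_2 and their norms E[He_k^2] = k!.
HERMITE = (lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0)
NORMS = (1.0, 1.0, 2.0)


def pce_coefficients(model):
    """Project model(xi) onto He_0..He_2 by quadrature (non-intrusive)."""
    samples = [model(x) for x in NODES]     # black-box solver evaluations
    return [
        sum(w * y * he(x) for w, x, y in zip(WEIGHTS, NODES, samples)) / norm
        for he, norm in zip(HERMITE, NORMS)
    ]


def mean_and_variance(coeffs):
    """Statistics follow directly from the expansion coefficients."""
    mean = coeffs[0]
    variance = sum(c * c * norm for c, norm in zip(coeffs[1:], NORMS[1:]))
    return mean, variance


if __name__ == "__main__":
    # Toy "solver": output depends quadratically on the uncertain coefficient.
    coeffs = pce_coefficients(lambda xi: (1.0 + 0.5 * xi) ** 2)
    print(coeffs, mean_and_variance(coeffs))
```

An intrusive method would instead substitute the expansion into the governing equations and solve for all coefficients at once, which requires modifying the solver itself.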

Our research in numerical algorithms has led to the
development of the
`RealfluiDS` platform, which is described in
section
. This work is supported by the
EU-Strep IDIHOM, various research contracts and in part by
the ANR-CIS ASTER project (see section
also), and also by the ERC grant
ADDECCO.

Solving large sparse systems
Ax = b of linear equations is a crucial
and time-consuming step, arising in many scientific and
engineering applications. Consequently, many parallel
techniques for sparse matrix factorization have been
studied and implemented.

Sparse direct solvers are mandatory when the linear system is very ill-conditioned; such a situation is often encountered in structural mechanics codes, for example. Therefore, a robust and versatile industrial software tool requires high-performance sparse direct solvers, and parallelism is then necessary for reasons of memory capacity and acceptable solving time. Moreover, in order to efficiently solve 3D problems with more than 50 million unknowns, which is now a reachable challenge with new SMP supercomputers, we must achieve good time scalability and control memory overhead. Solving a sparse linear system by a direct method is generally a highly irregular problem that raises challenging algorithmic problems and requires a sophisticated implementation scheme in order to fully exploit the capabilities of modern supercomputers.

In the
`BACCHUS` project, we focused first on the block
partitioning and scheduling problem for high performance
sparse
LDL^{T} or
LL^{T} parallel factorization without dynamic pivoting for
large sparse symmetric positive definite systems. Our
strategy is suitable for non-symmetric sparse matrices with
symmetric pattern, and for general distributed
heterogeneous architectures the computation and
communication performance of which are predictable in
advance. This has led to software developments (see
sections
,
)
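
For readers unfamiliar with the factorization itself, the following is a small dense, scalar sketch of LDL^{T} without pivoting and the associated triangular solves. It only illustrates the arithmetic; the sparse, blockwise, parallel algorithms discussed above are far more elaborate, and the function names are assumptions.

```python
def ldlt(a):
    """Return (L, d) with A = L * diag(d) * L^T, L unit lower triangular.

    No pivoting is performed, so A is assumed symmetric positive definite
    (or at least to have nonzero pivots).
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def solve_ldlt(L, d, b):
    """Solve A x = b from the factors: forward, diagonal, backward."""
    n = len(b)
    y = list(b)
    for i in range(n):                       # forward substitution: L y = b
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    x = [y[i] / d[i] for i in range(n)]      # diagonal scaling: D z = y
    for i in reversed(range(n)):             # backward substitution: L^T x = z
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


if __name__ == "__main__":
    A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    L, d = ldlt(A)
    print(solve_ldlt(L, d, [14.0, 21.0, 26.0]))
```

Since no pivoting is needed, the sequence of operations is known before the numerical phase starts, which is what makes static block partitioning and scheduling possible.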

In addition to the project activities on direct solvers, we also study robust preconditioning algorithms for iterative methods. The goal of these studies is to overcome the huge memory consumption inherent to direct solvers in order to solve very large 3D problems (several million unknowns). Our studies focus on building generic parallel preconditioners based on ILU factorizations. Classical ILU preconditioners use scalar algorithms that do not exploit CPU power well and are difficult to parallelize. Our work aims at finding orderings and partitionings of the unknowns that lead to a dense block structure of the incomplete factors. Then, based on the block pattern, efficient parallel blockwise algorithms can be devised to build robust preconditioners that are also able to fully exploit the capabilities of modern high-performance computers.

In this context, we study two approaches.

The first idea is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and, in particular, should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods. In this way, we expect to be able to solve systems on the order of a hundred million unknowns, and even one billion unknowns. Another goal is to analyze and justify the parameters chosen to define the block sparse pattern in our incomplete factorization.

The driving rationale for this study is that it is easier to incorporate incomplete factorization methods into direct solution software than it is to develop new incomplete factorizations.

Our main goal at this point is to achieve a significant reduction of the memory needed to store the incomplete factors (with respect to the complete factors) while keeping enough fill-in to make the use of BLAS3 (in the factorization) and BLAS2 (in the triangular solves) primitives profitable.

In this approach, we focus on the critical problem
of finding approximate supernodes of ILU(k)
factorizations. The problem is to find a coarser block
structure of the incomplete factors. The “exact”
supernodes that are exhibited from the incomplete
factor non zero pattern are usually very small and thus
the resulting dense blocks are not large enough for an
efficient use of the BLAS3 routines. A remedy to this
problem is to merge supernodes that have nearly the
same structure. The benefits of this approach have been
shown in
. These algorithms are
implemented in the
`PaStiX` library.
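
The merging idea can be sketched as a greedy pass over consecutive columns, accepting a merge while the explicit zeros it introduces stay below a tolerance. This is an assumed, simplified heuristic for illustration only, not the algorithm implemented in the library; the column patterns, the tolerance definition and the function names are hypothetical.

```python
def stored_entries(cols, patterns):
    """Entries stored if `cols` form one dense block column of the lower factor."""
    rows = set().union(*(patterns[j] for j in cols))
    # Column j stores every row of the merged pattern at or below its diagonal.
    return sum(sum(1 for r in rows if r >= j) for j in cols)


def amalgamate(patterns, tol):
    """Greedily merge consecutive columns while extra zero fill <= tol * nnz.

    patterns[j] is the set of row indices (diagonal included) of column j
    of the incomplete factor. tol = 0 yields only zero-fill ("exact") merges.
    """
    supernodes = []
    current = [0]
    for j in range(1, len(patterns)):
        candidate = current + [j]
        nnz = sum(len(patterns[c]) for c in candidate)
        extra = stored_entries(candidate, patterns) - nnz
        if extra <= tol * nnz:
            current = candidate
        else:
            supernodes.append(current)
            current = [j]
    supernodes.append(current)
    return supernodes


if __name__ == "__main__":
    patterns = [{0, 1, 4}, {1, 4, 5}, {2, 3}, {3, 5}, {4, 5}, {5}]
    print(amalgamate(patterns, 0.0))    # strict merging: mostly tiny blocks
    print(amalgamate(patterns, 0.25))   # relaxed merging: larger dense blocks
```

Raising the tolerance trades a few stored zeros for larger dense blocks, which is the trade-off that makes BLAS3 kernels worthwhile.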

The second technique makes use of a Schur complement approach.

In recent years, a few Incomplete LU factorization techniques were developed with the goal of combining some of the features of standard ILU preconditioners with the good scalability features of multilevel methods. The key feature of these techniques is to reorder the system in order to extract parallelism in a natural way. Often a number of ideas from domain decomposition are utilized and mixed to derive parallel factorizations.

Under this framework, we developed, in collaboration with Yousef Saad (University of Minnesota), algorithms that generalize the notions of “faces” and “edges” of the “wire-basket” decomposition. The interface decomposition algorithm is based on defining a “hierarchical interface structure” (HID). This decomposition consists in partitioning the set of unknowns of the interface into components called connectors, which are grouped into “classes” of independent connectors.

In the context of robust preconditioning techniques,
we have developed an approach that uses the HID
ordering to define a new hybrid direct-iterative
solver. The principle is to build a decomposition of
the adjacency matrix of the system into a set of small
sub-domains (the typical size of a sub-domain is around
a few hundred or a few thousand nodes) with overlap. We
build this decomposition from the nested dissection
separator tree obtained using a sparse matrix
reordering software such as
`Scotch`. Thus, at a certain level of the
separator tree, the sub-trees are considered as the
interior of the sub-domains and the union of the
separators in the upper part of the elimination tree
constitutes the interface between the sub-domains.

The interiors of these sub-domains are treated by a direct method. Solving the whole system is then equivalent to solving the Schur complement system on the interface between the sub-domains, which has a much smaller dimension. We use the hierarchical interface decomposition (HID) to reorder and partition this system. Indeed, the HID gives a natural dense block structure of the Schur complement. Based on this partition, we define efficient block preconditioners that allow the use of BLAS routines and a high degree of parallelism thanks to the HID properties.
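
The reduction to the interface can be illustrated on a tiny dense example. The sketch below assumes the unknowns are ordered interiors first, giving the block system [[B, F], [E, C]], and forms S = C - E B^{-1} F explicitly. A real solver would never form S densely like this and would solve it iteratively; all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small, dense)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def schur_solve(B, F, E, C, b_int, b_itf):
    """Solve [[B, F], [E, C]] [x_int; x_itf] = [b_int; b_itf] via the Schur complement."""
    nb, nc = len(B), len(C)
    # Columns of B^{-1} F: one interior direct solve per interface unknown.
    binv_f = [solve(B, [F[i][j] for i in range(nb)]) for j in range(nc)]
    binv_b = solve(B, b_int)
    # Schur complement S = C - E B^{-1} F, living on the interface only.
    S = [[C[i][j] - sum(E[i][k] * binv_f[j][k] for k in range(nb))
          for j in range(nc)] for i in range(nc)]
    rhs = [bi - ei for bi, ei in zip(b_itf, matvec(E, binv_b))]
    x_itf = solve(S, rhs)                    # interface problem (small)
    x_int = solve(B, [bi - fi for bi, fi in zip(b_int, matvec(F, x_itf))])
    return x_int, x_itf


if __name__ == "__main__":
    # 1D Laplacian on 5 nodes; node 2 is the separator, nodes 0,1 and 3,4 the interiors.
    B = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
    F = [[0.0], [-1.0], [-1.0], [0.0]]
    E = [[0.0, -1.0, -1.0, 0.0]]
    C = [[2.0]]
    print(schur_solve(B, F, E, C, [1.0, 0.0, 0.0, 1.0], [0.0]))
```

Note that B is block diagonal here: the two interiors decouple, so their direct solves can run independently, and only the one-unknown interface system ties them together.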

We propose several algorithmic variants to solve the
Schur complement system that can be adapted to the
geometry of the problem: typically some strategies are
more suitable for systems coming from a 2D problem
discretisation and others for a 3D problem; the choice
of the method also depends on the numerical difficulty
of the problem. In the
`HIPS` library, we provide fully iterative methods
(very low memory consumption) as well as hybrid methods
that mix a direct factorization inside the domains and
an iterative method on the Schur complement. The
library provides many options that allow one to deal
with real or complex arithmetic, and symmetric or
unsymmetric matrices. In particular, a very
interesting feature of
`HIPS` is that it allows one to find a good
trade-off between memory, robustness and time
consumption in almost every case.

This work is also supported by the ANR-CIS project “SOLSTICE”.

Finding vertex separators for sparse matrix ordering is only one of the many uses of generic graph partitioning tools. For instance, finding balanced and compact domains in problem graphs is essential to the efficiency of parallel iterative solvers. Here again, because of the size of the problems at stake, parallel graph partitioning tools are mandatory to provide good load balance and minimal communication cost.

The execution of parallel applications implies communication between processes executed on the different cores. On NUMA architectures, which are strongly heterogeneous in terms of latency and capacity, communication cost strongly depends on the distribution of tasks among cores. Architecture-aware load balancing must take into account both the characteristics of the parallel application (including, for instance, task processing costs and the amount of communication between tasks) and the topology of the target architecture (providing the power of the cores and the costs of communication between all of them). When processes are assumed to coexist for the entire duration of the program, this optimization problem is called mapping. A mapping is called static if it is computed prior to the execution of the program and is never modified at run-time.
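
One common way to score a candidate mapping, shown here as an assumption rather than as the objective used by any particular tool, is to weight each communication volume by the distance between the cores that host its two endpoint tasks, and to track the load per core separately. The data layout and function names below are hypothetical.

```python
def communication_cost(edges, mapping, distance):
    """Sum of (communication volume) x (distance between the hosting cores)."""
    return sum(w * distance[mapping[u]][mapping[v]] for u, v, w in edges)


def core_loads(task_cost, mapping, n_cores):
    """Total processing cost assigned to each core."""
    loads = [0.0] * n_cores
    for task, core in mapping.items():
        loads[core] += task_cost[task]
    return loads


if __name__ == "__main__":
    # Four tasks in a chain; tasks 0-1 and 2-3 exchange much more data than 1-2.
    edges = [(0, 1, 10), (1, 2, 1), (2, 3, 10)]
    task_cost = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
    # Two sockets of two cores: cheap inside a socket, expensive across sockets.
    distance = [[0, 1, 4, 4],
                [1, 0, 4, 4],
                [4, 4, 0, 1],
                [4, 4, 1, 0]]
    aware = {0: 0, 1: 1, 2: 2, 3: 3}        # heavy edges stay inside a socket
    oblivious = {0: 0, 1: 2, 2: 1, 3: 3}    # heavy edges cross sockets
    for name, mapping in (("aware", aware), ("oblivious", oblivious)):
        print(name, communication_cost(edges, mapping, distance),
              core_loads(task_cost, mapping, 4))
```

Both mappings are perfectly load balanced, yet their communication costs differ widely, which is why the topology of the target machine has to enter the optimization.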

The sequential
`Scotch` tool has been able to perform static mapping
since its first version, but this feature was neither widely
known nor widely used by the community. With the increasing need
to map very large problem graphs onto very large and
strongly heterogeneous parallel machines (whether
hierarchical NUMA clusters or GPU-based systems), there
is an increasing demand for parallel static mapping
tools.

Many simulations which model the evolution of a given phenomenon over time (turbulence and unsteady flows, for instance) need to re-mesh some portions of the problem graph in order to capture more accurately the properties of the phenomenon in areas of interest. This re-meshing is performed according to criteria which are closely linked to the ongoing computation and can involve large mesh modifications: while elements are created in critical areas, some may be merged in areas where the phenomenon is no longer critical.

Performing such re-meshing in parallel creates additional problems. In particular, splitting an element which is located on the frontier between several processors is not an easy task, because deciding when to split an element, and defining the direction along which to split it so as to best preserve numerical stability, require shared knowledge which is not available in distributed memory architectures. Ad-hoc data structures and algorithms have to be devised so as to achieve these goals without resorting to extra communication and synchronization which would impact the running speed of the simulation.

Most work on parallel mesh adaptation attempts to parallelize in some way all the mesh operations: edge swap, edge split, point insertion, etc. This implies deep modifications in the (re)mesher and often leads to poor performance in terms of CPU time. Another approach proposes to base parallel re-meshing on an existing mesher and on load balancing, so as to be able to modify the elements located on the frontier between several processors.

In addition, the preservation of load balance in the re-meshed simulation requires dynamic redistribution of mesh data across processing elements. Several dynamic repartitioning methods have been proposed in the literature, which rely on diffusion-like algorithms and the solving of flow problems to minimize the amount of data to be exchanged between processors. However, integrating such algorithms into a global framework for handling adaptive meshes in parallel has yet to be done.
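
A first-order diffusion scheme gives the flavour of such methods. In this hedged sketch, each processor repeatedly exchanges a fraction `alpha` of its load difference with each neighbour, and the accumulated flow on each link indicates how much data should eventually migrate. The function name, the ring topology and the value of `alpha` are assumptions.

```python
def diffuse(loads, neighbors, alpha, steps):
    """Run `steps` diffusion sweeps; return final loads and net flow per link.

    neighbors[i] lists the processors adjacent to processor i.
    flow[(i, j)] with i < j is positive when load moves from i to j.
    """
    loads = list(loads)
    flow = {}
    for _ in range(steps):
        delta = [0.0] * len(loads)
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                if i < j:                     # handle each link once
                    f = alpha * (loads[i] - loads[j])
                    delta[i] -= f
                    delta[j] += f
                    flow[(i, j)] = flow.get((i, j), 0.0) + f
        loads = [l + d for l, d in zip(loads, delta)]
    return loads, flow


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [0, 2]]   # four processors in a ring
    # alpha must stay small enough for stability; 0.25 is safe for degree 2.
    final, flow = diffuse([100.0, 20.0, 20.0, 20.0], ring, alpha=0.25, steps=50)
    print(final)
    print(flow)
```

Only neighbour-to-neighbour exchanges occur, so the scheme needs no global knowledge beyond the processor graph; total load is conserved by construction.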

The main objective of the
`BACCHUS` project is to analyze and solve scientific
computing problems coming from complex research and
industrial applications that require a scalable approach.
This allows us to validate the numerical schemes, the
algorithms and the associated software that we develop. We
have today three reference application domains which are
fluid mechanics, material physics and the MHD simulation
dedicated to the ITER project.

In these three domains, we study and simulate phenomena that are by nature multiscale and multiphysics, and which require enormous computing power. A major part of this work leads to industrial collaborations, in particular with CNES, ONERA, and the French CEA/CESTA, CEA/Ile-de-France and CEA/Cadarache centers.


The numerical simulation of steady and unsteady flows is
still a challenge since efficient schemes and efficient
implementations are needed. The accuracy of schemes is still
a problem nowadays. This challenge is even higher if large
size problems are considered, and if the meshes are not
regular. The schemes developed for fluid mechanics
problems use
`Scotch`,
`HIPS` and
`PaStiX` when the type of problem and the CPU
requirements make this useful.

One of our application fields is that of steady subsonic, transonic and supersonic flow problems, where the equation of state is, for example, that of air in standard conditions, or a more general one as in real gases and multiphase flows. This class of physical problems corresponds to “standard” aerodynamics, and the models are the Euler and Navier-Stokes equations, possibly with turbulent effects. Here we consider the residual distribution and SUPG schemes.

Another field of application is that of
*unsteady* problems for the same physical models.
Depending on the applications, the physical models
considered involve the Navier-Stokes equations, or the
non-linear or linearized Euler equations. The
schemes we develop are the Residual Distribution schemes
and Discontinuous Galerkin
schemes. Specific modifications, with
respect to their steady counterparts, are made in order to
dramatically reduce the computational time, while
maintaining the desired accuracy.


Numerical simulation has become a major tool for the study of many physical phenomena involving charged particles, in particular beam physics and space and laboratory plasmas, including fusion plasmas. Moreover, it is a subject of interest to understand and optimize physics experiments in present fusion devices and also to design future reactors, as in the ITER project. Parallelism is required to carry out numerical simulations for realistic test cases.

We have established a collaboration with the physicists of
the CEA/DRFC group in the context of the ANR CIS 2006 project
called ASTER (Adaptive MHD Simulation of Tokamak Elms for
iteR). The magneto-hydrodynamic instability called ELM for
Edge Localized Mode is commonly observed in the standard
tokamak operating scenario. The energy losses the ELM will
induce in ITER plasmas are a real concern. However, the
current understanding of what sets the size of these ELM
induced energy losses is extremely limited. No numerical
simulations of the complete ELM instability, from its onset
through its non-linear phase and its decay, are reported in
the literature. Recently, encouraging results on the simulation
of an ELM cycle have been obtained with the
`JOREK` code developed at CEA, but at reduced toroidal
resolution. The
`JOREK` code uses a fully implicit time evolution
scheme in conjunction with the
`PaStiX` sparse matrix library.

We develop two kinds of software. The first one consists
in generic libraries that are used within application codes.
These libraries comprise a sequential and parallel
partitioner for large irregular graphs or meshes (
`Scotch`), and high performance direct or hybrid
solvers for very large sparse systems of equations (
`PaStiX` and
`HIPS`). The second kind of software corresponds to
dedicated software for fluid mechanics (
`RealfluiDS`).

For these parallel software developments, we use the message passing paradigm (based on the MPI interface), sometimes combined with threads so as to best exploit multi-core architectures: in some computation kernels such as solvers, when processing elements reside on the same compute node, message buffer space can be saved because the aggregation of partial results can be performed directly in the memory of the receiving processing element. Memory savings can be tremendous, and help us achieve problem sizes which could not be reached before (see Section ).

`RealfluiDS` is a software package dedicated to the simulation
of inert or reactive flows. It is also able to simulate
multiphase, multimaterial and MHD flows. There exist 2D and
3D versions. The 2D version is used to test new
ideas that are later implemented in the 3D one. Two classes
of schemes have been implemented: classical finite volume
schemes and the more recent residual distribution schemes.
Several low Mach preconditioning techniques are also
implemented. The code has been parallelized with and without
overlap of the domains. Recently, the
`PaStiX` solver has been integrated in
`RealfluiDS`. A partitioning tool exists in the
package, which uses
`Scotch`.

This work is supported by the French “Commissariat à l'Énergie Atomique CEA/CESTA” in the context of structural mechanics and electromagnetism applications.

`PaStiX` (
http://
`RealfluiDS` (see Section
). The
`PaStiX` library is released under the INRIA CeCILL
licence.

The
`PaStiX` library uses the graph partitioning and sparse
matrix block ordering package
`Scotch` (see Section
).
`PaStiX` is based on an efficient static scheduling and
memory manager, in order to solve 3D problems with more than
50 million unknowns. The mapping and scheduling algorithm
handles a combination of 1D and 2D block distributions. This
algorithm computes an efficient static scheduling of the
block computations for our supernodal parallel solver which
uses a local aggregation of contribution blocks. This can be
done by taking into account very precisely the computational
costs of the BLAS 3 primitives, the communication costs
and the cost of local aggregations. We also improved this
static computation and communication scheduling algorithm to
anticipate the sending of partially aggregated blocks, in
order to free memory dynamically. By doing this, we are able
to reduce the aggregated memory overhead, while keeping good
performance.

Another important point is that our study is suitable for any heterogeneous parallel/distributed architecture whose performance is predictable, such as clusters of SMP nodes. In particular, we now offer a high performance version with a low memory overhead for SMP node architectures, which fully exploits the advantage of shared memory by using a hybrid MPI-thread implementation.

Direct methods are numerically robust, but very large three-dimensional problems may lead to systems that would require a huge amount of memory despite any memory optimization. One approach under study consists in defining an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and, in particular, should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods.

`HIPS` (Hierarchical Iterative Parallel Solver) is a
scientific library that provides an efficient parallel
iterative solver for very large sparse linear systems.

The key point of the methods implemented in
`HIPS` is to define an ordering and a partition of the
unknowns that relies on a form of nested dissection ordering
in which cross points in the separators play a special role
(Hierarchical Interface Decomposition ordering). The
subgraphs obtained by nested dissection correspond to the
unknowns that are eliminated using a direct method and the
Schur complement system on the remaining unknowns
(which correspond to the interface between the sub-graphs
viewed as sub-domains) is solved using an iterative method
(GMRES or Conjugate Gradient for the time being). This special
ordering and partitioning allows for the use of dense block
algorithms both in the direct and iterative part of the
solver and provides a high degree of parallelism to these
algorithms. The code provides a hybrid method which blends
direct and iterative solvers.
`HIPS` exploits the partitioning and multistage ILU
techniques to enable a highly parallel scheme where several
subdomains can be assigned to the same process. It also
provides a scalar preconditioner based on the multistage ILUT
factorization.
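
As a reminder of what the iterative part involves, here is a minimal, unpreconditioned Conjugate Gradient sketch written against an operator callback, so that the matrix (for instance a Schur complement) never needs to be formed explicitly. It is a textbook version for symmetric positive definite operators, not the library's implementation, and the function name and signature are assumptions.

```python
def conjugate_gradient(apply_op, b, tol=1e-12, max_iter=200):
    """Solve A x = b for a symmetric positive definite operator A.

    apply_op(v) must return the product A v as a list; A itself is never stored.
    """
    x = [0.0] * len(b)
    r = list(b)                               # residual for the zero initial guess
    p = list(r)
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = apply_op(p)
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


if __name__ == "__main__":
    # The operator is given only through its action on a vector.
    def apply_a(v):
        return [4.0 * v[0] + v[1], v[0] + 3.0 * v[1]]

    print(conjugate_gradient(apply_a, [1.0, 2.0]))
```

In a hybrid solver, `apply_op` would typically combine interface products with interior direct solves, and a preconditioner would be applied to the residual at each iteration.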

`HIPS` can be used as a standalone program that
reads a sparse linear system from a file; it also provides
an interface to be called from any C, C++ or Fortran code. It
handles symmetric, unsymmetric, real or complex matrices.
Thus,
`HIPS` is a software library that provides several
methods to build an efficient preconditioner in almost all
situations.

Since August 2008,
`HIPS` has been publicly available at
http://

`Scotch` (
http://

The initial purpose of
`Scotch` was to compute high-quality partitions and
static mappings of valuated graphs representing parallel
computations and target architectures of arbitrary
topologies. The original contribution consisted in developing
a “*divide and conquer*” algorithm in which processes are
recursively mapped onto processors by using graph bisection
algorithms that are applied both to the process graph and to
the architecture graph. This allows the mapper to take into
account the topology and heterogeneity of the valuated graph
which models the interconnection network and its resources
(processor speed, link bandwidth). As new multicore,
multinode parallel machines tend to be less uniform in terms
of memory latency and communication bandwidth, this feature
is regaining interest.

The software has then been extended in order to produce vertex separators instead of edge separators, using a multilevel framework. Recursive vertex separation is used to compute orderings of the unknowns of large sparse linear systems, which both preserve sparsity when factorizing the matrix and exhibit concurrency for computing and solving the factored matrix in parallel.
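
The structure of such an ordering can be sketched on a regular 2D grid, where a separator is simply a grid line. This geometric splitting is an assumption made to keep the example short; a general tool has to find separators in arbitrary graphs, which is the hard part. The function name and the `min_size` cut-off are hypothetical.

```python
def nested_dissection(x0, x1, y0, y1, min_size=2):
    """Order grid points (x, y) with x0 <= x < x1 and y0 <= y < y1.

    Both halves are ordered first (recursively), the separator line last.
    """
    w, h = x1 - x0, y1 - y0
    if w <= 0 or h <= 0:
        return []
    if max(w, h) <= min_size:                 # small block: natural ordering
        return [(x, y) for y in range(y0, y1) for x in range(x0, x1)]
    if w >= h:                                # cut across the longer side
        mid = x0 + w // 2
        separator = [(mid, y) for y in range(y0, y1)]
        return (nested_dissection(x0, mid, y0, y1, min_size)
                + nested_dissection(mid + 1, x1, y0, y1, min_size)
                + separator)
    mid = y0 + h // 2
    separator = [(x, mid) for x in range(x0, x1)]
    return (nested_dissection(x0, x1, y0, mid, min_size)
            + nested_dissection(x0, x1, mid + 1, y1, min_size)
            + separator)


if __name__ == "__main__":
    order = nested_dissection(0, 7, 0, 7)
    print(order[-7:])    # the top-level separator is eliminated last
```

With a 5-point stencil, points on either side of a separator line are not adjacent, so the two halves can be factorized concurrently before the separator unknowns are processed.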

Version
`5.0` of
`Scotch`, released in August 2007, was the first
version to comprise parallel routines. This extension, called
`PT-Scotch` (for “*Parallel Threaded*
`Scotch`”), is based on a distributed memory model, and makes
use of the MPI and, optionally, Posix thread APIs. A
distributed graph structure has been defined, which allows
users to reserve vertex indices on each processor for future
local adaptive refinement. Its parallel graph ordering
routine provides orderings which are of the same quality as
the ones yielded by the sequential
`Scotch` ordering routine, while the competing software
`ParMETIS` experiences a severe loss of quality when
the number of processors increases.
`Scotch`
`5.0` was released under the CeCILL-C free/libre
software license, and has been registered at APP (“Agence
pour la Protection des Programmes”).

Version
`5.1` of
`Scotch`, released in September 2008, extended
the parallel features of
`PT-Scotch`, which can now compute graph partitions in
parallel by means of a parallel recursive bipartitioning
framework. Release
`5.1.10` made
`Scotch` the first full 64-bit implementation of a
general purpose graph partitioner, so that
`PT-Scotch` has been able to successfully break the
“32-bit” barrier and partition a graph of more than 2 billion
vertices, spread across 2048 processors, at the French CCRT
computer center.

`Scotch` has been integrated into numerous third-party
software packages, which indirectly contribute to its diffusion. For
instance, it is used by the
`Zoltan` module of
the
`Trilinos` software
(Sandia Labs), by
`Code_Aster Libre`,
a GPLed thermal and mechanical analysis software developed by
the French state-owned electricity producer EDF, by the parallel
solvers
`MUMPS` (ENSEEIHT/IRIT, LIP and LaBRI),
`SuperLUDist` (U.C. Berkeley),
`PaStiX` (LaBRI) and
`HIPS` (LaBRI), as well as by several other scientific
computing software packages.

`MMG3D` is a fully automatic tetrahedral remesher.
Starting with a tetrahedral mesh, it produces quasi-uniform
meshes with respect to a metric tensor field. This tensor
prescribes a length and a direction for the edges, so the
resulting meshes are anisotropic. The software is
based on local mesh modifications, and an anisotropic version
of the Delaunay kernel is implemented to insert vertices into the
mesh. Moreover,
`MMG3D` can deal with rigid body motion and
moving meshes. When a displacement is prescribed on a part of
the boundary, a final mesh is generated such that the surface
points are moved according to this displacement.
`MMG3D` is used in particular in GAMMA for their mesh
adaptation developments, but also at EPFL (maths department),
Dassault Aviation, Lemma (a French SME), etc.
`MMG3D` can be used in
`FreeFem++` (
http://
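
To illustrate what "quasi-uniform with respect to a metric tensor field" means, the following sketch measures an edge in a constant metric, using the usual definition l_M(e) = sqrt(e^T M e). A mesh is quasi-uniform in the metric when all its edges have a length close to 1 in this measure. The function name, the diagonal metric and the sizes are assumptions made for the example; this is not the `MMG3D` implementation.

```python
import math


def metric_edge_length(p, q, metric):
    """Length of edge pq in a constant symmetric positive definite 3x3 metric.

    Computes sqrt(e^T M e) with e = q - p and M given as nested lists.
    """
    e = [qi - pi for pi, qi in zip(p, q)]
    me = [sum(metric[i][j] * e[j] for j in range(3)) for i in range(3)]
    return math.sqrt(sum(e[i] * me[i] for i in range(3)))


# Hypothetical diagonal metric prescribing edge sizes h = (1.0, 0.5, 0.1)
# along x, y and z: M = diag(1/h_x^2, 1/h_y^2, 1/h_z^2).
sizes = (1.0, 0.5, 0.1)
metric = [[1.0 / sizes[i] ** 2 if i == j else 0.0 for j in range(3)]
          for i in range(3)]

origin = (0.0, 0.0, 0.0)
len_x = metric_edge_length(origin, (1.0, 0.0, 0.0), metric)  # Euclidean length 1.0
len_z = metric_edge_length(origin, (0.0, 0.0, 0.1), metric)  # Euclidean length 0.1
```

Both edges have (up to rounding) unit length in the metric although their Euclidean lengths differ by a factor of ten, which is how a single "unit length" criterion produces an anisotropic mesh.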

`Montjoie` is a finite element code initially handling
only quadrilateral/hexahedral elements. Because of the
tensorization of these elements, efficient algorithms can be
written for the computation of finite element matrices. It
can handle tetrahedra, prisms, pyramids and hexahedra with
continuous finite element, edge element and discontinuous
Galerkin formulations. A local order of approximation can be
used in each element of the mesh.

The development of PLATO (A platform for Tokamak
simulation) (
http://

1. A (small) database corresponding to axi-symmetrical solutions of the equilibrium plasma equations for realistic geometrical and magnetic configurations (ToreSupra, JET and ITER). The construction of meshes is always a significant and time-consuming task. PLATO will provide meshes and solutions corresponding to equilibrium solutions that will be used as initial data for more complex computations.

2. A set of tools for the handling, manipulation and transformation of meshes and solutions using different discretisations (P1, Q1, P3, etc.).

3. Numerical templates allowing the use of 3D discretization schemes using finite element schemes in the poloidal plane and spectral Fourier or structured finite volume representations in the toroidal one.

4. Several applications (Ideal MHD and drift approximation) used in the framework of the Inria large scale initiative “FUSION”.

This year, after the PLATO architecture was defined, points 1 and 2 have been developed.

`PaMPA` (Parallel Mesh Partitioning and Adaptation) is
a middleware library dedicated to the management of
distributed meshes. Its purpose is to relieve solver writers
from the tedious and error-prone task of writing again and
again service routines for mesh handling, data communication
and exchange, remeshing, and data redistribution. An API of
the future platform has been devised, and the coding of the
mesh handling and redistribution routines is in progress.
`PaMPA` will be used as a base module of the PLATO
solver, to dynamically balance, refine and coarsen its
distributed mesh.

This year, many developments have been carried out and
implemented in the
`RealfluiDS` software after
which has opened up many
doors.

A three-dimensional, parallel version of the second
order RD scheme on hybrid meshes (tetrahedral-hexahedral)
for the steady Euler equations is now working. It has been
validated on several M6 wing meshes given by ONERA. The
largest mesh has 5.5 × 10^{6} vertices and the simulation
was run on 256 processors. Meanwhile the third order version
of the same scheme (for tetrahedra) was run successfully
on a business jet configuration in supersonic conditions.
The mesh was given by GAMMA3. This last version is also
able to solve the Navier-Stokes equations, but the
approximation is not fully satisfactory.

For this reason, we have put a lot of effort into
understanding the correct way to approximate
advection-diffusion-like problems with a method that
degenerates to the standard RD scheme when the viscous term
disappears. The main difficulty was to obtain a scheme that
works well over the largest possible range of Peclet numbers.
This goal has been achieved, and we have two versions of the
method (one using a gradient reconstruction and one using a
relaxation interpretation of the steady advection-diffusion
equation). These methods will be implemented in
`RealfluiDS` in the coming months for IDIHOM.
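
The range of regimes mentioned above can be quantified with the mesh Peclet number, which compares advection and diffusion at the scale of one element. The sketch below uses the convention Pe = |a| h / ν, which is an assumption (some authors include a factor 1/2); the function name and the numerical values are illustrative only.

```python
def mesh_peclet(advection_speed: float, mesh_size: float, viscosity: float) -> float:
    """Mesh Peclet number Pe = |a| h / nu (one common convention)."""
    if viscosity <= 0.0:
        raise ValueError("viscosity must be positive")
    return abs(advection_speed) * mesh_size / viscosity


# Illustrative values only: same mesh and velocity, two different viscosities.
pe_diffusive = mesh_peclet(advection_speed=1.0, mesh_size=0.01, viscosity=1.0)
pe_advective = mesh_peclet(advection_speed=1.0, mesh_size=0.01, viscosity=1e-6)
```

With Pe well below 1 diffusion dominates at the mesh scale, while for Pe far above 1 the problem is advection dominated and the discretization should behave like the inviscid RD scheme.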

Mario Ricchiuto has worked on more efficient RD discretizations for time-dependent problems. This has led to two formulations. The first is a genuinely explicit variant of the method based on high order mass lumping and on a temporally shifted stabilization (upwinding) operator. This formulation is well suited for problems where the time scales of interest are small. The higher order variant of the methodology, currently limited to second order, is under development. In parallel, in collaboration with M. Hubbard of the University of Leeds, an unconditionally stable space-time formulation has been proposed. This variant allows arbitrarily large time steps to be taken while preserving the accuracy and monotonicity of the results. Further work is under way to extend the accuracy beyond second order. In his PhD, Guillaume Baurin has started to implement the third order version of our methods in a real industrial platform (SAFRAN). He has started the implementation of the RD scheme for the Navier-Stokes equations in that platform.

Results on curved meshes and non-Lagrangian elements (Bézier and NURBS) have been obtained by Algiane Froehly. The method is now third and fourth order accurate in 2D. Cécile Dobrzynski has worked on the construction of “high order” meshes using Bézier and NURBS elements. Numerical results using these meshes and A. Froehly's developments have been obtained in 2D for subsonic, transonic and hypersonic problems. The extension to 3D is underway; one of the main difficulties is mesh generation.

C. Waervaecke's PhD thesis has been a collaborative work between BACCHUS, MC2 (Héloise Beaugendre) and PUMAS (Boniface Nkonga). The main weakness of the classical (Galerkin) finite element method is its lack of stability for advection dominated flows. In this work we consider the compressible Navier-Stokes equations combined with the one-equation Spalart-Allmaras turbulence model. These equations are solved in a coupled way. Numerical stability is achieved thanks to the Streamline Upwind Petrov-Galerkin (SUPG) formulation. Within the framework of the SUPG method, the artificial viscosity is anisotropic and its principal component is aligned with the streamlines. The aim is to add enough viscosity to get rid of instabilities and unphysical oscillations without damaging the accuracy of the method. The amount of artificial viscosity is controlled by a stabilization tensor. Since an optimal way to choose this tensor is still unknown, several choices have been investigated. In addition, the SUPG method is used in combination with a shock-parameter term which supplies additional stability near shock fronts. Numerical results show that the method is able to reproduce good turbulent profiles with less numerical diffusion than a finite volume method. Even in the case of almost incompressible flow, the numerical strategy is robust and gives good results.
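
For readers unfamiliar with SUPG, the idea can be written down on a scalar model problem. The block below is a sketch for the steady advection-diffusion equation a·∇u − νΔu = f with homogeneous Dirichlet boundary conditions, not the coupled Navier-Stokes/Spalart-Allmaras system of the thesis; the symbols V_h, τ_K, a, ν and f are notation assumed for this example.

```latex
\begin{aligned}
&\text{Find } u_h \in V_h \text{ such that, for all } w_h \in V_h: \\
&\int_\Omega w_h \, (\mathbf{a}\cdot\nabla u_h) \, dx
 + \int_\Omega \nu \, \nabla w_h \cdot \nabla u_h \, dx
 + \sum_{K} \int_K \tau_K \, (\mathbf{a}\cdot\nabla w_h)\,
   \bigl(\mathbf{a}\cdot\nabla u_h - \nu \Delta u_h - f\bigr) \, dx
 = \int_\Omega w_h \, f \, dx .
\end{aligned}
```

The leading part of the added term can be rewritten as ∇w_h · (τ_K a aᵀ) ∇u_h, that is, a diffusion tensor acting only in the direction of a. This is the sense in which the artificial viscosity is anisotropic and aligned with the streamlines, and τ_K plays the role of the stabilization parameter (a tensor in the case of systems).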

S. Galéra, P. Congedo and R. Abgrall have made a detailed comparison of the semi-intrusive method developed last year with more classical non-intrusive polynomial chaos methods and with Monte Carlo results. These results were presented in part at the ECCOMAS CFD conference in June 2010. We have also adapted the SUPG method for turbulent flows to this method, so that we are able to produce turbulent simulations including one and two uncertainties (here on the inflow Mach number and the velocity angle).

During E. Mbinki's internship, we have tried to understand the ideas behind the Smolyak algorithm and how they can be adapted to the semi-intrusive method using local conditional expectation.

G. Geraci has started his PhD; one of its goals is to handle as many uncertainties as possible for fluid problems.

Explicit schemes may become very expensive because of a restrictive stability condition (small CFL), especially when the computational mesh comprises some very small elements. A solution, known as local time-stepping, consists in using different time steps for different elements of the mesh. Such solutions can be applied to continuous finite elements but are more natural when applied to Discontinuous Galerkin methods. Marc Duruflé and S. Imperiale (PhD student in the POEMS project) have proposed a new local time-stepping strategy and validated the approach for wave problems.
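
The potential gain of local time-stepping can be estimated with a small sketch. It assumes a generic power-of-two scheme in which each element advances with the step dt_min · 2^k, where k is the largest level compatible with its own CFL bound dt_i = CFL · h_i / c. This is a common textbook variant and not the new strategy proposed by the authors; the function names and values are hypothetical.

```python
import math


def local_time_step_levels(element_sizes, wave_speed, cfl):
    """Assign each element a level k so that it advances with step dt_min * 2**k.

    The stable step of element i is dt_i = cfl * h_i / wave_speed, and k is the
    largest integer such that dt_min * 2**k does not exceed dt_i.
    """
    stable_steps = [cfl * h / wave_speed for h in element_sizes]
    dt_min = min(stable_steps)
    levels = [int(math.floor(math.log2(dt / dt_min))) for dt in stable_steps]
    return dt_min, levels


def update_counts(levels):
    """Element updates needed to advance by one coarsest step: (global, local)."""
    max_level = max(levels)
    global_updates = len(levels) * 2 ** max_level
    local_updates = sum(2 ** (max_level - k) for k in levels)
    return global_updates, local_updates


# Illustrative mesh: one very small element among larger ones.
dt_min, levels = local_time_step_levels([1.0, 0.5, 0.125, 1.0],
                                        wave_speed=1.0, cfl=0.5)
global_updates, local_updates = update_counts(levels)
```

On this four-element example the single small element forces 32 element updates with a global time step, against 12 with local time-stepping.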

Rémi Abgrall and Pierre-Henri Maire, with François Vilar (PhD at CELIA funded by a CEA grant started in October 2009), have started to work on Lagrangian schemes within the Discontinuous Galerkin framework. The idea is to start from the formulation of the Euler equations in full Lagrange coordinates: the spatial derivatives are written in Lagrangian coordinates. This has led to a publication in Computers and Fluids where our results in 1D are described. Currently, we are developing the method in two dimensions. The main difficulty is to understand the role and the structure of the Geometric Conservation Law.

C. Dobrzynski has developed an efficient tool for
handling moving 2D and 3D meshes. Here, contrary to most
ALE methods, the connectivity of the mesh changes in
time as the objects within the computational domain
move. The objective is to guarantee a high quality mesh, in
terms of minimum angle for example. Other criteria, which
depend on the physical problem under consideration, can
also be handled. Currently this meshing tool is being
coupled with
`RealfluiDS` in order to produce CFD applications.
One target example is the simulation of the 3D flow over
helicopter blades.

Work on high order mesh generation has begun. We are
modifying the classic mesh operators to take curved edges
into account. Starting from a coarse valid curved mesh, we
would like to be able to generate a uniformly refined
curved mesh and also to adapt the mesh density in certain
regions (boundary layers). Moreover, starting with a
P^{1} (triangle) mesh and some information on the
boundary, we are able to generate a valid third order
curved mesh. The algorithm is based on edge swaps and is
similar to a boundary enforcement procedure.

We have been involved in two tasks. In the first one, we work on novel numerical schemes for solving the compressible resistive MHD equations. In the second one, in connection with the JOREK code developed at CEA Cadarache, we work on adaptive mesh refinement problems and their connection with the solution of large linear systems to be solved in parallel.

The aim of our work in the ASTER project is to provide an
efficient numerical method for solving the MHD equations,
more specifically in the form used for the ITER model.
Here we want to improve the ability of Residual Distribution
schemes to solve this hyperbolic system. Once the full set of
equations and the behavior of the physical parameters have
been fixed, and with a validated numerical solver
for these equations, we should be able to simulate plasma
instabilities like those encountered in the ITER tokamak
configuration. The step to ELM simulations would then be
achieved. This is a global view of the context, and it may be
seen as a framework. However, one should notice that,
contrary to the work on JOREK at CEA Cadarache, our
purpose is a more general and academic code (
`RealfluiDS`).

Due to the localized nature of the ELMs at the boundary of the plasma, mesh refinement is ideally suited to minimize the number of elements required for a given accuracy. High resolution is only required where large gradients develop, which is on a surface that deforms in time. At a later stage of the ELM evolution, blobs of plasma are disconnected from the main plasma, for which mesh refinement also appears to be an optimal solution. We first adapt the mesh at the beginning of the simulation, during the initialization phase, in order to guarantee the equilibrium; we will then apply modifications to the mesh in order to obtain an adaptive mesh refinement procedure throughout the simulation.

The
`JOREK` code is now able to use several hundred
processors routinely. Simulations of ELMs are produced taking
into account the X-point geometry with both closed and open
field lines. But a higher toroidal resolution is required for
the resolution of the fine scale filaments that form during
the ELM instability. The complexity of the tokamak's geometry
and the fine mesh that is required lead to prohibitive
memory requirements. In the current release, the memory
scaling is not satisfactory: as one increases the number of
processes for a given problem size, the memory footprint on
each process does not decrease as much as one would expect. This
will be one of the goals that motivate the partners involved
in this project to present a new collaboration proposal.

As in the previous year, the work carried out within
the
`Scotch` project (see section
) focused on four main
axes.

The first one regards the parallelization of the static
mapping routines already available in the sequential
version of `Scotch`. Since version 5.1, `Scotch` provides
parallel graph partitioning capabilities, but graph
partitions are computed to date by means of a parallel
multilevel recursive bisection framework. This framework
provides partitions of very high quality for a moderate
number of parts (under about 512), but load imbalance
increases dramatically for larger numbers of parts. Also,
the more parts the user wants, the more expensive it is to
compute them, because of the recursive bisection process.
In order to reduce load imbalance in the recursive
bipartitioning process, a parallel load imbalance reduction
algorithm has been devised for the bipartitioning case.
This algorithm yields perfectly balanced subdomains, at
almost no cost for mesh graphs compared to direct k-way
methods, while it may significantly increase the cut for
very irregular graphs. Load imbalance reduction algorithms
for the k-way case are consequently mandatory, and are the
objective of the year to come. In spite of these drawbacks,
and thanks to the re-coding of some of its routines,
`PT-Scotch` can now partition graphs of more than 2
billion vertices, a barrier that many users wanted to be
removed. For example, it has been able to provide perfectly
balanced partitions of distributed meshes of
1.6 billion edges on 8096
processors at LLNL.
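
The structure of recursive bisection, and why its cost grows with the number of parts, can be seen on a deliberately simplified example. The sketch below splits a one-dimensional sequence of vertex weights into contiguous parts; it ignores the graph structure and the edge cut entirely, so it is only a toy model of the recursion and not the multilevel algorithm used in `Scotch`. All names are hypothetical.

```python
def recursive_bisection(weights, num_parts, first_part=0):
    """Split a sequence of vertex weights into `num_parts` contiguous parts.

    Each step divides the parts into floor(k/2) and ceil(k/2) and cuts the
    sequence where the left weight is closest to its proportional target.
    Returns the part index of each vertex.
    """
    if num_parts == 1:
        return [first_part] * len(weights)
    left_parts = num_parts // 2
    target = sum(weights) * left_parts / num_parts
    # Scan all cut positions and keep the one closest to the target weight.
    running, best_cut, best_gap = 0, 0, abs(target)
    for i, w in enumerate(weights, start=1):
        running += w
        if abs(running - target) < best_gap:
            best_cut, best_gap = i, abs(running - target)
    left = recursive_bisection(weights[:best_cut], left_parts, first_part)
    right = recursive_bisection(weights[best_cut:], num_parts - left_parts,
                                first_part + left_parts)
    return left + right


def load_imbalance(weights, parts, num_parts):
    """Ratio of the heaviest part load to the average part load (1.0 is perfect)."""
    loads = [0] * num_parts
    for w, p in zip(weights, parts):
        loads[p] += w
    return max(loads) / (sum(loads) / num_parts)


weights = [1] * 8
parts = recursive_bisection(weights, 4)
imbalance = load_imbalance(weights, parts, 4)
```

Obtaining k parts requires about log2(k) levels of bisection, and any imbalance introduced at one level is inherited by all the levels below it, which is one way to understand why imbalance tends to grow with the number of parts.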

The second axis concerns dynamic repartitioning. Since
graphs may now comprise more than one billion vertices,
distributed on machines having more than one hundred
thousand processing elements, it is important to be able to
compute partitions which create as few data movements as
possible with respect to a prior partition. The integration
of repartitioning features into the sequential version of
`Scotch` is now complete, with very good results,
which are about to be published. The third year of the PhD
of Sébastien Fourestier aims at transposing these results
to the parallel case.
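
The quantity that repartitioning tries to keep small can be measured directly: the amount of data attached to vertices whose part changes between the prior and the new partition. The sketch below is a minimal illustration of that measure; the function name, the optional per-vertex sizes and the sample partitions are assumptions made for the example.

```python
def migration_volume(old_part, new_part, vertex_sizes=None):
    """Total size of the vertices whose part changes between two partitions."""
    if len(old_part) != len(new_part):
        raise ValueError("partitions must cover the same vertices")
    if vertex_sizes is None:
        vertex_sizes = [1] * len(old_part)
    return sum(s for o, n, s in zip(old_part, new_part, vertex_sizes) if o != n)


old = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]  # same subdomains, but part labels swapped
shifted = [0, 0, 1, 1, 1, 1]     # boundary moved by one vertex

moved_if_relabelled = migration_volume(old, relabelled)
moved_if_shifted = migration_volume(old, shifted)
```

The relabelled partition is as good as the old one in terms of balance and cut, yet it moves all 6 vertices, whereas the shifted one moves a single vertex. A partition computed from scratch offers no control over this, which is why the prior partition has to be taken into account.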

A third research axis regards the design of specific
graph partitioning algorithms. Several applications, such
as Schur complement methods for hybrid solvers (see
Section
), need k-way partitions where
load balance should take into account not only vertices
belonging to the sub-domains, but also boundary vertices,
which lead to computations on each of the sub-domains which
share them. A sequential version is now available as a
prototype, thanks to the work of Jun-Ho Her in the context
of the ANR project “PETAL”, and has been successfully used
in conjunction with the
`HIPS` solver. A paper is in preparation. The
transposition of these algorithms to the parallel case may
prove difficult. A new direction for this research is the
creation of other specific algorithms, in the context of a
collaboration with Sherry Li at Berkeley.
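
The balance criterion described above can be illustrated by counting, for each sub-domain, both its interior vertices and the boundary vertices it shares with its neighbours. The sketch below is a hypothetical example of such a count, not the prototype algorithm itself; the data layout (one set of sub-domains per vertex) is an assumption.

```python
def subdomain_loads(vertex_domains, num_domains):
    """Number of vertices each sub-domain has to process.

    `vertex_domains[v]` is the set of sub-domains vertex v belongs to: a single
    sub-domain for an interior vertex, several for a boundary vertex.
    """
    loads = [0] * num_domains
    for domains in vertex_domains:
        for d in domains:
            loads[d] += 1
    return loads


# Hypothetical 11-vertex example with 3 sub-domains; two vertices are shared.
vertex_domains = [{0}, {0}, {0}, {0, 1}, {1, 2},
                  {1}, {1}, {1}, {2}, {2}, {2}]

interior_only = [sum(1 for d in vertex_domains if d == {k}) for k in range(3)]
with_boundary = subdomain_loads(vertex_domains, 3)
```

Counting interior vertices alone gives a perfectly balanced `[3, 3, 3]`, but once the shared vertices are included the loads become `[4, 5, 4]`: the middle sub-domain, which touches two interfaces, has more work than the others.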

The fourth axis is the design of efficient and scalable
software tools for parallel dynamic remeshing. This is a
joint work with Cécile Dobrzynski, in the context of the
PhD of Cédric Lachat, funded by the
`PUMAS` team. PaMPA (“Parallel Mesh Partitioning and
Adaptation”) is a middleware library dedicated to the
management of distributed meshes. Its purpose is to relieve
solver writers from the tedious and error-prone task of
writing again and again service routines for mesh handling,
data communication and exchange, remeshing, and data
redistribution. An API of the future platform has been
devised, and the coding of the mesh handling and
redistribution routines is in progress. As a direct
application of
`PaMPA`, Damien Genêt, who started his PhD this
fall, will write a new generation fluid dynamics solver on
top of this middleware.

New supercomputers incorporate many microprocessors
which themselves include one or more computational cores.
These new architectures induce strongly hierarchical
topologies. These are called NUMA architectures. In the
context of distributed NUMA architectures, work has
begun, in collaboration with the INRIA RUNTIME team, to
study optimization strategies, and to improve the
scheduling of communications, threads and I/O. Sparse
direct solvers are a basic building block of many numerical
simulation algorithms. We propose to introduce a dynamic
scheduling designed for NUMA architectures in the
`PaStiX` solver. The data structures of the solver,
as well as the patterns of communication, have been modified
to meet the needs of these architectures and of dynamic
scheduling. We are also interested in the dynamic
adaptation of the computation grain to make efficient use of
multi-core architectures and shared memory. Experiments on
several numerical test cases have been performed to prove
the efficiency of the approach on different architectures.
M. Faverge defended his Ph.D.
on these aspects in the context
of the NUMASIS ANR CIGC project.

In
`HIPS`, we propose several algorithmic variants to
solve the Schur complement system that can be adapted to
the geometry of the problem: typically some strategies are
more suitable for systems coming from a 2D problem
discretisation and others for a 3D problem; the choice of
the method also depends on the numerical difficulty of the
problem. We have a parallel version of HIPS that provides
fully iterative methods as well as hybrid methods that mix
a direct factorization inside the domain and an iterative
method in the Schur complement.
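
The Schur complement system mentioned above can be written explicitly. In the block below, the unknowns are split into interior unknowns (index B) and interface unknowns (index C); this notation is assumed for the example and is not taken from the `HIPS` documentation.

```latex
\begin{aligned}
\begin{pmatrix} A_B & A_{BC} \\ A_{CB} & A_C \end{pmatrix}
\begin{pmatrix} x_B \\ x_C \end{pmatrix}
&= \begin{pmatrix} b_B \\ b_C \end{pmatrix}, \\
S &= A_C - A_{CB} \, A_B^{-1} \, A_{BC}, \\
S \, x_C &= b_C - A_{CB} \, A_B^{-1} \, b_B, \\
x_B &= A_B^{-1} \bigl( b_B - A_{BC} \, x_C \bigr).
\end{aligned}
```

In a hybrid method, A_B is block diagonal over the sub-domains and can be factorized directly and independently on each of them, while the interface system in S is solved with a preconditioned iterative method, so that S never has to be formed and factorized exactly.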

In previous work, we presented a hybrid version of the solver where the Schur complement preconditioner was built using a parallel scalar ILUT algorithm. This year we have also developed a parallel version of the algorithms where the Schur complement incomplete factorization is done using a dense block structure. That is to say, there is no additional term dropping in the Schur complement preconditioner other than that prescribed by the block pattern defined by the HID graph partitioning. This variant of the preconditioner is more expensive in terms of memory, but for some difficult test cases it is the only alternative to direct solvers. A general comparison of all the hybrid methods in HIPS has been presented.

J. Gaidamour defended his Ph.D. on the hybrid solver techniques developed in HIPS.

This year we have also defined a general programming
interface for sparse linear solvers. Our goal is to
standardize the API of sparse linear solvers and to provide
simple ways of performing tedious tasks such
as parallel matrix assembly. We have thus
proposed a generic API specification called MURGE (
http://
`HIPS` and
`PaStiX`. We have also tested this interface in
`RealfluiDS` and
`JOREK` for
`HIPS` and
`PaStiX`.

New supercomputers incorporate many microprocessors which themselves include one or more computational cores. These new architectures induce strongly hierarchical topologies. On the one hand, we have introduced a dynamic scheduling designed for these architectures in the PaStiX solver. On the other hand, we have a parallel version of HIPS that provides fully iterative methods as well as hybrid methods that mix a direct factorization inside the domain and an iterative method in the Schur complement. Moreover, graph or mesh partitioners (the Scotch software for instance) are able to deal with problems that have several billion unknowns. Solving linear systems is clearly the limiting step in reaching this scale in numerical simulations. An important aim of this work is the design and implementation of a sparse linear solver that can exploit the power of these new supercomputers. We will have to propose solutions for the following problems:

fully parallel and scalable preprocessing steps (ordering and symbolic factorization);

efficient algorithmic coupling of direct and iterative methods that allows effective management of all the levels of parallelism;

adapted scheduling of computation tasks to take advantage of the runtime that operates on mixed architectures with multi-cores and GPUs.

**Dates:** 2008-2011

Transfer and development of the Residual Distribution schemes in the Natur code (in collaboration with INCKA).

**Dates:** 2008-2011

Study and validation of very high order SUPG schemes in AETHER.

**Grant:** ANR-06-CIS

**Dates:** 2006 – 2010

**Partners:** CEA Cadarache.

**Overview:** The magneto-hydrodynamic instability called
ELM, for Edge Localized Mode, is commonly observed in the
standard tokamak operating scenario. The energy losses the
ELM will induce in ITER plasmas are a real concern.
However, the current understanding of what sets the size of
these ELM-induced energy losses is extremely limited. No
numerical simulations of the complete ELM instability, from
its onset through its non-linear phase and its decay, exist
in the literature. Recently, encouraging results on the
simulation of an ELM cycle have been obtained with the
`JOREK` code developed at CEA, but at reduced toroidal
resolution. The
`JOREK` code uses a fully implicit time evolution
scheme in conjunction with the
`PaStiX` sparse matrix library. In this project it is
proposed to develop and implement methods to improve the
MHD simulation code to enable high-resolution MHD
simulations of ELMs. The ELM simulations are urgently
needed to improve our understanding of ELMs and to evaluate
possible mechanisms to control the energy losses. The
improvements include adaptive mesh refinement, a robust
numerical MHD scheme and refinable cubic Hermite finite
elements. These developments need to be consistent with the
implicit time evolution scheme and the
`PaStiX` solver. The implicit scheme is essential due
to the large variety of time scales in the MHD simulations.
The new methods will be implemented and evaluated in the
code
`RealfluiDS`, developed by the BACCHUS team, and in the
`JOREK` code to optimize the exchange of expertise on
numerical methods and MHD simulations.

The project is a collaboration between the Departement de Recherche sur la Fusion Controlée (DRFC, CEA/Cadarache) and the Laboratoire Bordelais de Recherche en Informatique (LaBRI) and Mathématiques Appliquées de Bordeaux (IMB) at the University of Bordeaux 1.

**Grant:** ANR Cosinus 2008

**Dates:** 2009–2011

**Partners:** INRIA Saclay-Ile de France (leader of the
project), Paris 6, IFP (Rueil-Malmaison), CEA Saclay

**Overview:** In this collaborative effort, we propose to
develop parallel preconditioning techniques for the
emergent hierarchical models of clusters of multi-core
processors, as used for example in future petascale
machines. The preconditioning techniques are based on
recent progress obtained in combining the well-known
incomplete LU (ILU) factorization with tangential
filtering, another incomplete factorization in which a
filtering condition is satisfied. The goal of this project
is to transform these preconditioners into black-box
parallel preconditioners that could be as usable as
standard and popular methods such as ILU. For this, we
address several issues related to the quality of the
combined preconditioner. We also aim to connect
these methods with domain decomposition methods. To
obtain a preconditioner suitable for parallelism, we will
study associated graph partitioning and reordering
techniques.

**Grant:** European Commission

**Dates:** 2010-2013

**Partners:** DASSAULT, DLR, ONERA, NLR, ARA, VKI, INRIA,
Universities of Stuttgart, Bergamo, Twente, Nottingham,
Swansea, Charles (Prague), Warsaw, CENAERO, ENSAM
Paris

**Overview:** Computational Fluid Dynamics is a key
enabler for meeting the strategic goals of future air
transportation. However, the limitations of today's numerical
tools reduce the scope of innovation in aircraft
development, keeping aircraft design at a conservative
level. Within the 7th European Research Framework
Programme, the strategic target research project IDIHOM has
been initiated. The goal of IDIHOM is the industrialization
of innovative adaptive higher-order methods for the
compressible flow equations, enabling reliable,
mesh-independent numerical solutions for large-scale
aerodynamic applications in aircraft design. A critical
assessment of the newly developed methods for industrial
aerodynamic applications will allow the identification of
the best numerical strategies for integration as major
building blocks for the next generation of industrial flow
solvers. In order to meet these ambitious objectives, a
partnership of 22 organizations from universities, research
organizations and the aerospace industry, from 10 countries
and with well-proven expertise in CFD, has been set up,
guaranteeing high level research work with a clear path to
industrial exploitation.

**Web:**
http://

**Grant:** European Commission

**Dates:** 2009-2014

The numerical simulation of complex compressible flow problems is still a challenge nowadays, even for the simplest physical models such as the Euler and Navier-Stokes equations for perfect gases. Researchers in scientific computing need to understand how to obtain efficient, stable, very accurate schemes on complex 3D geometries that are easy to code and to maintain, with good scalability on massively parallel machines. Many people work on these topics, but our opinion is that new challenges have to be tackled in order to combine the outcomes of several branches of scientific computing to get simpler algorithms of better quality without sacrificing their efficiency properties. In this proposal, we will tackle several hard points that must be overcome for this program to succeed.

We first consider the problem of how to design methods that can easily handle mesh refinement, in particular near the boundary, where the most interesting engineering quantities have to be evaluated. CAD tools make it possible to describe the geometry, then a mesh is generated, which is itself used by a numerical scheme. Hence, any mesh refinement process is not directly connected with the CAD. This situation prevents the spread of mesh adaptation techniques in industry and we propose a method to overcome this even for steep problems.

Second, we consider the problem of handling the extremely complex patterns that occur in a flow because of boundary layers: it is not always sufficient to only increase the number of degrees of freedom or the formal accuracy of the scheme. We propose to overcome this with a class of very high order numerical schemes that can utilise solution-dependent basis functions.

Our third item is about handling unsteady uncertainties in the model, for example in the geometry or the boundary conditions. This needs to be done efficiently: the amount of computation increases a priori linearly with the number of uncertain parameters. We propose a non-intrusive method that is able to deal with general probability density functions (pdf), and also able to handle pdfs that may evolve during the simulation via a stochastic optimization algorithm, for example. This will be combined with the first two items of this proposal. Since many random variables may be needed, the curse of dimensionality will be dealt with thanks to multiresolution methods combined with sparse grid methods.

The aim of this proposal is to design, develop and evaluate solutions to each of these challenges. Currently, and to our knowledge, none of these problems has been dealt with for compressible flows with steep patterns as in many modern industrial aerodynamics problems. We propose a work program that will lead to significant breakthroughs for flow simulations with a clear impact on numerical schemes and industrial applications. Our solutions, though developed and evaluated on flow problems, have a wider potential and could be considered for any physical problems that are essentially hyperbolic.

Rémi Abgrall is associate editor of the international journals “Mathematics of Computation”, “Computers and Fluids”, “Journal of Computational Physics”, “Journal of Scientific Computing” and “Journal of Computing Science and Mathematics”. He is a member of the scientific committee of the international conference ICCFD, and of the “commission d'évaluation de la direction Simulation Numérique en aérodynamique” of ONERA. He is a member of the CFD committee of ECCOMAS. He is also a member of the scientific committee of CERFACS. He is a member of the GP1 group of Allistène. He is a member of the Comité National du CNRS, section 01. He is a member of the board of the GAMNI group of SMAI, which he currently heads. He is a member of the board of Institut Polytechnique de Bordeaux.

Pierre Ramet and Rémi Abgrall are members of the GENCI scientific committee (Mathematics and Computer Sciences). R. Abgrall also belongs to the Fluid mechanics one.

Pascal Hénon, François Pellegrini and Pierre Ramet have been members of the “commission consultative” for the LaBRI in 2010.

Pascal Hénon was also on the decision board for the
“Plafrim” project, and Pierre Ramet was on the decision board
of the "MCIA" project (
*Mésocentre Aquitain : un environnement Mutualisé de Calcul
Intensif en Aquitaine*).

Mario Ricchiuto, Cécile Dobrzynski and Rémi Abgrall have, in collaboration with the team-project MC2, prepared CANUM 2010 in Carcans-Maubuisson (June 2010).

Rémi Abgrall is responsible for the interim period (3rd year) of the MATMECA department of ENSEIRB-MATMECA.

Cécile Dobrzynski taught at the CEA/EDF/INRIA Summer School on the topic of mesh generation.

In addition to the normal teaching activity of the university members and of IPB members, Pascal Hénon teaches at IPB (computer science engineering school).

François Pellegrini teaches every year a master 2
class at ENSEIRB-MATMÉCA on the architecture of
high-performance systems and ways to exploit them, in the
context of practical projects. This year, he also gave a
tutorial on MPI parallel programming and graph partitioning
with
`Scotch`, during the “
*Séminaire de l'école d'été CEMRACS'10 – Modèles numériques
pour la fusion*” at Luminy.

Pierre Ramet gives a master 2 class on parallel numerical algorithms at IPB (computer science engineering school). He also taught on the topic of sparse linear solvers during the summer school at CEMRACS (Marseille), and during a workshop organized by the CNRS (Lyon).

Mario Ricchiuto has given lectures in the “Mastère Spécialisé en Ingénierie Aéronautique et Spatiale” organized by ENSAM, ENSEIRB-MATMÉCA, Institut de Cognitique and several local industrial partners. He also teaches at ENSEIRB-MATMÉCA.