## Section: New Results

### Shared-memory parallelism

#### Algorithms and data structures for parallel computing

Participants: Umut Acar, Arthur Charguéraud [EPI Toccata], Mike Rainey.

The ERC Deepsea project, with principal investigator Umut Acar, started in June 2013 and is hosted by the Gallium team. This project aims at developing techniques for parallel and self-adjusting computations in the context of shared-memory multiprocessors (i.e., multicore platforms). The project continues work that began at the Max Planck Institute for Software Systems between 2010 and 2013. As part of this project, we are developing a C++ library, called PASL, for programming parallel computations at a high level of abstraction. We use this library to evaluate new algorithms and data structures. We obtained two major results this year.

The first result is a sequence data structure that provides amortized
constant-time access at the two ends, and logarithmic-time
concatenation and splitting at arbitrary positions. These operations
are essential for programming efficient computations in the fork-join
model. Compared with prior work, this novel sequence data structure
achieves excellent constant factors, allowing it to be used as a
replacement for traditional, non-splittable sequence data
structures. This data structure, called *chunked sequence* due to its
use of chunks (fixed-capacity arrays), has been implemented both in
C++ and in OCaml, and has been shown to be competitive with state-of-the-art
sequence data structures that do not support split and concatenation
operations. This work is described in a paper published at ESA
[22].
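
The interface described above can be illustrated with a minimal Python sketch. It is not the published data structure: it stores chunks in a flat deque, so `split` takes time proportional to the number of chunks rather than logarithmic time, and the published structure obtains its bounds with a more elaborate organisation that this section does not detail. The class name `ChunkedSeq`, the method names, and the tiny `CHUNK_CAPACITY` are assumptions made for the illustration.

```python
from collections import deque

CHUNK_CAPACITY = 4  # hypothetical value, kept tiny so the example is easy to trace


class ChunkedSeq:
    """Sequence stored as a deque of non-empty chunks of bounded size."""

    def __init__(self, items=()):
        self.chunks = deque()
        for x in items:
            self.push_back(x)

    def push_back(self, x):
        # Open a fresh chunk only when the last one is full.
        if not self.chunks or len(self.chunks[-1]) == CHUNK_CAPACITY:
            self.chunks.append([])
        self.chunks[-1].append(x)

    def pop_back(self):
        x = self.chunks[-1].pop()
        if not self.chunks[-1]:
            self.chunks.pop()
        return x

    def push_front(self, x):
        if not self.chunks or len(self.chunks[0]) == CHUNK_CAPACITY:
            self.chunks.appendleft([])
        self.chunks[0].insert(0, x)  # cost bounded by CHUNK_CAPACITY

    def pop_front(self):
        x = self.chunks[0].pop(0)
        if not self.chunks[0]:
            self.chunks.popleft()
        return x

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __iter__(self):
        for c in self.chunks:
            yield from c

    def concat(self, other):
        """Move every chunk of `other` to the back of self; `other` becomes empty."""
        self.chunks.extend(other.chunks)
        other.chunks = deque()

    def split(self, i):
        """Keep the first i items in self and return the remaining items."""
        right = ChunkedSeq()
        kept = deque()
        while self.chunks:
            c = self.chunks.popleft()
            if i >= len(c):
                kept.append(c)  # whole chunk stays on the left
                i -= len(c)
            else:
                if i > 0:
                    kept.append(c[:i])  # only this one chunk is cut
                right.chunks.append(c[i:])
                right.chunks.extend(self.chunks)  # remaining chunks move as-is
                break
        self.chunks = kept
        return right
```

Only the chunk that straddles the split position is copied; all other chunks are moved whole, which is the property that keeps the constant factors of chunk-based designs low.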

The second major result is the development of fast and robust parallel graph traversal algorithms, more precisely for parallel BFS and parallel DFS. The new algorithms leverage the aforementioned sequence data structure for representing the set of edges remaining to be visited. In particular, they use the split operation for balancing the edges among the processors involved in the computation. Compared with prior work, these new algorithms are designed to be efficient not just for particular classes of graphs, but for all input graphs. This work has not yet been published, but it is described in detail in a technical report [46].
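
The idea of balancing edges, rather than vertices, among processors can be sketched as follows. This is a sequential simulation under simplifying assumptions, not the algorithm of the technical report: the workers are simulated one after the other, the frontier is a plain Python list rather than a chunked sequence, and the names `split_evenly` and `bfs_levels` are hypothetical.

```python
def split_evenly(items, parts):
    """Cut a list into `parts` contiguous slices whose sizes differ by at most one."""
    q, r = divmod(len(items), parts)
    out, start = [], 0
    for k in range(parts):
        end = start + q + (1 if k < r else 0)
        out.append(items[start:end])
        start = end
    return out


def bfs_levels(graph, source, workers=3):
    """Level-synchronous BFS returning the distance of each reachable vertex.

    `graph` maps every vertex to the list of its successors.
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        # The unit of work is the edge, so a single high-degree vertex
        # cannot overload one worker.
        edges = [(u, v) for u in frontier for v in graph[u]]
        candidates = []
        for share in split_evenly(edges, workers):
            # In a parallel setting, each share would run on its own processor.
            candidates.append([v for (_, v) in share if v not in dist])
        level += 1
        frontier = []
        for found in candidates:  # sequential merge; the first discoverer wins
            for v in found:
                if v not in dist:
                    dist[v] = level
                    frontier.append(v)
    return dist
```

In a real parallel implementation, the merge step would be replaced by an atomic "visited" mark per vertex, and splitting would happen on demand when a processor runs out of work.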

#### Weak memory models

Participants: Luc Maranget, Jacques-Pascal Deplaix, Jade Alglave [University College London, then Microsoft Research, Cambridge].

Modern multi-core and multi-processor computers do not follow the intuitive “Sequential Consistency” model, which defines a concurrent execution as an interleaving of the executions of its constituent threads and requires writes to the shared memory to take effect instantaneously. This situation is due both to in-core optimisations such as speculative and out-of-order execution of instructions, and to the presence of sophisticated (and cooperating) caching devices between processors and memory.
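
To make the Sequential Consistency model concrete, the sketch below enumerates every interleaving of the classic "store buffering" test, in which each of two threads writes one variable and then reads the other. The instruction encoding and the function names are assumptions made for the illustration; they do not come from the tools described in this section.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# An instruction is ("W", variable, value) or ("R", variable, register).
T0 = [("W", "x", 1), ("R", "y", "r0")]
T1 = [("W", "y", 1), ("R", "x", "r1")]


def run(schedule):
    """Execute one interleaving against a single memory with instantaneous writes."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "W":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return (regs["r0"], regs["r1"])


def sc_outcomes(t0, t1):
    """Set of final (r0, r1) values reachable under Sequential Consistency."""
    return {run(s) for s in interleavings(t0, t1)}


print(sorted(sc_outcomes(T0, T1)))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome `(0, 0)` never appears in the enumeration, yet multi-core x86, Power and ARM machines can produce it when no fences are inserted, which is one well-known way in which hardware departs from the interleaving model.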

In the last few years, Luc Maranget took part in an international
research effort to define the semantics of the computers of the
multi-core era. This research effort relies both on formal methods
for defining the models and on intensive experiments for validating
the models. Joint work with, amongst others, Jade Alglave
(now at Microsoft Research, Cambridge), Peter Sewell
(University of Cambridge) and Susmit Sarkar (University of St. Andrews)
achieved several significant results, including
two semantics for the IBM Power and ARM memory models: one of the operational
kind [70] and the other of the axiomatic
kind [64]. In particular, Luc Maranget is the main
developer of the **diy** tool suite (see section 5.3). Luc
Maranget also performs most of the experiments involved.

In 2014 we produced a new model for Power/ARM. The new model is
simpler than the previous ones, in the sense that it is based
on fewer mathematical objects and can be simulated more efficiently
than the previous models.
The new **herd** simulator (part of the **diy** tool suite) is in fact
a generic simulator, whose central component is an interpreter for
a domain-specific language. More precisely,
memory models are described in a simple language that defines relations
by means of a few operators such as concatenation, transitive closure,
fixpoint, etc., and performs validity checks on relations such as
acyclicity. The Power/ARM model consists of about 50 lines of this
specific language. This work, with additional material, including in-depth
testing of ARM devices and data-mining of potential concurrency bugs in
a huge code base, was published in the journal
*Transactions on Programming Languages and Systems* [13]
and selected for presentation at the PLDI conference [23].
Luc Maranget gave this presentation.
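
The style of model described above, relations combined by a few operators and then subjected to an acyclicity check, can be sketched in Python. The sketch does not use the actual input language of **herd**, and it encodes Sequential Consistency rather than the Power/ARM model: an execution is sequentially consistent exactly when the union of program order (`po`), read-from (`rf`), coherence (`co`) and from-read (`fr`) is acyclic. The event names and helper functions are hypothetical.

```python
def seq(r, s):
    """Relational composition: (a, c) whenever a r b and b s c for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def inverse(r):
    return {(b, a) for (a, b) in r}


def transitive_closure(r):
    """Least fixpoint of closure = closure | closure;closure, starting from r."""
    closure = set(r)
    while True:
        bigger = closure | seq(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


def acyclic(r):
    return all(a != b for (a, b) in transitive_closure(r))


def is_sc(po, rf, co):
    fr = seq(inverse(rf), co)  # a read precedes the writes that overwrite its source
    return acyclic(po | rf | co | fr)


# Store-buffering test. Events: ix, iy are the initial writes of 0;
# thread 0 runs a: x := 1 then b: read y; thread 1 runs c: y := 1 then d: read x.
po = {("a", "b"), ("c", "d")}
co = {("ix", "a"), ("iy", "c")}

both_read_zero = {("iy", "b"), ("ix", "d")}  # b and d read the initial values
b_reads_one = {("c", "b"), ("ix", "d")}      # b reads thread 1's write

print(is_sc(po, both_read_zero, co))  # False
print(is_sc(po, b_reads_one, co))     # True
```

The first execution is rejected because `fr` adds the edges `b → c` and `d → a`, closing the cycle `a → b → c → d → a`. A model for a weaker architecture would apply the same kind of check to a smaller set of relations.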

In the same research theme, Luc Maranget supervised the internship of
Jacques-Pascal Deplaix (EPITECH), from Oct. 2013 to May 2014.
Jacques-Pascal extended **litmus**, our tool to run tests on
hardware.
**litmus** now accepts tests written in C;
as a result, we can now perform conformance testing of C compilers
and machines with respect to the C11/C++11 standard.
Indeed, Mark Batty (University of Cambridge), under the
supervision of Jade Alglave, wrote a **herd** model for this standard.
The new **litmus** also proves useful to run tests that exploit
some machine idiosyncrasies, when our **litmus** assembly implementation
does not handle them.

As a part of the **litmus** infrastructure,
Luc Maranget designed a synchronisation barrier primitive by simplifying
the sense-reversing synchronisation barrier published by Maurice Herlihy and Nir
Shavit in their textbook [58].
He co-authored a JFLA article [34]
that presents this primitive and proves it correct automatically by
means of the **cubicle** tool developed under the supervision
of Sylvain Conchon (team Toccata, Inria Saclay).
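
For readers unfamiliar with sense-based barriers, the general technique can be sketched as follows. This is neither the textbook code nor the simplified primitive of the JFLA article: Python offers no atomic counters, so the sketch protects the counter with a condition variable instead of spinning on a shared flag, and the names `SenseBarrier` and `demo` are hypothetical.

```python
import threading


class SenseBarrier:
    """Reusable barrier: each episode is identified by a boolean 'sense'."""

    def __init__(self, parties):
        self.parties = parties
        self.count = parties            # threads still expected in this episode
        self.sense = False              # shared sense, flipped by the last arriver
        self.cond = threading.Condition()
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Each thread flips its private sense on every call.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.cond:
            self.count -= 1
            if self.count == 0:
                # Last arriver: reset the counter, then release the others.
                self.count = self.parties
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense:
                    self.cond.wait()


def demo(parties=4, rounds=5):
    """Return True if no thread leaves round r before all threads entered it."""
    barrier = SenseBarrier(parties)
    arrived = [0] * rounds
    checks = []
    lock = threading.Lock()

    def worker():
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            barrier.wait()
            with lock:
                checks.append(arrived[r] == parties)

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(checks) == parties * rounds and all(checks)
```

Because consecutive episodes use opposite senses, the counter can be reset immediately and the barrier reused without a second synchronisation phase.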