2020
Activity report
Project-Team
DATASHAPE
RNSR: 201622050C
Research centers
In partnership with:
CNRS, Université Paris-Saclay
Team name:
Understanding the shape of data
In collaboration with:
Laboratoire de mathématiques d'Orsay de l'Université de Paris-Sud (LMO)
Domain
Algorithmics, Programming, Software and Architecture
Theme
Algorithmics, Computer Algebra and Cryptology
Creation of the Team: 2016 January 01, updated into Project-Team: 2017 July 01

Keywords

• A3. Data and knowledge
• A3.4. Machine learning and statistics
• A7.1. Algorithms
• A8. Mathematics of computing
• A8.1. Discrete mathematics, combinatorics
• A8.3. Geometry, Topology
• A9. Artificial intelligence
• B1. Life sciences
• B2. Health
• B5. Industry of the future
• B9. Society and Knowledge
• B9.5. Sciences

1 Team members, visitors, external collaborators

Research Scientists

• Frédéric Chazal [Team leader, Inria, Senior Researcher, Saclay - Île-de-France, HDR]
• Jean-Daniel Boissonnat [Inria, Senior Researcher, Sophia Antipolis - Méditerranée, HDR]
• Mathieu Carrière [Inria, Researcher, from Oct 2020, Sophia Antipolis - Méditerranée]
• David Cohen-Steiner [Inria, Researcher, Sophia Antipolis - Méditerranée]
• Marc Glisse [Inria, Researcher, Saclay - Île-de-France]
• Jisu Kim [Inria, Starting Research Position, from Mar 2020, Saclay - Île-de-France]
• Clément Maria [Inria, Researcher, Sophia Antipolis - Méditerranée]
• Steve Oudot [Inria, Researcher, Saclay - Île-de-France, HDR]

Faculty Members

• Gilles Blanchard [Univ Paris-Saclay, from Nov 2020, Saclay - Île-de-France]
• Blanche Buet [Univ Paris-Saclay, from Nov 2020, Saclay - Île-de-France]
• Pierre Pansu [Univ Paris-Saclay, Professor, from Nov 2020, Saclay - Île-de-France]

Post-Doctoral Fellows

• Kristof Huszar [Inria, from Oct 2020, Sophia Antipolis - Méditerranée]
• Hariprasad Kannan [Inria, until Jan 2020, Saclay - Île-de-France]
• Jisu Kim [Inria, until Feb 2020, Saclay - Île-de-France]
• Theo Lacombe [Inria, from Oct 2020, Saclay - Île-de-France]
• Siddharth Pritam [Inria, from Oct 2020, Sophia Antipolis - Méditerranée]
• Martin Royer [Inria, until Aug 2020, Saclay - Île-de-France]

PhD Students

• Bertrand Beaufils [SYSNAV, CIFRE, until Jun 2020, Saclay - Île-de-France]
• Nicolas Berkouk [Inria, until Oct 2020, Saclay - Île-de-France]
• Jeremie Capitao-Miniconi [Univ Paris-Saclay, from Nov 2020, Saclay - Île-de-France]
• Alex Delalande [Inria, Saclay - Île-de-France]
• Vincent Divol [Univ Paris-Sud, Saclay - Île-de-France]
• Olympio Hacquard [Univ Paris-Saclay, from Sep 2020, Saclay - Île-de-France]
• Theo Lacombe [École polytechnique, until Sep 2020, Saclay - Île-de-France]
• Etienne Lasalle [Univ Paris-Sud, Saclay - Île-de-France]
• Vadim Lebovici [École Normale Supérieure de Paris, from Sep 2020, Saclay - Île-de-France]
• Daniel Perez [Ecole normale supérieure Paris-Saclay, from Nov 2020, Saclay - Île-de-France]
• Siddharth Pritam [Inria, until Sep 2020, Sophia Antipolis - Méditerranée]
• Louis Pujol [Univ Paris-Sud, Saclay - Île-de-France]
• Wojciech Reise [Inria, from Sep 2020, Saclay - Île-de-France]
• Owen Rouille [Inria, Sophia Antipolis - Méditerranée]
• Raphael Tinarrage [École Normale Supérieure de Cachan, Saclay - Île-de-France]
• Christophe Vuong [Telecom ParisTech, from Nov 2020, Saclay - Île-de-France]

Technical Staff

• Thomas Bonis [Inria, Engineer, until Jul 2020, Saclay - Île-de-France]
• Rudresh Mishra [Inria, Engineer, from Dec 2020, Saclay - Île-de-France]
• Vincent Rouvreau [Inria, Engineer, Saclay - Île-de-France]

Interns and Apprentices

• Antoine Commaret [École Normale Supérieure de Paris, from Sep 2020, Saclay - Île-de-France]
• Laurence Fontana [Inria, from Oct 2020, Saclay - Île-de-France]
• Sophie Honnorat [Inria, Sophia Antipolis - Méditerranée]

External Collaborators

• Clément Levrard [Univ Denis Diderot, until Sep 2020, Saclay - Île-de-France]
• Bertrand Michel [Univ Pierre et Marie Curie, Saclay - Île-de-France]

2 Overall objectives

DataShape is a research project in Topological Data Analysis (TDA), a recent field whose aim is to uncover, understand and exploit the topological and geometric structure underlying complex and possibly high dimensional data. The overall objective of the DataShape project is to settle the mathematical, statistical and algorithmic foundations of TDA and to disseminate and promote our results in the data science community.

The approach of DataShape relies on the conviction that it is necessary to combine statistical, topological/geometric and computational approaches in a common framework, in order to face the challenges of TDA. Another conviction of DataShape is that TDA needs to be combined with other data science approaches and tools to lead to successful real applications. It is necessary for TDA challenges to be simultaneously addressed from the fundamental and application sides.

The team members have actively contributed to the emergence of TDA during the last few years. The variety of expertise, going from fundamental mathematics to software development, and the strong interactions within our team as well as numerous well established international collaborations make our group one of the best to achieve these goals.

The expected output of DataShape is two-fold. First, we intend to set up and develop the mathematical, statistical and algorithmic foundations of Topological and Geometric Data Analysis. Second, we intend to pursue the development of the GUDHI platform, which was initiated by the team members and is becoming a standard tool in TDA, in order to provide an efficient state-of-the-art toolbox for the understanding of the topology and geometry of data. The ultimate goal of DataShape is to develop and promote TDA as a new family of well-founded methods to uncover and exploit the geometry of data. This also includes the clarification of the position and complementarity of TDA with respect to other approaches and tools in data science. Our objective is also to provide practically efficient and flexible tools that could be used independently, complementarily or in combination with other classical data analysis and machine learning approaches.

3 Research program

3.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis

TDA requires constructing and manipulating appropriate representations of complex and high dimensional shapes. A major difficulty comes from the fact that the complexity of data structures and algorithms used to approximate shapes grows rapidly as the dimensionality increases, which makes them intractable in high dimensions. We focus our research on simplicial complexes, which offer a convenient representation of general shapes and generalize graphs and triangulations. Our work includes the study of simplicial complexes with good approximation properties and the design of compact data structures to represent them.

In low dimensions, effective shape reconstruction techniques exist that can provide precise geometric approximations very efficiently and under reasonable sampling conditions. Extending those techniques to higher dimensions, as is required in the context of TDA, is problematic since almost all methods in low dimensions rely on the computation of a subdivision of the ambient space. A direct extension of those methods would immediately lead to algorithms whose complexities depend exponentially on the ambient dimension, which is prohibitive in most applications. A first direction to bypass the curse of dimensionality is to develop algorithms whose complexities depend on the intrinsic dimension of the data (which most of the time is small although unknown) rather than on the dimension of the ambient space. Another direction is to resort to cruder approximations that only capture the homotopy type or the homology of the sampled shape. The recent theory of persistent homology provides a powerful and robust tool to study the homology of sampled spaces in a stable way.
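As an illustration of how persistent homology summarizes a sampled shape, the following minimal sketch computes the 0-dimensional persistence pairs (connected components) of a Vietoris-Rips filtration with a union-find structure. It is not taken from the GUDHI library: the function name `zero_dim_persistence` is hypothetical, and the sketch assumes the convention that an edge enters the filtration at a scale equal to its length.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return (birth, death) pairs of connected components in a Rips filtration.

    Assumption: every point is born at scale 0 and an edge appears at a scale
    equal to its Euclidean length. A component dies when an edge merges it
    into another component; one component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all edges by increasing length (Kruskal-style sweep).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the component that survives forever
    return pairs


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = zero_dim_persistence(points)
```

On this toy sample, made of two well separated clusters, three components die at scale 1, one dies at scale 9 (the gap between the clusters), and one persists forever; the long-lived pair is the stable feature that reveals the two clusters.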

3.2 Statistical aspects of topological and geometric data analysis

The wide variety of ever larger available data, often corrupted by noise and outliers, requires considering the statistical properties of their topological and geometric features and proposing new relevant statistical models for their study.

There exist various statistical and machine learning methods intending to uncover the geometric structure of data. Beyond manifold learning and dimensionality reduction approaches, which generally do not make it possible to assert the relevance of the inferred topological and geometric features and are not well-suited for the analysis of complex topological structures, set estimation methods intend to estimate, from random samples, a set around which the data is concentrated. In these methods, which include support and manifold estimation, principal curves/manifolds and their various generalizations, to name a few, the estimation problems are usually considered under losses, such as the Hausdorff distance or the symmetric difference, that are not sensitive to the topology of the estimated sets, preventing these tools from directly inferring topological or geometric information.

Regarding purely topological features, the statistical estimation of the homology or homotopy type of compact subsets of Euclidean spaces has only been considered recently, most of the time under the quite restrictive assumption that the data are randomly sampled from smooth manifolds.

In a more general setting, with the emergence of new geometric inference tools based on the study of distance functions and algebraic topology tools such as persistent homology, computational topology has recently seen an important development offering a new set of methods to infer relevant topological and geometric features of data sampled in general metric spaces. The use of these tools remains widely heuristic and until recently there were only a few preliminary results establishing connections between geometric inference, persistent homology and statistics. However, this direction has attracted a lot of attention over the last three years. In particular, stability properties and new representations of persistent homology information have led to very promising results to which the DataShape members have significantly contributed. These preliminary results open many perspectives and research directions that need to be explored.

Our goal is to build on our first statistical results in TDA to develop the mathematical foundations of Statistical Topological and Geometric Data Analysis. Combined with the other objectives, our ultimate goal is to provide a well-founded and effective statistical toolbox for the understanding of topology and geometry of data.

3.3 Topological and geometric approaches for machine learning

This objective is driven by the problems raised by the use of topological and geometric approaches in machine learning. The goal is both to use our techniques to better understand the role of topological and geometric structures in machine learning problems and to apply our TDA tools to develop specialized topological approaches to be used in combination with other machine learning methods.

3.4 Experimental research and software development

We develop a high quality open source software platform called GUDHI, which is becoming a reference in geometric and topological data analysis in high dimensions. The goal is not to provide code tailored to the numerous potential applications but rather to provide the central data structures and algorithms that underlie applications in geometric and topological data analysis.

The development of the GUDHI platform also serves to benchmark and optimize new algorithmic solutions resulting from our theoretical work. Such development necessitates a whole line of research on software architecture and interface design, heuristics and fine-tuning optimization, robustness and arithmetic issues, and visualization. We aim at providing a full programming environment following the same recipes that made the success of the CGAL library, the reference library in computational geometry.

Some of the algorithms implemented on the platform will also be interfaced to other software platforms, such as the R software for statistical computing, and languages such as Python, in order to make them usable in combination with other data analysis and machine learning tools. A first attempt in this direction has been made with the creation of an R package called TDA, in collaboration with the group of Larry Wasserman at Carnegie Mellon University (Inria associated team CATS), that already includes some functionalities of the GUDHI library and implements some joint results between our team and the CMU team. A similar interface with the Python language is also considered a priority. To go even further towards helping users, we will provide utilities that perform the most common tasks without requiring any programming at all.

4 Application domains

Our work is mostly of a fundamental mathematical and algorithmic nature but finds a variety of applications in data analysis, e.g., in material science, biology, sensor networks, 3D shape analysis and processing, to name a few.

More specifically, DataShape is working on the analysis of trajectories obtained from inertial sensors (PhD thesis of Bertrand Beaufils with Sysnav) and, more generally, on the development of new TDA methods for Machine Learning and Artificial Intelligence for (multivariate) time-dependent data from various kinds of sensors, in collaboration with Fujitsu.

DataShape is also working in collaboration with Columbia University in New York, especially with the Rabadan lab, in order to improve bioinformatics methods and analyses for single cell genomic data. For instance, much of this work aims to use TDA tools such as persistent homology and the Mapper algorithm to characterize, quantify and study the statistical significance of biological phenomena that occur in large scale single cell data sets. Such biological phenomena include, among others: the cell cycle, functional differentiation of stem cells, and immune system responses (such as the spatial response on the tissue location, and the genomic response with protein expression) to breast cancer.

5 Social and environmental responsibility

5.1 Footprint of research activities

The weekly research seminar of DataShape now takes place online, and travel by team members decreased considerably this year, mainly because of the COVID-19 pandemic.

6 New software and platforms

6.1 New software

6.1.1 GUDHI

• Name: Geometric Understanding in Higher Dimensions
• Keywords: Computational geometry, Topology, Clustering
• Scientific Description:

The Gudhi library is an open source library for Computational Topology and Topological Data Analysis (TDA). It offers state-of-the-art algorithms to construct various types of simplicial complexes, data structures to represent them, and algorithms to compute geometric approximations of shapes and persistent homology.

The GUDHI library offers the following interoperable modules:

- Complexes:
  + Cubical
  + Simplicial: Rips, Witness, Alpha and Čech complexes
  + Cover: Nerve and Graph induced complexes
- Data structures and basic operations:
  + Simplex tree, Skeleton blockers and Toplex map
  + Construction, update, filtration and simplification
- Topological descriptors computation
- Manifold reconstruction
- Topological descriptors tools:
  + Bottleneck and Wasserstein distance
  + Statistical tools
  + Persistence diagram and barcode

• Functional Description: The GUDHI open source library provides the central data structures and algorithms that underlie applications in geometry understanding in higher dimensions. It is intended both to help the development of new algorithmic solutions inside and outside the project, and to facilitate the transfer of results to applied fields.
• News of the Year:
  - DTM Rips complex
  - Edge Collapse
  - Time delay embedding
  - Clustering (ToMaTo)
  - Atol
  - Persistence representations
  - Weighted alpha complex
  - Subsampling
  - Periodic (weighted or not) 3d Alpha complex
  - pip packages
• URL:
• Authors: Clément Maria, Jean-Daniel Boissonnat, Marc Glisse, Mariette Yvinec, Vincent Rouvreau, Clément Jamin, David Salinas, François Godi, Mathieu Carrière, Pawel Dlotko, Siargey Kachanovich, Siddharth Pritam, Theo Lacombe, Steve Oudot, Bertrand Michel, Frédéric Chazal
• Contacts: Jean-Daniel Boissonnat, Marc Glisse, Vincent Rouvreau
• Participants: Clément Maria, François Godi, David Salinas, Jean-Daniel Boissonnat, Marc Glisse, Mariette Yvinec, Pawel Dlotko, Siargey Kachanovich, Vincent Rouvreau, Mathieu Carrière, Bertrand Michel, Clément Jamin, Siddharth Pritam, Theo Lacombe, Frédéric Chazal, Steve Oudot

6.1.2 Module CGAL: New dD Geometry Kernel

• Keyword: Computational geometry
• Functional Description: This package of CGAL (Computational Geometry Algorithms Library) provides the basic geometric types (point, vector, etc) and operations (orientation test, etc) used by geometric algorithms in arbitrary dimension. It uses filters for efficient exact predicates.
• Release Contributions: New predicates for (weighted) alpha complexes, performance improvements.
• URL:
• Author: Marc Glisse
• Contact: Marc Glisse

7 New results

7.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis

7.1.1 Lexicographic optimal homologous chains and applications to point cloud triangulations

Participants: David Cohen-Steiner.

In collaboration with André Lieutier (Dassault Systèmes) and Julien Vuillamy (Titane team, Inria Sophia-Antipolis).

This work [30] considers a particular case of the Optimal Homologous Chain Problem (OHCP), where optimality is meant as a minimal lexicographic order on chains induced by a total order on simplices. The matrix reduction algorithm used for persistent homology is used to derive polynomial algorithms solving this problem instance, whereas OHCP is NP-hard in the general case. The complexity is further improved to a quasilinear algorithm by leveraging a dual graph minimum cut formulation when the simplicial complex is a strongly connected pseudomanifold. We then show how this particular instance of the problem is relevant, by providing an application in the context of point cloud triangulation.
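For readers unfamiliar with the matrix reduction mentioned above, here is a minimal sketch of the standard left-to-right boundary matrix reduction over Z/2 used in persistent homology. It only illustrates that building block, not the lexicographic-optimal chain algorithm of the paper; the function name `reduce_boundary_matrix` and the sparse column encoding (sets of row indices) are assumptions made for this example.

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z/2.

    columns[j] is the set of row indices of the nonzero entries of column j,
    with simplices indexed in filtration order. Returns the reduced columns
    and the list of (birth_index, death_index) persistence pairs.
    """
    reduced = [set(col) for col in columns]
    pivot_owner = {}  # lowest nonzero row index -> column that owns it
    pairs = []
    for j, col in enumerate(reduced):
        # Cancel the lowest entry with an earlier column having the same pivot.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]  # column addition mod 2
        if col:
            pivot_owner[max(col)] = j
            pairs.append((max(col), j))
    return reduced, pairs


# Filled triangle: vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2};
# triangle 6 bounded by edges 3, 4, 5.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
reduced, pairs = reduce_boundary_matrix(boundary)
```

In this toy filtration, edges 3 and 4 each kill a vertex, edge 5 reduces to the zero column (it creates a 1-cycle), and the triangle 6 kills that cycle, so the pairs are (1, 3), (2, 4) and (5, 6), while vertex 0 remains unpaired.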

7.1.2 Tracing isomanifolds in $\mathbb{R}^{d}$ in time polynomial in $d$

Participants: Jean-Daniel Boissonnat, Siargey Kachanovich.

In collaboration with Mathijs Wintraecken (IST Austria).

Isomanifolds are the generalization of isosurfaces to arbitrary dimension and codimension, i.e. submanifolds of $\mathbb{R}^{d}$ defined as the zero set of some multivariate multivalued smooth function $f:\mathbb{R}^{d}\to \mathbb{R}^{d-n}$, where $n$ is the intrinsic dimension of the manifold. A natural way to approximate a smooth isomanifold $M$ is to consider its Piecewise-Linear (PL) approximation $\hat{M}$ based on a triangulation $\mathcal{T}$ of the ambient space $\mathbb{R}^{d}$. In [36], we describe a simple algorithm to trace isomanifolds from a given starting point. The algorithm works for arbitrary dimensions $n$ and $d$, and any precision $D$. Our main result is that, when $f$ (or $M$) has bounded complexity, the complexity of the algorithm is polynomial in $d$ and $\delta =1/D$ (and unavoidably exponential in $n$). Since it is known that for $\delta =\Omega \left({d}^{2.5}\right)$, $\hat{M}$ is $O\left({D}^{2}\right)$-close and isotopic to $M$, our algorithm produces a faithful PL-approximation of isomanifolds of bounded complexity in time polynomial in $d$. Combining this algorithm with dimensionality reduction techniques, the dependency on $d$ in the size of $\hat{M}$ can be completely removed with high probability. We also show that the algorithm can handle isomanifolds with boundary and, more generally, isostratifolds. The algorithm has been implemented and experimental results are reported, showing that it is practical and can handle cases that are far ahead of the state-of-the-art.

7.1.3 A compact data structure for high dimensional Coxeter-Freudenthal-Kuhn triangulations

Participants: Jean-Daniel Boissonnat, Siargey Kachanovich.

In collaboration with Mathijs Wintraecken (IST Austria).

In [45], we consider a family of highly regular triangulations of $\mathbb{R}^{d}$ that can be stored and queried efficiently in high dimensions. This family consists of Freudenthal-Kuhn triangulations and their images through affine mappings, among which are the celebrated Coxeter triangulations of type $\tilde{A}_{d}$. Those triangulations have major advantages over grids in high dimensional applications such as interpolation of functions and manifold sampling and meshing. We introduce an elegant and very compact data structure to implicitly store the full facial structure of such triangulations. This data structure makes it possible to locate a point and to retrieve the faces or the cofaces of a simplex of any dimension in an output-sensitive way. The data structure has been implemented and experimental results are presented.
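To give a feel for why such regular triangulations can be queried without being stored explicitly, here is a minimal sketch of point location in the standard Freudenthal-Kuhn triangulation of the integer grid. It relies on the textbook description (each unit cube is cut into simplices indexed by permutations of the coordinates) and is not the data structure of the paper; the function name `locate_in_freudenthal_kuhn` is hypothetical.

```python
import math


def locate_in_freudenthal_kuhn(x):
    """Return the vertices of a Freudenthal-Kuhn simplex containing point x.

    The simplex is found by taking the integer corner floor(x) of the unit
    cube containing x, then adding unit vectors in order of decreasing
    fractional part of the coordinates.
    """
    base = [math.floor(c) for c in x]
    frac = [c - b for c, b in zip(x, base)]
    # Visit coordinates by decreasing fractional part (ties broken by index).
    order = sorted(range(len(x)), key=lambda i: (-frac[i], i))
    vertices = [tuple(base)]
    current = list(base)
    for i in order:
        current[i] += 1
        vertices.append(tuple(current))
    return vertices


simplex = locate_in_freudenthal_kuhn([0.2, 1.7, -0.5])
```

The query costs one sort of the $d$ fractional parts, so nothing about the triangulation needs to be kept in memory; the returned list contains the $d+1$ vertices of the simplex as integer tuples.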

7.1.4 Local characterizations for decomposability of 2-parameter persistence modules

In collaboration with Magnus Botnan (Vrije Universiteit Amsterdam)

In this work [48] we investigate the existence of sufficient local conditions under which representations of a given poset will be guaranteed to decompose as direct sums of indecomposables from a given class. Our indecomposables of interest belong to the so-called interval modules, which by definition are indicator representations of intervals in the poset. In contexts where the poset is the product of two totally ordered sets (which corresponds to the setting of 2-parameter persistence in topological data analysis), we show that the whole class of interval modules itself does not admit such a local characterization, even when locality is understood in a broad sense. By contrast, we show that the subclass of rectangle modules does admit such a local characterization, and furthermore that it is, in some precise sense, the largest subclass to do so.

7.1.5 On rectangle-decomposable 2-parameter persistence modules

In collaboration with Magnus Botnan (Vrije Universiteit Amsterdam)

This work [28] addresses two questions: (a) can we identify a sensible class of 2-parameter persistence modules on which the rank invariant is complete? (b) can we determine efficiently whether a given 2-parameter persistence module belongs to this class? We provide positive answers to both questions, and our class of interest is that of rectangle-decomposable modules. Our contributions include: on the one hand, a proof that the rank invariant is complete on rectangle-decomposable modules, together with an inclusion-exclusion formula for counting the multiplicities of the summands; on the other hand, algorithms to check whether a module induced in homology by a bifiltration is rectangle-decomposable, and to decompose it in the affirmative, with a better complexity than state-of-the-art decomposition methods for general 2-parameter persistence modules. Our algorithms are backed up by a new structure theorem, whereby a 2-parameter persistence module is rectangle-decomposable if, and only if, its restrictions to squares are. This local characterization is key to the efficiency of our algorithms, and it generalizes previous conditions derived for the smaller class of block-decomposable modules. It also admits an algebraic formulation that turns out to be a weaker version of the one for block-decomposability. By contrast, we show that general interval-decomposability does not admit such a local characterization, even when locality is understood in a broad sense. Our analysis focuses on the case of modules indexed over finite grids.

7.1.6 Decomposition of exact pfd persistence bimodules

Participants: Jérémy Cochoy, Steve Oudot.

In this work [22] we characterize the class of persistence modules indexed over $\mathbb{R}^{2}$ that are decomposable into summands whose supports have the shape of a block, i.e. a horizontal band, a vertical band, an upper-right quadrant, or a lower-left quadrant. Assuming the modules are pointwise finite dimensional (pfd), we show that they are decomposable into block summands if and only if they satisfy a certain local property called exactness. Our proof follows the same scheme as the proof of decomposition for pfd persistence modules indexed over $\mathbb{R}$, yet it departs from it at key stages because the product order on $\mathbb{R}^{2}$ is not a total order, which leaves some important gaps open. These gaps are filled in using more direct arguments. Our work is motivated primarily by the stability theory for zigzags and interlevel-sets persistence modules, in which block-decomposable bimodules play a key part. Our results allow us to drop some of the conditions under which that theory holds, in particular the Morse-type conditions.

7.1.7 Homotopy Reconstruction via the Čech Complex and the Vietoris-Rips Complex

Participants: Frédéric Chazal, Jisu Kim.

In collaboration with J. Shin, A. Rinaldo, L. Wasserman (Carnegie Mellon University)

In this work [33], we derive conditions under which the reconstruction of a target space is topologically correct via the Čech complex or the Vietoris-Rips complex obtained from possibly noisy point cloud data. We provide two novel theoretical results. First, we describe sufficient conditions under which any non-empty intersection of finitely many Euclidean balls intersected with a positive reach set is contractible, so that the Nerve theorem applies for the restricted Čech complex. Second, we demonstrate the homotopy equivalence of a positive $\mu$-reach set and its offsets. Applying these results to the restricted Čech complex and using the interleaving relations with the Čech complex (or the Vietoris-Rips complex), we formulate conditions guaranteeing that the target space is homotopy equivalent to the Čech complex (or the Vietoris-Rips complex), in terms of the $\mu$-reach. Our results sharpen existing results.

7.1.8 Recovering the homology of immersed manifolds

Participants: Raphaël Tinarrage.

Given a sample of an abstract manifold immersed in some Euclidean space, we describe [68] a way to recover the singular homology of the original manifold. It consists in estimating its tangent bundle (seen as a subset of another Euclidean space) from a measure-theoretic point of view, and in applying measure-based filtrations for persistent homology. The construction we propose is consistent and stable, and does not require knowledge of the dimension of the manifold. In order to obtain quantitative results, we introduce the normal reach, which is a notion of reach suitable for an immersed manifold.

7.1.9 Computing persistent Stiefel-Whitney classes of line bundles

Participants: Raphaël Tinarrage.

We propose [67] a definition of persistent Stiefel-Whitney classes of vector bundle filtrations. It relies on seeing vector bundles as subsets of some Euclidean spaces. The usual Čech filtration of such a subset can be endowed with a vector bundle structure, which we call a Čech bundle filtration. We show that this construction is stable and consistent. When the dataset is a finite sample of a line bundle, we implement an effective algorithm to compute its persistent Stiefel-Whitney classes. In order to use simplicial approximation techniques in practice, we develop a notion of weak simplicial approximation. As a theoretical example, we give an in-depth study of the normal bundle of the circle, which reduces to understanding the persistent cohomology of the torus knot (1,2).

7.2 Statistical aspects of topological and geometric data analysis

7.2.1 Optimal quantization of the mean measure and applications to statistical learning

Participants: Frédéric Chazal, Martin Royer.

In collaboration with Clément Levrard (Université Paris-Diderot)

This work [51] addresses the case where data come as point sets, or more generally as discrete measures. Our motivation is twofold. First, we intend to approximate with a compactly supported measure the mean of the measure generating process, which coincides with the intensity measure in the point process framework, or with the expected persistence diagram in the framework of persistence-based topological data analysis. To this aim we provide two algorithms that we prove almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and investigate its properties through a clustering-oriented lens. In a nutshell, we show that in a mixture of measure generating processes, our technique yields a representation in $\mathbb{R}^{k}$, for $k\in \mathbb{N}^{*}$, that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure. Finally, we assess the effectiveness of our approach on simulated and real datasets, encompassing text classification and large-scale graph classification.

7.2.2 DTM-based Filtrations

Participants: Frédéric Chazal, Marc Glisse, Raphael Tinarrage.

In collaboration with H. Anai, H. Inakoshi and Y. Umeda (Fujitsu, Japan)

Despite strong stability properties, the persistent homology of filtrations classically used in Topological Data Analysis, such as the Čech or Vietoris-Rips filtrations, is very sensitive to the presence of outliers in the data from which it is computed. In this work [12], we introduce and study a new family of filtrations, the DTM-filtrations, built on top of point clouds in Euclidean space, which are more robust to noise and outliers. The approach adopted in this work relies on the notion of distance-to-measure functions and extends some previous work on the approximation of such functions.
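The distance-to-measure (DTM) function on which these filtrations rely can be illustrated on a finite sample. The sketch below uses the usual empirical form, namely the root mean squared distance from a query point to its $k$ nearest sample points, which corresponds to a mass parameter $m = k/n$. The function name `empirical_dtm` is hypothetical, and the sketch only computes DTM values; it does not build the DTM-filtration itself.

```python
import math


def empirical_dtm(x, sample, k):
    """Empirical distance-to-measure of x with respect to `sample`.

    Root mean of the squared distances from x to its k nearest sample
    points, i.e. mass parameter m = k / len(sample).
    """
    squared = sorted(math.dist(x, p) ** 2 for p in sample)
    return math.sqrt(sum(squared[:k]) / k)


# Four points on a unit square plus one far-away outlier.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
dtm_inlier = empirical_dtm((0.0, 0.0), cloud, k=3)
dtm_outlier = empirical_dtm((10.0, 10.0), cloud, k=3)
```

The outlier belongs to the sample, so its distance to the point cloud is zero, yet its DTM value is large because its nearest neighbors are far away. Using such values to delay the appearance of points in a filtration is what makes the resulting persistence less sensitive to outliers.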

7.2.3 Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport

Participants: Vincent Divol, Théo Lacombe.

Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this work [17], by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the persistence diagrams space. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.
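The observation that diagram metrics are partial transport problems can be made concrete on tiny examples: each point is either matched to a point of the other diagram or sent to the diagonal. The following brute-force sketch enumerates all such matchings. It is only meant for diagrams with a handful of points (the cost is factorial), it assumes the Euclidean norm as ground metric, and the function names are hypothetical.

```python
import math
from itertools import permutations


def diagonal_distance(point):
    """Euclidean distance from a diagram point (birth, death) to the diagonal."""
    birth, death = point
    return (death - birth) / math.sqrt(2)


def partial_transport_distance(diag_a, diag_b, p=2):
    """Brute-force order-p matching distance between two small diagrams."""
    n, m = len(diag_a), len(diag_b)
    # Pad each side with diagonal slots (None) so that both have n + m entries.
    left = list(diag_a) + [None] * m
    right = list(diag_b) + [None] * n

    def cost(x, y):
        if x is None and y is None:
            return 0.0  # diagonal matched to diagonal is free
        if x is None:
            return diagonal_distance(y) ** p
        if y is None:
            return diagonal_distance(x) ** p
        return math.dist(x, y) ** p

    best = min(
        sum(cost(x, right[j]) for x, j in zip(left, perm))
        for perm in permutations(range(n + m))
    )
    return best ** (1 / p)


d_shift = partial_transport_distance([(0.0, 2.0)], [(0.0, 3.0)])
d_empty = partial_transport_distance([(0.0, 2.0)], [])
```

In the first call, matching the two points costs less than sending both to the diagonal, so the distance is 1. In the second call, the only option is to send the point to the diagonal, which gives $\sqrt{2}$.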

7.2.4 Minimax adaptive estimation in manifold inference

Participants: Vincent Divol.

In this work [57], we focus on the problem of manifold estimation: given a set of observations sampled close to some unknown submanifold $M$, one wants to recover information about the geometry of $M$. Minimax estimators which have been proposed so far all depend crucially on the a priori knowledge of some parameters quantifying the regularity of $M$ (such as its reach), whereas those quantities will be unknown in practice. Our contribution to the matter is twofold: first, we introduce a one-parameter family of manifold estimators $\left({M}_{t}\right)$, $t\ge 0$, and show that for some choice of $t$ (depending on the regularity parameters), the corresponding estimator is minimax on the class of models of $\mathcal{C}^{2}$ manifolds introduced in [Genovese et al., Manifold estimation and singular deconvolution under Hausdorff loss]. Second, we propose a completely data-driven selection procedure for the parameter $t$, leading to a minimax adaptive manifold estimator on this class of models. This selection procedure actually makes it possible to recover the sample rate of the set of observations, and can therefore be used as a hyperparameter in other settings, such as tangent space estimation.

7.2.5 Volume Doubling Condition and a Local Poincaré Inequality on Unweighted Random Geometric Graphs

Participants: Gilles Blanchard.

In collaboration with Franziska Göbel (Institute of Mathematics, University of Potsdam)

The aim of this work 59 is to establish two fundamental measure-metric properties of particular random geometric graphs. We consider $\epsilon$-neighborhood graphs whose vertices are drawn independently and identically distributed from a common distribution defined on a regular submanifold of ${ℝ}^{K}$. We show that a volume doubling condition (VD) and local Poincaré inequality (LPI) hold for the random geometric graph (with high probability, and uniformly over all shortest path distance balls in a certain radius range) under suitable regularity conditions of the underlying submanifold and the sampling distribution.
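To make the volume doubling condition concrete, here is a minimal Python sketch that builds an unweighted $\epsilon$-neighborhood graph on a finite point set and compares the sizes of shortest-path balls of radius r and 2r. The function names and the use of vertex counts as the volume are assumptions of this example; the sketch only measures the ratios empirically and does not reproduce the probabilistic guarantees of the cited work.

```python
from collections import deque
from math import dist


def epsilon_graph(points, eps):
    """Adjacency lists of the unweighted graph linking points at distance <= eps."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= eps:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def ball_size(adj, source, radius):
    """Number of vertices within `radius` hops of `source` (breadth-first search)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if hops[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return len(hops)


def doubling_ratios(adj, radius):
    """Ratio |B(v, 2r)| / |B(v, r)| for every vertex v."""
    return [ball_size(adj, v, 2 * radius) / ball_size(adj, v, radius)
            for v in range(len(adj))]
```

A volume doubling condition asks these ratios to stay below a constant uniformly over vertices and over a range of radii; on points sampled from a curve, for instance, one would expect values close to 2.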

7.3 Topological and geometric approaches for machine learning

7.3.1 Inverse Problems in Topological Persistence: a Survey

Participants: Steve Oudot.

In collaboration with Elchanan Solomon (Duke University)

In this survey 23, we review the literature on inverse problems in topological persistence theory. The first half of the survey is concerned with the question of surjectivity, i.e. the existence of right inverses, and the second half focuses on injectivity, i.e. left inverses. Throughout, we highlight the tools and theorems that underlie these advances, and direct the reader’s attention to open problems, both theoretical and applied.

7.3.2 Intrinsic Topological Transforms via the Distance Kernel Embedding

Participants: Clément Maria, Steve Oudot.

In collaboration with Elchanan Solomon (Duke University)

Topological transforms are parametrized families of topological invariants, which, by analogy with transforms in signal processing, are much more discriminative than single measurements. The first two topological transforms to be defined were the Persistent Homology Transform and Euler Characteristic Transform, both of which apply to shapes embedded in Euclidean space. The contribution of this work 34 is to define topological transforms that depend only on the intrinsic geometry of a shape, and hence are invariant to the choice of embedding. To that end, given an abstract metric measure space, we define an integral operator whose eigenfunctions are used to compute sublevel set persistent homology. We demonstrate that this operator, which we call the distance kernel operator, enjoys desirable stability properties, and that its spectrum and eigenfunctions concisely encode the large-scale geometry of our metric measure space. We then define a number of topological transforms using the eigenfunctions of this operator, and observe that these transforms inherit many of the stability and injectivity properties of the distance kernel operator.

7.3.3 PLLay: Efficient Topological Layer based on Persistence Landscapes

Participants: Frédéric Chazal, Jisu Kim.

In collaboration with K. Kim, J.S. Kim, L. Wasserman (Carnegie Mellon University) and M. Zaheer (Google Research)

In this work 32, we propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure. We show differentiability with respect to layer inputs, for a general persistent homology with arbitrary filtration. Thus, our proposed layer can be placed anywhere in the network and feed critical information on the topological features of input data into subsequent layers to improve the learnability of the networks toward a given task. A task optimal structure of PLLay is learned during training via backpropagation, without requiring any input featurization or data preprocessing. We provide a novel adaptation for the DTM function-based filtration, and show that the proposed layer is robust against noise and outliers through a stability analysis. We demonstrate the effectiveness of our approach by classification experiments on various datasets.
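For readers unfamiliar with persistence landscapes, the following Python sketch evaluates the k-th landscape function of a diagram on a grid, which is the kind of fixed-size representation a landscape-based layer can start from. It uses the standard definition (the k-th largest tent value at each point); the function names are hypothetical, and the learnable weighting and differentiation steps of PLLay are not shown.

```python
def landscape(diagram, k, t):
    """Value at t of the k-th persistence landscape (k = 1 is the top layer)."""
    # Each (birth, death) pair contributes a tent function peaking at the midpoint.
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_features(diagram, ks, grid):
    """Sample the landscapes of orders `ks` on `grid` (one row per order)."""
    return [[landscape(diagram, k, t) for t in grid] for k in ks]


# Two nested intervals: the top landscape follows the longer one.
example = [(0.0, 4.0), (1.0, 3.0)]
features = landscape_features(example, ks=[1, 2], grid=[0.5, 2.0, 3.5])
```

Each row of `features` has the same length regardless of how many points the diagram contains, which is what makes the representation convenient as a network input.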

7.3.4 Topological Data Analysis for Arrhythmia Detection through Modular Neural Networks

Participants: Frédéric Chazal.

In collaboration with M. Dindin and Y. Umeda (Fujitsu, Japan)

This work 31 presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals. We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We put a strong emphasis on generalization throughout the construction of a deep learning model that turns out to be effective for new, unseen patients. The novelty of our approach relies on the use of topological data analysis as the basis of our multichannel architecture, to diminish the bias due to individual differences. We show that our structure matches the performance of state-of-the-art methods for arrhythmia detection and classification.

7.3.5 A note on stochastic subgradient descent for persistence-based functionals: convergence and practical aspects

Participants: Mathieu Carrière, Frédéric Chazal, Marc Glisse, Hari Kannan, Théo Lacombe.

In collaboration with Yuichi Ike (Fujitsu, Japan)

Solving optimization tasks based on functions and losses with a topological flavor is a very active and growing field of research in Topological Data Analysis, with plenty of applications in non-convex optimization, statistics and machine learning. All of these methods rely on the fact that most of the topological constructions are actually stratifiable and differentiable almost everywhere. However, the corresponding gradients and associated code are always anchored to a specific application and/or topological construction, and do not come with theoretical guarantees. In this work 50, we study the differentiability of a general functional associated with the most common topological construction, that is, the persistence map, and we prove a convergence result of stochastic subgradient descent for such a functional. This result encompasses all the constructions and applications for topological optimization in the literature, and comes with code that is easy to handle and mix with other non-topological constraints, and that can be used to reproduce the experiments described in the literature.

7.3.6 ATOL: Measure Vectorization for Automatic Topologically-Oriented Learning

Participants: Frédéric Chazal, Martin Royer.

In collaboration with Clément Levrard (Université Paris-Diderot), Yuichi Ike and Yuhei Umeda (Fujitsu, Japan).

Robust topological information commonly comes in the form of a set of persistence diagrams, finite measures that are by nature not easy to plug into generic machine learning frameworks. In this work 65, we introduce a fast, learnt, unsupervised vectorization method for measures in Euclidean spaces and use it for reflecting underlying changes in topological behaviour in machine learning contexts. The algorithm is simple and efficiently discriminates important space regions where meaningful differences to the mean measure arise. It is proven to be able to separate clusters of persistence diagrams. We showcase the strength and robustness of our approach on a number of applications, from emulous and modern graph collections where the method reaches state-of-the-art performance to a geometric synthetic dynamical orbits problem. The proposed methodology comes with a single high-level tuning parameter: the total measure encoding budget.
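The general idea of a learnt, budget-limited vectorization can be sketched in a few lines of Python. In this illustration, the centers are obtained by a few Lloyd (k-means) iterations on the pooled points of all diagrams, and each diagram is then summarized by one kernel-weighted sum per center. The naive initialization, the Laplace-type kernel and the choice of scale (half the distance to the nearest other center) are assumptions of this sketch and should not be read as the exact procedure of the cited work; the code also assumes a budget of at least 2 and centers that remain distinct.

```python
from math import dist, exp


def lloyd_centers(points, budget, iterations=20):
    """Quantize the pooled points with `budget` centers (plain Lloyd iterations)."""
    centers = [tuple(p) for p in points[:budget]]  # naive deterministic start
    for _ in range(iterations):
        cells = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            cells[nearest].append(p)
        centers = [
            tuple(sum(coords) / len(cell) for coords in zip(*cell)) if cell else centers[j]
            for j, cell in enumerate(cells)
        ]
    return centers


def vectorize(diagram, centers):
    """One coordinate per center: kernel-weighted mass of the diagram near it."""
    scales = [
        min(dist(c, other) for j, other in enumerate(centers) if j != i) / 2
        for i, c in enumerate(centers)
    ]
    return [
        sum(exp(-dist(p, c) / s) for p in diagram)
        for c, s in zip(centers, scales)
    ]


# Hypothetical toy collection of diagrams, each a list of (birth, death) points.
diagrams = [[(0.0, 1.0), (0.0, 1.2)], [(0.0, 1.1), (2.0, 5.0)], [(2.1, 5.2)]]
pooled = [p for diagram in diagrams for p in diagram]
centers = lloyd_centers(pooled, budget=2)
vectors = [vectorize(diagram, centers) for diagram in diagrams]
```

The only tuning parameter exposed here is `budget`, the number of centers, which plays the role of the encoding budget mentioned above.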

7.3.7 Multiparameter Persistence Image for Topological Machine Learning

Participants: Mathieu Carrière.

In collaboration with Andrew Blumberg (Columbia University, New York, USA).

In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks.

7.4 Miscellaneous

7.4.1 Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space

Participants: Frédéric Chazal, Alex Delalande.

In collaboration with Quentin Mérigot (Laboratoire de Mathématiques d'Orsay, Univ. Paris-Saclay)

This work 35 studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps.
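The one-dimensional case gives a simple picture of such an embedding. There, the optimal transport map from the uniform density on [0, 1] to a measure is its quantile function, and the $L^2$ distance between two quantile functions equals the 2-Wasserstein distance. The Python sketch below illustrates this for empirical measures with the same number of equally weighted atoms; the function names and the equal-size restriction are assumptions of this example, and the higher-dimensional setting of the cited work requires an actual optimal transport solver.

```python
from math import sqrt


def embed(samples):
    """Quantile-function embedding of an empirical measure with equal weights.

    The quantile function is piecewise constant on n intervals of length 1/n,
    so it is encoded by the sorted atoms, rescaled so that the Euclidean norm
    of the vector matches the L2 norm of the function on [0, 1].
    """
    n = len(samples)
    return [x / sqrt(n) for x in sorted(samples)]


def euclidean(u, v):
    """Euclidean distance between two embedded measures of the same size."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# A measure and its translate by 1: their 2-Wasserstein distance is 1.
mu = [0.0, 2.0, 1.0]
nu = [3.0, 1.0, 2.0]
w2 = euclidean(embed(mu), embed(nu))
```

Once measures are turned into vectors in this way, generic tools such as averaging, PCA or k-means can be applied directly to the embedded data.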

7.4.2 Post hoc confidence bounds on false positives using reference families

Participants: Gilles Blanchard.

In collaboration with Étienne Roquain (LPSM, Sorbonne université), Pierre Neuvial (IMT, Toulouse Université)

In this work 14, we follow a post-hoc, "user-agnostic" approach to false discovery control in a large-scale multiple testing framework, as introduced by Genovese and Wasserman (2006), Goeman and Solari (2011): the statistical guarantee on the number of correct rejections must hold for any set of candidate items, possibly selected by the user after having seen the data. To this end, we introduce a novel point of view based on a family of reference rejection sets and a suitable criterion, namely the joint-family-wise-error rate over that family (JER for short). First, we establish how to derive post hoc bounds from a given JER control and analyze some general properties of this approach. We then develop procedures for controlling the JER in the case where reference regions are $p$-value level sets. These procedures adapt to dependencies and to the unknown quantity of signal (via a step-down principle). We also show interesting connections to confidence envelopes of Meinshausen (2006); Genovese and Wasserman (2006), the closed testing based approach of Goeman and Solari (2011) and to the higher criticism of Donoho and Jin (2004). Our theoretical statements are supported by numerical experiments.

Published in Annals of Statistics, 2020.
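As an illustration of how a post hoc bound is derived from a reference family, the Python sketch below uses the $p$-value level sets $R_k = \{i : p_i \le \alpha k/m\}$, each allowed at most $k-1$ false positives (a Simes-type family). For any selected set $S$, the number of false positives is then bounded by the minimum over $k$ of $|S \setminus R_k| + k - 1$, capped at $|S|$. The function name is hypothetical, and the validity of this particular family relies on the Simes inequality holding for the data at hand, which is an assumption of this example; the adaptive and step-down refinements mentioned above are not shown.

```python
def simes_posthoc_bound(pvalues, selected, alpha=0.05):
    """Upper bound on the number of false positives among `selected` indices.

    The bound holds simultaneously over all selections with probability at
    least 1 - alpha whenever the reference family below has JER at most alpha.
    """
    m = len(pvalues)
    best = len(selected)  # trivial bound: everything selected is false
    for k in range(1, m + 1):
        threshold = alpha * k / m
        outside = sum(1 for i in selected if pvalues[i] > threshold)
        best = min(best, outside + k - 1)
    return best


pvalues = [0.001, 0.002, 0.003, 0.5, 0.9]
bound_small = simes_posthoc_bound(pvalues, selected=[0, 1, 2])
bound_mixed = simes_posthoc_bound(pvalues, selected=[0, 3, 4])
```

Because the guarantee is simultaneous over all sets, `selected` may be chosen after looking at the data, and the function may be called on as many candidate selections as desired.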

7.4.3 Compressive Statistical Learning with Random Feature Moments

Participants: Gilles Blanchard.

In collaboration with Rémi Gribonval (INRIA Lyon), Nicolas Keriven (CNRS, GIPSA, Université Rhône-Alpes), Yan Traonmilin (CNRS, IMB, Université Bordeaux)

We introduce in this work 20 a general framework, compressive statistical learning, for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. A near-minimizer of the risk is computed from the sketch through the solution of a nonlinear least squares problem. We investigate sufficient sketch sizes to control the generalization error of this procedure. The framework is illustrated on compressive PCA, compressive clustering, and compressive Gaussian mixture modeling with fixed known variance. The latter two are further developed in a companion paper.

Accepted for publication in Mathematical Statistics and Learning, 2021.
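The compression step can be illustrated with random Fourier moments, one common choice of generalized moments. The Python sketch below computes such a sketch in a single pass over the data and shows that sketches of two batches can be merged without revisiting the samples. The function names, the Gaussian frequency distribution and its scale are assumptions of this example; the learning step, which recovers model parameters from the sketch by nonlinear least squares, is not shown.

```python
import cmath
import random


def draw_frequencies(num_features, dim, scale=1.0, seed=0):
    """Random frequency vectors, drawn once and shared by all sketches."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_features)]


def sketch(data, freqs):
    """Empirical average of exp(i <w, x>) over the data, computed in one pass."""
    sums = [0j] * len(freqs)
    count = 0
    for x in data:
        count += 1
        for j, w in enumerate(freqs):
            phase = sum(wi * xi for wi, xi in zip(w, x))
            sums[j] += cmath.exp(1j * phase)
    if count == 0:
        raise ValueError("cannot sketch an empty collection")
    return [s / count for s in sums]


def merge(sketch_a, n_a, sketch_b, n_b):
    """Sketch of the union of two batches of sizes n_a and n_b."""
    total = n_a + n_b
    return [(n_a * a + n_b * b) / total for a, b in zip(sketch_a, sketch_b)]
```

The size of the sketch depends only on the number of frequencies, not on the number of samples, which is what makes the approach attractive for streaming or distributed data.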

7.4.4 Domain Generalization by Marginal Transfer Learning

Participants: Gilles Blanchard.

In collaboration with Aniket Anand Deshmukh (Microsoft Research), Urun Dogan (Microsoft Research), Gyemin Lee (Seoul University for Science and Technology), Clayton Scott (University of Michigan)

In the problem of domain generalization (DG), there are labeled training data sets from several related prediction problems, and the goal is to make accurate predictions on future unlabeled data sets that are not known to the learner. This problem arises in several applications where data distributions fluctuate because of environmental, technical, or other sources of variation. In the work 42 we introduce a formal framework for DG, and argue that it can be viewed as a kind of supervised learning problem by augmenting the original feature space with the marginal distribution of feature vectors. While our framework has several connections to conventional analysis of supervised learning algorithms, several unique aspects of DG require new methods of analysis. This work lays the learning theoretic foundations of domain generalization, building on our earlier work where the problem of DG was introduced. We present two formal models of data generation, corresponding notions of risk, and distribution-free generalization error analysis. By focusing our attention on kernel methods, we also provide more quantitative results and a universally consistent algorithm. An efficient implementation is provided for this algorithm, which is experimentally compared to a pooling strategy on one synthetic and three real-world data sets.

Published in Journal of Machine Learning Research, 2021.

7.4.5 A polynomial time algorithm to compute quantum invariants of 3-manifolds with bounded first Betti number

Participants: Clément Maria.

In collaboration with Jonathan Spreer (The University of Sydney, Australia)

In this article, we introduce a fixed-parameter tractable algorithm for computing the Turaev-Viro invariants $\mathrm{TV}_{4,q}$, using the dimension of the first homology group of the manifold as parameter. This is, to our knowledge, the first parameterised algorithm in computational 3-manifold topology using a topological parameter. The computation of $\mathrm{TV}_{4,q}$ is known to be #P-hard in general; using a topological parameter provides an algorithm polynomial in the size of the input triangulation for the extremely large family of 3-manifolds with first homology group of bounded rank. Our algorithm is easy to implement, and its running times are comparable to those needed to compute integral homology groups for standard libraries of triangulated 3-manifolds. The invariants we can compute this way are powerful: in combination with integral homology and using standard data sets, we are able to roughly double the number of pairs of 3-manifolds we can distinguish. We hope this qualifies $\mathrm{TV}_{4,q}$ to be added to the short list of standard properties (such as orientability, connectedness, Betti numbers, etc.) that can be computed ad hoc when first investigating an unknown triangulation.

Published in Foundations of Computational Mathematics (FoCM), 2020.

7.4.6 Variable-width contouring for additive manufacturing

Participants: Marc Glisse.

In collaboration with Samuel Hornus, Sylvain Lefebvre, Jonàs Martínez (Inria team MFX), Olivier Devillers, Sylvain Lazard, Monique Teillaud (Inria team Gamble) and Tim Kuipers (Delft University of Technology, Netherlands).

In most layered additive manufacturing processes, a tool solidifies or deposits material while following pre-planned trajectories to form solid beads. Many interesting problems arise in this context, among which one concerns the planning of trajectories for filling a planar shape as densely as possible. This is the problem we tackle in the present work 21. Recent works have shown that allowing the bead width to vary along the trajectories helps increase the filling density. We present a novel technique that, given a deposition width range, constructs a set of closed beads whose width varies within the prescribed range and that fill the input shape. The technique outperforms the state of the art in important metrics: filling density (while still guaranteeing the absence of bead overlap) and trajectory smoothness. We give a detailed geometric description of our algorithm, explore its behavior on example inputs and provide a statistical comparison with the state of the art. We show that it is possible to obtain high-quality fabricated layers on commodity FDM printers.

7.4.7 Mean curvature motion of point cloud varifolds

Participants: Blanche Buet.

In collaboration with Martin Rumpf (University of Bonn)

This paper 49 investigates a discretization scheme for mean curvature motion on point cloud varifolds, with particular emphasis on singular evolutions. To define the varifold, a local covariance analysis is applied to compute an approximate tangent plane for the points in the cloud. The core ingredient of the mean curvature motion model is the regularization of the first variation of the varifold via convolution with kernels with small stencil. Consistency with the evolution velocity for a smooth surface is proven if a sufficiently small stencil and a regular sampling are taken into account. Furthermore, an implicit and a semi-implicit time discretization are derived. The implicit scheme comes with discrete barrier properties known for the smooth, continuous evolution, whereas the semi-implicit scheme still ensures very good approximation properties in all our numerical experiments while being easy to implement. It is shown that the proposed method is robust with respect to noise and recovers the evolution of smooth curves as well as the formation of singularities such as triple points in 2D or minimal cones in 3D.

7.4.8 Covering families of triangles

Participants: Marc Glisse.

In collaboration with Olivier Devillers, Ji-Won Park (Inria team Gamble) and Otfried Cheong (KAIST, South Korea).

A cover for a family F of sets in the plane is a set into which every set in F can be isometrically moved. We are interested in the convex cover of smallest area for a given family of triangles. Park and Cheong conjectured that any family of triangles of bounded diameter has a smallest convex cover that is itself a triangle. The conjecture is equivalent to the claim that for every convex set X there is a triangle Z whose area is not larger than the area of X, such that Z covers the family of triangles contained in X. In this work 52, we prove this claim for the case where a diameter of X lies on its boundary. We also give a complete characterization of the smallest convex cover for the family of triangles contained in a half-disk, and for the family of triangles contained in a square. In both cases, this cover is a triangle.

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

• Collaboration with Sysnav, a French SME with world-leading expertise in navigation and geopositioning in extreme environments, on TDA, geometric approaches and machine learning for the analysis of movements of pedestrians and patients equipped with inertial sensors (CIFRE PhD of Bertrand Beaufils).
• Research collaboration with Fujitsu on the development of new TDA methods and tools for Machine learning and Artificial Intelligence (started in Dec 2017).
• Research collaboration with MetaFora on the development of new TDA-based and statistical methods for the analysis of cytometric data (started in Nov. 2019).

8.2 Bilateral grants with industry

• DataShape and Sysnav have been selected for the ANR/DGA Challenge MALIN (funding: 700 kEuros) on pedestrian motion reconstruction in severe environments (without GPS access).

9 Partnerships and cooperations

9.1 International initiatives

9.1.1 Inria international partners

Informal international partners

• TopStat group (L. Wasserman and A. Rinaldo) at Carnegie Mellon: DataShape has maintained a collaboration with this group for several years, resulting in several joint publications.

9.2 National initiatives

9.2.1 ANR

ANR ASPAG

Participants: Marc Glisse.

- Acronym : ASPAG.

- Type : ANR blanc.

- Title : Analysis and Probabilistic Simulations of Geometric Algorithms.

- Coordinator : Olivier Devillers (Inria team Gamble).

- Duration : 4 years from January 2018 to December 2021.

- Other partners: Inria Gamble, LPSM, LABRI, Université de Rouen, IECL, Université du Littoral Côte d'Opale, Telecom ParisTech, Université Paris X (Modal'X), LAMA, Université de Poitiers, Université de Bourgogne.

- Abstract:

The analysis and processing of geometric data has become routine in a variety of human activities ranging from computer-aided design in manufacturing to the tracking of animal trajectories in ecology or geographic information systems in GPS navigation devices. Geometric algorithms and probabilistic geometric models are crucial to the treatment of all this geometric data, yet the current available knowledge is in various ways much too limited: many models are far from matching real data, and the analyses are not always relevant in practical contexts. One of the reasons for this state of affairs is that the breadth of expertise required is spread among different scientific communities (computational geometry, analysis of algorithms and stochastic geometry) that historically had very little interaction. The Aspag project brings together experts of these communities to address the problem of geometric data. We will more specifically work on the following three interdependent directions.

(1) Dependent point sets: One of the main issues of most models is the core assumption that the data points are independent and follow the same underlying distribution. Although this may be relevant in some contexts, the independence assumption is too strong for many applications.

(2) Simulation of geometric structures: The phenomena studied in (1) involve intricate random geometric structures subject to new models or constraints. A natural first step would be to build up our understanding and identify plausible conjectures through simulation. Perhaps surprisingly, the tools for an effective simulation of such complex geometric systems still need to be developed.

(3) Understanding geometric algorithms: the analysis of algorithms is an essential step in assessing the strengths and weaknesses of algorithmic principles, and is crucial to guide the choices made when designing a complex data processing pipeline. Any analysis must strike a balance between realism and tractability; the current analyses of many geometric algorithms are notoriously unrealistic.

Aside from the purely scientific objectives, one of the main goals of Aspag is to bring the communities closer in the long term. As a consequence, the funding of the project is crucial to ensure that the members of the consortium will be able to interact on a very regular basis, a necessary condition for significant progress on the above challenges.

ANR Chair in AI

Participants: Frédéric Chazal, Marc Glisse, Louis Pujol, Wojciech Riese.

- Acronym : TopAI

- Type : ANR Chair in AI.

- Title : Topological Data Analysis for Machine Learning and AI

- Coordinator : Frédéric Chazal

- Duration : 4 years from September 2020 to August 2024.

- Other partners: Two industrial partners, the French SME Sysnav and the French start-up MetaFora.

- Abstract:

The TopAI project aims at developing a world-leading research activity on topological and geometric approaches in Machine Learning (ML) and AI with a double academic and industrial/societal objective. First, building on the strong expertise of the candidate and his team in TDA, TopAI aims at designing new mathematically well-founded topological and geometric methods and tools for Data Analysis and ML and to make them available to the data science and AI community through state-of-the-art software tools. Second, thanks to already established close collaborations and the strong involvement of French industrial partners, TopAI aims at exploiting its expertise and tools to address a set of challenging problems with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.

ANR ALGOKNOT

Participants: Clément Maria.

- Acronym : ALGOKNOT.

- Type : ANR Jeune Chercheuse Jeune Chercheur.

- Title : Algorithmic and Combinatorial Aspects of Knot Theory.

- Coordinator : Clément Maria.

- Duration : 2020 – 2023 (3 years).

- Abstract: The project AlgoKnot aims at strengthening our understanding of the computational and combinatorial complexity of the diverse facets of knot theory, as well as designing efficient algorithms and software to study their interconnections.

9.2.2 Collaboration with other national research institutes

SHOM

Participants: Steve Oudot.

Research collaboration between DataShape and the Service Hydrographique et Océanographique de la Marine (SHOM) on bathymetric data analysis using a combination of TDA and deep learning techniques. This collaboration is funded by the AMI IA Améliorer la cartographie du littoral.

IFPEN

Participants: Frédéric Chazal, Marc Glisse, Jisu Kim.

Research collaboration between DataShape and IFPEN on TDA applied to various problems arising from the energy transition and sustainable mobility.

9.3 Regional initiatives

PhD² CytoPart

Participants: Marc Glisse, Louis Pujol.

- Acronym : CytoPart.

- Type : Paris Region PhD².

- Title : Partitionnement de données cytométriques.

The Île-de-France region funds one PhD thesis supervised by Pascal Massart (Inria team Celeste) and Marc Glisse, in collaboration with Metafora biosystems, a company specialized in the analysis of cells through their metabolism. The goal of the project is to improve clustering for this particular type of data.

10 Dissemination

10.1 Promoting scientific activities

10.1.2 Scientific events: selection

Member of the conference program committees

• Marc Glisse was a member of the Program Committee of the International Symposium on Computational Geometry (SoCG), June 2020.
• Gilles Blanchard was an Area Chair for the NeurIPS 2020 conference.

10.1.3 Journal

Member of the editorial boards

• Jean-Daniel Boissonnat is a member of the Editorial Board of the Journal of the ACM.
• Jean-Daniel Boissonnat is a member of the Editorial Board of Discrete and Computational Geometry (Springer).
• Frédéric Chazal is a member of the Editorial Board of Discrete and Computational Geometry (Springer).
• Frédéric Chazal is a member of the Editorial Board of Graphical Models (Elsevier).
• Frédéric Chazal is a member of the Scientific Board of Journal of Applied and Computational Topology (Springer), and Editor-in-Chief since January 1st 2021.
• Gilles Blanchard is a member of the Editorial Boards of Bernoulli, Electronic Journal of Statistics, and Annales de l'Institut Henri Poincaré Probability and Statistics.
• Steve Oudot is a member of the Editorial Board of the Journal of Computational Geometry.

10.1.4 Invited talks

• Steve Oudot. Two Decomposition Results for Bipersistence Modules. MFO Workshop on Representation Theory of Quivers and Finite Dimensional Algebras, Oberwolfach, Germany, January 2020.
• Frédéric Chazal. Approches topologiques et géométriques pour l'apprentissage statistique, théorie et pratique, EDF and System X workshop, September 2020.
• Frédéric Chazal. Learning linear representations of persistence diagrams: mathematical aspects and applications. Applied Machine Learning Days at EPFL 2020, January 2020.
• Jean-Daniel Boissonnat. Delaunay triangulation of manifolds. Inaugural conference at the web-seminar series on Applications of Geometry and Topology (GEOTOP-A), January 2020.
• Blanche Buet. Weak and approximate curvatures of a measure: a varifold perspective. Mathematics and Image Analysis MIA'21, January 2021.

10.1.5 Leadership within the scientific community

• Frédéric Chazal is co-responsible, with E. Scornet (Ecole Polytechnique), of the “programme Maths-IA” of the Fondation Mathématique Jacques Hadamard (FMJH).
• Frédéric Chazal is a member of the “Comité de pilotage” of the SIGMA group at SMAI.
• Steve Oudot is co-responsible, with L. Castelli-Aleardi, of the GT GeoAlgo within the GdR-IM.

• Marc Glisse is president of the CDT at Inria Saclay.
• Steve Oudot is president of the Commission Scientifique at Inria Saclay.
• Frédéric Chazal is a member of the Graduate School in Mathematics at Université Paris-Saclay.
• Clément Maria is a member of the CDT at Inria Sophia Antipolis-Méditerranée.
• Blanche Buet is a member of the Committee on Gender Equality of LMO at Université Paris-Saclay and a member of the Laboratory Council of LMO at Université Paris-Saclay. She was also a member of recruitment committees for a “Maître de conférence” position at IMJ-PRG, Sorbonne Université and a “PRAG” position at LMO, Université Paris-Saclay, both in 2020.

10.2 Teaching - Supervision - Juries

10.2.1 Teaching

• Master: Frédéric Chazal and Quentin Mérigot, Analyse Topologique des Données, 30h eq-TD, Université Paris-Sud, France.
• Master: Marc Glisse and Clément Maria, Computational Geometry Learning, 36h eq-TD, M2, MPRI, France.
• Master: Frédéric Cazals and Frédéric Chazal, Geometric Methods for Data Analysis, 30h eq-TD, M1, École Centrale Paris, France.
• Master: Frédéric Chazal and Julien Tierny, Topological Data Analysis, 38h eq-TD, M2, Mathématiques, Vision, Apprentissage (MVA), ENS Paris-Saclay, France.
• Master: Steve Oudot, Topological data analysis, 45h eq-TD, M1, École polytechnique, France.
• Master: Steve Oudot, Data Analysis: geometry and topology in arbitrary dimensions, 24h eq-TD, M2, graduate program in Artificial Intelligence & Advanced Visual Computing, École polytechnique, France.
• Master: Gilles Blanchard, Mathematics for Artificial Intelligence 1, 70h eq-TD, IMO, Université Paris-Saclay, France.
• Master: Blanche Buet, TD-Techniques d'Analyse Harmonique, 30h eq-TD, M2 AAG Orsay, Université Paris-Saclay, France.
• Master: Blanche Buet, TD-Distributions et analyse de Fourier, 60h eq-TD, M1, Université Paris-Saclay, France.
• Undergrad-Master: Steve Oudot, Algorithms for data analysis in C++, 22.5h eq-TD, L3/M1, École polytechnique, France.
• Undergrad: Marc Glisse, Mécanismes de la programmation orientée-objet, 40h eq-TD, L3, École Polytechnique, France.

10.2.2 Supervision

• PhD: Siddharth Pritam, Collapses and persistent homology, Jean-Daniel Boissonnat (Université Côte d'Azur). Defended in April 2020.
• PhD: Nicolas Berkouk, Persistence and Sheaves : from Theory to Applications, Institut Polytechnique de Paris. Defended in September 2020. Steve Oudot.
• PhD: Théo Lacombe, Statistics for topological descriptors using optimal transport, Institut Polytechnique de Paris. Defended in September 2020. Steve Oudot.
• PhD: Raphaël Tinarrage, Topological inference from measures and vector bundles. Defended in October 2020. Frédéric Chazal and Marc Glisse.
• PhD: Bertrand Beaufils, Méthodes topologiques et apprentissage statistique pour l’actimétrie du piéton à partir de données de mouvement, Frédéric Chazal and Bertrand Michel (Ecole Centrale de Nantes).
• PhD in progress: Vadim Lebovici, Laplace transform for constructible functions. Started September 1st, 2020. Steve Oudot and François Petit (CRESS).
• PhD in progress: Christophe Vuong, Random hypergraphs. Started November 2020. Laurent Decreusefond and Marc Glisse.
• PhD in progress: Louis Pujol, Partitionnement de données cytométriques, started November 1st, 2019, Pascal Massart and Marc Glisse.
• PhD in progress: Vincent Divol, statistical aspects of TDA, started September 1st, 2017, Frédéric Chazal and Pascal Massart (LMO).
• PhD in progress: Etienne Lasalle, TDA for graph data, started September 1st, 2019, Frédéric Chazal and Pascal Massart (LMO).
• PhD in progress: Alex Delalande, Measure embedding with Optimal Transport and applications in Machine Learning, started December 1st, 2019, Frédéric Chazal and Quentin Mérigot (LMO).
• PhD in progress: Wojciech Riese, Geometric inference for curves and trajectories. Applications to speed estimation from magnetic field measurements, started in September 2020, Frédéric Chazal and Bertrand Michel (Ecole Centrale de Nantes).
• PhD in progress: Jérémie Capitao-Miniconi, Deconvolution for geometric inference, started October 2020, Frédéric Chazal and Elisabeth Gassiat (LMO).
• PhD in progress: Owen Rouillé, Algorithms and Complexity in Geometric Topology, started September 2018. Clément Maria and Jean-Daniel Boissonnat.
• PhD in progress: Oleksandr Zadorozhnyi, Contributions to the theoretical analysis of the algorithms with adversarial and dependent data, started September 2017. Gilles Blanchard and Alexandra Carpentier.
• PhD in progress: El Mehdi Saad, Efficient online methods for variable and model selection, started September 2019. Gilles Blanchard and Sylvain Arlot.
• PhD in progress: Olympio Hacquard, Dimension reduction for persistent homology, started September 2020. Gilles Blanchard and Clément Levrard.
• PhD in progress: Hannah Marienwald, Transfer learning in high dimension. Started September 2019. Gilles Blanchard and Klaus-Robert Müller.

10.2.3 Juries

• Clément Maria was a member of the jury attributing the Gilles Kahn PhD award, from the SIF and the Academy of Science, Nov. 2020.
• Steve Oudot was a reviewer for the Ph.D. defence of Håvard Bjerkevik, Norwegian University of Science and Technology, June 2020.
• Steve Oudot was a member of the jury for CRCN applications at Inria Nancy – Grand Est, Spring 2020.
• Blanche Buet was a member of the PhD defence juries of Camille Labourie (Université Paris-Saclay, January 2020), François Genereau (Université Grenoble Alpes, June 2020) and Raphaël Tinarrage (Inria - Université Paris-Saclay, October 2020).

10.3 Popularization

10.3.1 Interventions

• Frédéric Chazal. Les données ont-elles une forme ? Une petite introduction à l'Analyse Topologique des Données. Back-to-school seminar of the Master in Mathematics at Université Paris-Saclay.

11 Scientific production

11.1 Major publications

• 1 article Dominique Attali, Ulrich Bauer, Olivier Devillers, Marc Glisse and André Lieutier. 'Homological Reconstruction and Simplification in R3'. Computational Geometry 2014
• 2 article Jean-Daniel Boissonnat, Ramsay Dyer and Arijit Ghosh. 'Delaunay Triangulation of Manifolds'. Foundations of Computational Mathematics 45 2017, 38
• 3 article Jean-Daniel Boissonnat, Ramsay Dyer, Arijit Ghosh and Steve Y. Oudot. 'Only distances are required to reconstruct submanifolds'. Computational Geometry 66 2017, 32-67
• 4 article Jean-Daniel Boissonnat, Karthik C. Srikanta and Sébastien Tavenas. 'Building Efficient and Compact Data Structures for Simplicial Complexes'. Algorithmica September 2016
• 5 article Blanche Buet, Gian Paolo Leonardi and Simon Masnou. 'A Varifold Approach to Surface Approximation'. Archive for Rational Mechanics and Analysis 226 2 November 2017, 639-694
• 6 article Frédéric Chazal, David Cohen-Steiner and André Lieutier. 'A Sampling Theory for Compact Sets in Euclidean Space'. Discrete Comput. Geom. 41 3 2009, 461-479
• 7 article Frédéric Chazal, David Cohen-Steiner and Quentin Mérigot. 'Geometric Inference for Measures based on Distance Functions'. Foundations of Computational Mathematics 11 6 RR-6930 2011, 733-751
• 8 book Frédéric Chazal, Steve Y. Oudot, Marc Glisse and Vin De Silva. 'The Structure and Stability of Persistence Modules'. SpringerBriefs in Mathematics Springer Verlag 2016, VII, 116
• 9 article Leonidas J. Guibas, Steve Y. Oudot, Primoz Skraba and Frédéric Chazal. 'Persistence-Based Clustering in Riemannian Manifolds'. Journal of the ACM 60 6 November 2013, 38
• 10 article Manish Mandad, David Cohen-Steiner, Leif Kobbelt, Pierre Alliez and Mathieu Desbrun. 'Variance-Minimizing Transport Plans for Inter-surface Mapping'. ACM Transactions on Graphics 36 2017, 14
• 11 book Steve Y. Oudot. 'Persistence Theory: From Quiver Representations to Data Analysis'. Mathematical Surveys and Monographs 209 American Mathematical Society 2015, 218

11.2 Publications of the year

International journals

• 12 article Hirokazu Anai, Frédéric Chazal, Marc Glisse, Yuichi Ike, Hiroya Inakoshi, Raphaël Tinarrage and Yuhei Umeda. 'DTM-based Filtrations'. Abel Symposia 15 2020, 33-66
• 13 article Gilles Blanchard and Nicole Mücke. 'Kernel regression, minimax rates and effective dimensionality: Beyond the regular case'. Analysis and Applications 18 04 July 2020, 683-696
• 14 article Gilles Blanchard, Pierre Neuvial and Etienne Roquain. 'Post hoc confidence bounds on false positives using reference families'. Annals of Statistics 48 3 June 2020, 1281-1303
• 15 article Jean-Daniel Boissonnat, Olivier Devillers, Kunal Dutta and Marc Glisse. 'Randomized incremental construction of Delaunay triangulations of nice point sets'. Discrete and Computational Geometry 64 2020, 33
• 16 article Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. 'Triangulating submanifolds: An elementary and quantified version of Whitney’s method'. Discrete and Computational Geometry December 2020
• 17 article Vincent Divol and Théo Lacombe. 'Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport'. Journal of Applied and Computational Topology October 2020
• 18 article Guillermo Durand, Gilles Blanchard, Pierre Neuvial and Etienne Roquain. 'Post hoc false positive control for structured hypotheses'. Scandinavian Journal of Statistics 47 4 December 2020, 1114-1148
• 19 article Aurélie Fischer, Clément Levrard and Claire Brécheteau. 'Robust Bregman Clustering'. Annals of Statistics 2020
• 20 article Rémi Gribonval, Gilles Blanchard, Nicolas Keriven and Yann Traonmilin. 'Compressive Statistical Learning with Random Feature Moments'. Mathematical Statistics and Learning 2021
• 21 article Samuel Hornus, Tim Kuipers, Olivier Devillers, Monique Teillaud, Jonàs Martínez, Marc Glisse, Sylvain Lazard and Sylvain Lefebvre. 'Variable-width contouring for additive manufacturing'. ACM Transactions on Graphics 39 4 (Proc. SIGGRAPH) July 2020
• 22 article Jérémy Cochoy and Steve Y. Oudot. 'Decomposition of exact pfd persistence bimodules'. Discrete and Computational Geometry 2020
• 23 article Steve Oudot and Elchanan Solomon. 'Inverse Problems in Topological Persistence: a Survey'. Abel Symposia 2020
• 24 article Abhishake Rastogi, Gilles Blanchard and Peter Mathé. 'Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems'. Electronic Journal of Statistics 14 2 2020, 2798-2841

International peer-reviewed conferences

• 25 inproceedings Shreya Arya, Jean-Daniel Boissonnat, Kunal Dutta and Martin Lotz. 'Dimensionality Reduction for k-Distance Applied to Persistent Homology'. SoCG 2020 - 36th International Symposium on Computational Geometry Zurich, Switzerland June 2020
• 26 inproceedings Jean-Daniel Boissonnat and Siddharth Pritam. 'Edge Collapse and Persistence of Flag Complexes'. SoCG 2020 - 36th International Symposium on Computational Geometry Zurich, Switzerland June 2020
• 27 inproceedings Jean-Daniel Boissonnat and Mathijs Wintraecken. 'The Topological Correctness of PL-Approximations of Isomanifolds'. SoCG 2020 - 36th International Symposium on Computational Geometry Zurich, Switzerland June 2020
• 28 inproceedings Magnus Bakke Botnan, Vadim Lebovici and Steve Oudot. 'On rectangle-decomposable 2-parameter persistence modules'. SoCG 2020 - 36th International Symposium on Computational Geometry 164 Zurich, Switzerland June 2020, 22:1-22:16
• 29 inproceedings Mathieu Carrière and Andrew J. Blumberg. 'Multiparameter Persistence Images for Topological Machine Learning'. NeurIPS 2020 - 34th Conference on Neural Information Processing Systems Vancouver / Virtual, Canada December 2020
• 30 inproceedings David Cohen-Steiner, André Lieutier and Julien Vuillamy. 'Lexicographic optimal homologous chains and applications to point cloud triangulations'. SoCG 2020 - 36th International Symposium on Computational Geometry Zurich, Switzerland June 2020
• 31 inproceedings Meryll Dindin, Yuhei Umeda and Frédéric Chazal. 'Topological Data Analysis for Arrhythmia Detection through Modular Neural Networks'. CanadianAI 2020 - 33rd Canadian Conference on Artificial Intelligence Ottawa, Canada May 2020
• 32 inproceedings Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frédéric Chazal and Larry Wasserman. 'PLLay: Efficient Topological Layer based on Persistence Landscapes'. NeurIPS 2020 - 34th Conference on Neural Information Processing Systems Vancouver / Virtual, Canada December 2020
• 33 inproceedings Jisu Kim, Jaehyeok Shin, Frédéric Chazal, Alessandro Rinaldo and Larry Wasserman. 'Homotopy Reconstruction via the Cech Complex and the Vietoris-Rips Complex'. SoCG 2020 - 36th International Symposium on Computational Geometry LIPIcs 164 Zurich, Switzerland June 2020
• 34 inproceedings Clément Maria, Steve Y. Oudot and Elchanan Solomon. 'Intrinsic Topological Transforms via the Distance Kernel Embedding'. SoCG 2020 - 36th International Symposium on Computational Geometry Zurich, Switzerland 2020
• 35 inproceedings Quentin Mérigot, Alex Delalande and Frédéric Chazal. 'Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space'. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics 108 Palermo / Online, Italy August 2020, 3186-3196

Conferences without proceedings

• 36 inproceedings Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. 'Tracing isomanifolds in $\mathbb{R}^{d}$ in time polynomial in d using Coxeter-Freudenthal-Kuhn triangulations'. SoCG 2021 - 37th Symposium on Computational Geometry Buffalo, United States https://cse.buffalo.edu/socg21/socg.html June 2021
• 37 inproceedings Clément Maria and Owen Rouillé. 'Computation of Large Asymptotics of 3-Manifold Quantum Invariants'. ALENEX21 Alexandria, United States January 2021

Doctoral dissertations and habilitation theses

• 38 thesis Nicolas Berkouk. 'Persistence and Sheaves : from Theory to Applications'. Institut Polytechnique de Paris September 2020
• 39 thesis Théo Lacombe. 'Statistics for Topological Descriptors using optimal transport'. Institut Polytechnique de Paris September 2020
• 40 thesis Siddharth Pritam. 'Collapses and persistent homology'. Université Côte d'Azur June 2020
• 41 thesis Raphaël Tinarrage. 'Topological inference from measures and vector bundles'. Université Paris-Saclay October 2020

Reports & preprints

• 42 misc Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee and Clayton Scott. 'Domain Generalization by Marginal Transfer Learning'. October 2020
• 43 misc Gilles Blanchard, Peter Mathé and Nicole Mücke. 'Lepskii Principle in Supervised Learning'. October 2020
• 44 misc Gilles Blanchard, Pierre Neuvial and Etienne Roquain. 'On agnostic post hoc approaches to false positive control'. October 2020
• 45 misc Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. 'A compact data structure for high dimensional Coxeter-Freudenthal-Kuhn triangulations'. November 2020
• 46 misc Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. 'Tracing Isomanifolds of Fixed Dimension in Polynomial Time'. July 2020
• 47 misc Jean-Daniel Boissonnat and Mathijs Wintraecken. 'The topological correctness of PL-approximations of isomanifolds'. October 2020
• 48 misc Magnus Bakke Botnan, Vadim Lebovici and Steve Oudot. 'Local characterizations for decomposability of 2-parameter persistence modules'. November 2020
• 49 misc Blanche Buet and Martin Rumpf. 'Mean curvature motion of point cloud varifolds'. November 2020
• 50 misc Mathieu Carrière, Frédéric Chazal, Marc Glisse, Yuichi Ike and Hariprasad Kannan. 'Optimizing persistent homology based functions'. February 2021
• 51 misc Frédéric Chazal, Clément Levrard and Martin Royer. 'Optimal quantization of the mean measure and applications to statistical learning'. March 2021
• 52 report Otfried Cheong, Olivier Devillers, Marc Glisse and Ji-won Park. 'Covering families of triangles'. INRIA 2020, 31
• 53 misc David Cohen-Steiner and Alba Chiara de Vitis. 'Spectral Properties of Radial Kernels and Clustering in High Dimensions'. January 2020
• 54 misc Alex Delalande and Quentin Mérigot. 'Quantitative Stability of Optimal Transport Maps under Variations of the Target Measure'. March 2021
• 55 misc Olivier Devillers, Philippe Duchon, Marc Glisse and Xavier Goaoc. 'On Order Types of Random Point Sets'. May 2020
• 56 misc Vincent Divol. 'A short proof on the rate of convergence of the empirical measure for the Wasserstein distance'. January 2021
• 57 misc Vincent Divol. 'Minimax adaptive estimation in manifold inference'. June 2020
• 58 misc Vincent Divol. 'Reconstructing measures on manifolds: an optimal transport approach'. February 2021
• 59 misc Franziska Göbel and Gilles Blanchard. 'Volume Doubling Condition and a Local Poincaré Inequality on Unweighted Random Geometric Graphs'. December 2020
• 60 misc Consortium ICUBAM, Laurent Bonnasse-Gahot, Maxime Dénès, Gabriel Dulac-Arnold, Sertan Girgin, François Husson, Valentin Iovene, Julie Josse, Antoine Kimmoun, François Landes, Jean-Pierre Nadal, Romain Primet, Frederico Quintao, Pierre Guillaume Raverdy, Vincent Rouvreau, Olivier Teboul and Roman Yurchak. 'ICU Bed Availability Monitoring and analysis in the Grand Est region of France during the COVID-19 epidemic'. May 2020
• 61 misc Clément Maria. 'Parameterized complexity of quantum knot invariants'. January 2020
• 62 misc Hannah Marienwald, Jean-Baptiste Fermanian and Gilles Blanchard. 'High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding'. November 2020
• 63 misc Daniel Perez. 'On $C^{0}$-persistent homology and trees'. December 2020
• 64 misc Daniel Perez. 'On the persistent homology of almost surely $C^{0}$ stochastic processes'. December 2020
• 65 misc Martin Royer, Frédéric Chazal, Clément Levrard, Yuhei Umeda and Yuichi Ike. 'ATOL: Measure Vectorization for Automatic Topologically-Oriented Learning'. February 2020
• 66 misc El Mehdi Saad, Gilles Blanchard and Sylvain Arlot. 'Online Orthogonal Matching Pursuit'. February 2021
• 67 misc Raphaël Tinarrage. 'Computing persistent Stiefel-Whitney classes of line bundles'. May 2020
• 68 misc Raphaël Tinarrage. 'Recovering the homology of immersed manifolds'. June 2020
• 69 misc Oleksandr Zadorozhnyi, Gilles Blanchard and Alexandra Carpentier. 'Restless dependent bandits with fading memory'. December 2020