Keywords
 A3.1. Data
 A3.1.1. Modeling, representation
 A3.2. Knowledge
 A3.2.3. Inference
 A3.3. Data and knowledge analysis
 A3.3.1. Online analytical processing
 A3.3.2. Data mining
 A3.3.3. Big data analysis
 A3.4.1. Supervised learning
 A3.4.2. Unsupervised learning
 A3.4.4. Optimization and learning
 A3.4.7. Kernel methods
 A6. Modeling, simulation and control
 A6.1. Methods in mathematical modeling
 A6.1.2. Stochastic Modeling
 A6.2. Scientific computing, Numerical Analysis & Optimization
 A6.2.3. Probabilistic methods
 A6.2.4. Statistical methods
 A6.4. Automatic control
 A6.4.2. Stochastic control
 B1. Life sciences
 B1.1. Biology
 B1.1.2. Molecular and cellular biology
 B1.1.10. Systems and synthetic biology
 B1.1.11. Plant Biology
 B2.2. Physiology and diseases
 B2.2.1. Cardiovascular and respiratory diseases
 B2.2.3. Cancer
 B2.3. Epidemiology
 B2.4. Therapies
 B5.5. Materials
1 Team members, visitors, external collaborators
Research Scientists
 Nicolas Champagnat [Inria, Senior Researcher, from Dec 2020, HDR]
 Coralie Fritsch [Inria, Researcher, from Dec 2020]
 Ulysse Herbach [Inria, Researcher]
 Bruno Scherrer [Inria, Researcher, HDR]
Faculty Members
 Anne Gégout-Petit [Team leader, Univ de Lorraine, Professor, HDR]
 Thierry Bastogne [Univ de Lorraine, Professor, HDR]
 Sandie Ferrigno [Univ de Lorraine, Associate Professor]
 Sophie Mezieres [Univ de Lorraine, Associate Professor]
 Jean-Marie Monnez [Univ de Lorraine, Emeritus, HDR]
 Aurélie Muller-Gueudin [Univ de Lorraine, Associate Professor]
 Samy Tindel [Univ de Lorraine, Professor, HDR]
 Pierre Vallois [Univ de Lorraine, Professor, HDR]
 Denis Villemonais [Univ de Lorraine, Associate Professor, from Sep 2020, HDR]
Post-Doctoral Fellows
 Emma Horton [University of Bath, England, until Nov 2020]
 William Ocafrain [Inria, from Dec 2020]
PhD Students
 Vincent Hass [Inria, from Dec 2020]
 Clémence Karmann [Univ de Lorraine, until Aug 2020]
 Rodolphe Loubaton [Univ de Lorraine, from Dec 2020]
 Nassim Sahki [Inria]
 Nino Vieillard [Google, CIFRE]
 Nicolás Zalduendo Vidal [Inria, from Dec 2020]
Technical Staff
 Benoît Lalloué [Centre hospitalier universitaire de Nancy, Engineer]
 Nicolas Thorr [Inria, Engineer, from Dec 2020]
Interns and Apprentices
 Salma Aziz [Inria, from Aug 2020 until Sep 2020]
 Alfred Kamdem Tezanlekeu [Inria, from Sep 2020]
Administrative Assistant
 Celine Cordier [Inria]
External Collaborators
 Céline Lacaux [Univ d'Avignon et des pays du Vaucluse, HDR]
 Lionel Lenôtre [Univ de Haute Alsace]
2 Overall objectives
BIGS is a joint team of Inria, CNRS and Université de Lorraine, via the Institut Élie Cartan, UMR 7502 CNRS-UL laboratory in mathematics, of which Inria is a strong partner. One member of BIGS, T. Bastogne, comes from the Research Center of Automatic Control of Nancy (CRAN), with which BIGS has strong relations in the domain "Health-Biology-Signal". Our research is mainly focused on stochastic modeling and statistics, with the aim of better understanding biological systems. BIGS involves applied mathematicians whose research interests mainly concern probability and statistics. More precisely, our attention is directed at (1) stochastic modeling, (2) estimation and control for stochastic processes, (3) algorithms and estimation for graph data and (4) regression and machine learning. The main objective of BIGS is to exploit these skills in applied mathematics to provide a better understanding of issues arising in life sciences, with a special focus on (1) tumor growth, (2) photodynamic therapy, (3) population studies of genomic data and of microorganisms genomics, (4) epidemiology and e-health.
3 Research program
3.1 Introduction
We give here the main lines of our research, which belongs to the domains of probability and statistics. For clarity, we have chosen to structure them in four items. Although this choice is not arbitrary, the boundaries between these items are sometimes fuzzy because each of them deals with modeling and inference and they are all interconnected.
3.2 Stochastic modeling
Our aim is to propose relevant stochastic frameworks for modeling and understanding biological systems. Stochastic processes are particularly suitable for this purpose. Among them, Markov chains give a first framework for modeling populations of cells [73, 50]. Piecewise deterministic processes are non-diffusion processes that are also frequently used in the biological context [40, 49, 42]. Among Markov models, we have developed strong expertise in processes derived from Brownian motion and Stochastic Differential Equations [66, 48]. For instance, knowledge about Brownian or random walk excursions [72, 64] helps to analyse genetic sequences and to develop inference about them. However, nature provides many examples of systems whose observed signal has a given Hölder regularity that does not match the one expected from a system driven by ordinary Brownian motion.
This situation is commonly handled by noisy equations driven by Gaussian processes such as fractional Brownian motion or fractional fields. The basic aspects of these differential equations are now well understood, mainly thanks to the so-called rough paths tools [56], but also through the Russo-Vallois integration techniques [65]. The specific issue of Volterra equations driven by fractional Brownian motion, which is central to the problem of subdiffusion within proteins, is addressed in [41]. Many generalizations (Gaussian or not) of this model have recently been proposed: for some Gaussian locally self-similar fields, for some non-Gaussian models [53], and for anisotropic models [37].
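For reference, the link between fractional Brownian motion and Hölder regularity can be written out with the standard textbook definition. The notation $B^H$ and $H$ is introduced here for illustration and is not taken from the cited works. A fractional Brownian motion with Hurst parameter $H \in (0,1)$ is the centered Gaussian process with covariance

```latex
\mathbb{E}\left[ B^H_s \, B^H_t \right]
  = \frac{1}{2}\left( |s|^{2H} + |t|^{2H} - |t-s|^{2H} \right),
  \qquad s, t \ge 0 .
```

Its paths are almost surely Hölder continuous of any order strictly smaller than $H$, and $H = 1/2$ recovers ordinary Brownian motion. Choosing $H \neq 1/2$ is therefore one way to match an observed regularity that Brownian noise cannot reproduce.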
3.3 Estimation and control for stochastic processes
We develop inference methods for the stochastic processes that we use for modeling. Control of stochastic processes is also a way to optimise the administration (dose, frequency) of a therapy.
There are many estimation techniques for diffusion processes or coefficients of fractional or multifractional Brownian motion according to a set of observations [52, 33, 39]. However, the inference problem for diffusions driven by a fractional Brownian motion is still in its infancy. Our team has solid expertise in inference of the jump rate and the kernel of Piecewise Deterministic Markov Processes (PDMP) [31, 30, 29, 32]. Many directions nevertheless remain to be explored. For instance, previous works assumed a complete observation of jumps and modes, which is unrealistic in practice. We therefore tackle the problem of inference for "Hidden PDMP". As an example, in pharmacokinetics modeling inference, we want to take into account the presence of timing noise and identification from longitudinal data. We have expertise on this subject [34], and we have also used mixed models to estimate tumor growth [35].
We consider the control of stochastic processes within the framework of Markov Decision Processes [63] and their generalization known as multiplayer stochastic games, with a particular focus on infinite-horizon problems. In this context, we are interested in the complexity analysis of standard algorithms, as well as the proposition and analysis of numerical approximate schemes for large problems in the spirit of [36]. Regarding complexity, a central topic of research is the analysis of the Policy Iteration algorithm, which has seen significant progress in recent years [75, 62, 47, 69] but is still not fully understood. For large problems, we have long experience of sensitivity analysis of approximate dynamic programming algorithms for Markov Decision Processes [71, 70, 67, 55, 68], and we are currently investigating whether and how similar ideas may be adapted to multiplayer stochastic games.
3.4 Algorithms and estimation for graph data
A graph data structure consists of a set of nodes, together with a set of pairs of these nodes called edges. This type of data is frequently used in biology because it provides a mathematical representation of many concepts such as biological structures and networks of relationships in a population. Some attention has recently been focused in the group on modeling and inference for graph data.
Network inference is the process of making inference about the link between two variables while taking into account the information about other variables. Reference [74] gives a very good introduction and many references about network inference and mining. Many methods are available to infer and test edges in Gaussian graphical models [74, 57, 45, 46]. However, abundance data are zero-inflated and therefore far from the Gaussian assumption, and we want to develop inference for this case.
Among graphs, trees play a special role because they offer a good model for many biological concepts, from RNA to phylogenetic trees through plant structures. Our research deals with several aspects of tree data. In particular, we work on statistical inference for this type of data under a given stochastic model. We also work on lossy compression of trees via directed acyclic graphs. These methods enable us to compute distances between tree data faster than from the original structures and with a high accuracy.
3.5 Regression and machine learning
Regression models and machine learning aim at inferring statistical links between a variable of interest and covariates. In biological studies, it is always important to develop adapted learning methods, both for standard data and for data of high dimension (sometimes with few observations) and very massive or online data.
Many methods are available to estimate conditional quantiles and test dependencies [61, 51]. Among them, we have developed nonparametric estimation by local analysis via kernel methods [43, 44], and we want to study the properties of this estimator in order to derive measures of risk such as confidence bands and tests. We also study many other regression models, such as survival analysis and spatio-temporal models with covariates. Among the multiple regression models, we want to develop omnibus tests that examine several assumptions together.
Concerning the analysis of high dimensional data, our view on the topic relies on the French data analysis school, specifically on Factorial Analysis tools. In this context, stochastic approximation is an essential tool [54], which allows one to approximate eigenvectors in a stepwise manner [59, 58, 60]. BIGS aims at performing accurate classification or clustering by taking advantage of the possibility of updating the information "online" using stochastic approximation algorithms [38]. We focus on several incremental procedures for regression and data analysis like linear and logistic regressions and PCA (Principal Component Analysis).
We also focus on the biological context of high-throughput bioassays in which several hundreds or thousands of biological signals are measured for subsequent analysis. We have to account for the inter-individual variability within the modeling procedure. We aim at developing a new solution based on an ARX (Auto Regressive model with eXternal inputs) model structure using the EM (Expectation-Maximisation) algorithm for the estimation of the model parameters.
4 Application domains
4.1 Tumor growth-oncology
On this topic, we want to propose branching processes to model the appearance of mutations in tumors, through new collaborations with clinicians. The observed process is the "circulating DNA" (ctDNA). The final purpose is to use ctDNA as an early biomarker of resistance to an immunotherapy treatment. This is the aim of the ITMO project. Another topic is the identification of dynamic networks of expression. In the ongoing work on low-grade gliomas, a local database of 400 patients will soon be available to construct models. We plan to extend it through national and international collaborations (Montpellier CHU, Montreal CRHUM). Our aim is to build a decision-aid tool for personalised medicine. In the same context, there is a topic of clustering analysis of a brain cartography obtained by sensorial simulations during awake surgery.
4.2 Genomic data and microorganisms population
Despite the 'G' in the name BIGS, genetics is not central in the applications of the team. However, we want to contribute to a better understanding of the correlations between genes through their expression data, and of the genetic bases of drug response and disease. We have contributed to methods detecting proteomics and transcriptomics variables linked with the outcome of a treatment.
4.3 Epidemiology and e-health
Much work remains to be done in our ongoing projects on personalized medicine with CHU Nancy. They deal with biomarkers research, prognostic value of quantitative variables and events, scoring, and adverse events. We also want to develop our expertise in rupture detection in a project with APHP (Assistance Publique Hôpitaux de Paris) for the detection of adverse events earlier than the clinical signs and symptoms. The clinical relevance of predictive analytics is clear for high-risk patients such as those with solid organ transplantation or severe chronic respiratory disease. The main challenge is rupture detection in multivariate and heterogeneous signals (for instance daily measures of electrocardiogram, body temperature, spirometry parameters, sleep duration, etc.). Other collaborations with clinicians concern foetopathology, where we want to use our work on conditional distribution functions to explain fetal and child growth. We have data from the "Service de foetopathologie et de placentologie" of the "Maternité Régionale Universitaire" (CHU Nancy).
4.4 Dynamics of telomeres
Telomeres are disposable buffers at the ends of chromosomes which are truncated during cell division, so that the telomere ends become shorter with each cell division over time. In this way, they are markers of aging. Through a collaboration with Pr A. Benetos, geriatrician at CHU Nancy, we recently obtained data on the distribution of the length of telomeres from blood cells. With members of Inria team TOSCA, we want to work in three connected directions: (1) refine methodology for the analysis of the available data; (2) propose a dynamical model for the lengths of telomeres and study its mathematical properties (long term behavior, quasi-stationarity, etc.); and (3) use these properties to develop new statistical methods. A postdoctoral position is already planned in the Lorraine Université d'Excellence (LUE) project GEENAGE (managed by CHU Nancy).
5 Social and environmental responsibility
We followed Inria's recommendations to get involved in the fight against COVID-19. We responded to the WHO's encouragement, relayed by our mathematical colleagues at the national level, to conduct seroprevalence studies in randomly drawn samples of the population. This is the purpose of the COVAL study described in the results section, initiated by Pierre Vallois.
6 Highlights of the year
The highlight of the year is the merger between BIGS and the members of the former TOSCA team, specialised in modelling for biological sciences and medicine: Nicolas Champagnat, Coralie Fritsch, Denis Villemonais and their postdoc and PhD students. The other highlights of the year are, unsurprisingly, those of the pandemic: most of our teachers devoted a lot of time to distance learning. Other researchers, especially PhD students, suffered from the lack of contacts and meetings. Part of the team was involved in supervising a seroprevalence study. Thanks to the quality of the collaboration with hospital doctors in this study, we are now involved in modelling the amount of coronavirus in wastewater in order to predict the number of hospital admissions.
7 New software and platforms
7.1 New software
7.1.1 AngioAnalytics
 Keywords: Health, Cancer, Biomedical imaging
 Scientific Description: This tool allows the pharmacodynamic characterization of anti-vascular effects in anti-cancer treatments. It uses time series of in vivo images provided by intravital microscopy. Such in vivo images are obtained using skinfold chambers placed on mice skin. The automated analysis is split into two steps that were previously performed separately and entirely by hand. The first step corresponds to image processing to identify characteristics of the vascular network. The second step is the system identification of the pharmacodynamic response and the statistical analysis of the model parameters.
 Functional Description: AngioAnalytics allows the pharmacodynamic characterization of antivascular effects in anticancer treatments.
 Contact: Thierry Bastogne
 Participant: Thierry Bastogne
7.1.2 ARMADA
 Name: A Statistical Methodology to Select Covariates in High-Dimensional Data under Dependence
 Keywords: Biostatistics, Aggregated methods, High Dimensional Data, Personalized medicine, Variable selection
 Functional Description: Two-step variable selection procedure in a context of high-dimensional dependent data with few observations. The first step is dedicated to eliminating dependence between variables (clustering of variables, followed by factor analysis inside each cluster). The second step is a variable selection by aggregation of adapted methods. <https://hal.archives-ouvertes.fr/hal-02173568>
 News of the Year: This package is a new one.
 URL: https://cran.r-project.org/web/packages/armada/
 Publication: hal-02363338
 Contacts: Aurélie Muller, Anne Gégout-Petit
 Participants: Aurélie Muller, Anne Gégout-Petit
7.1.3 kosel
 Name: Variable Selection by Revisited Knockoffs Procedures
 Keywords: Variable selection, Regression
 Functional Description: Performs variable selection for many types of L1-regularised regressions using the revisited knockoffs procedure. This procedure uses a matrix of knockoffs of the covariates independent from the response variable Y. The idea is to determine if a covariate belongs to the model depending on whether it enters the model before or after its knockoff. The procedure is suitable for a wide range of regressions with various types of response variables. Regression models available are exported from the R packages 'glmnet' and 'ordinalNet'. Based on the paper linked to via the URL below: Gegout A., Gueudin A., Karmann C. (2019) <arXiv:1907.03153>
 News of the Year: This package is a new one.
 URL: https://cran.r-project.org/web/packages/kosel/kosel.pdf
 Publication: hal-01799914
 Contacts: Clémence Karmann, Aurélie Muller
 Participants: Clémence Karmann, Aurélie Muller, Anne Gégout-Petit
7.1.4 SesIndexCreatoR
 Functional Description: This package allows computing and visualizing socioeconomic indices and categories distributions from datasets of socioeconomic variables (these tools were developed as part of the EquitArea Project, a public health program).
 URL: http://www.equitarea.org/documents/packages_1.00/
 Contact: Benoît Lalloué
 Participants: Benoît Lalloué, Jean-Marie Monnez, Nolwenn Le Meur, Severine Deguen
7.1.5 In silico
 Name: In silico design of nanoparticles for the treatment of cancers by enhanced radiotherapy
 Keywords: Bioinformatics, Cancer, Drug development
 Functional Description: To speed up the preclinical development of medical engineered nanomaterials, we have designed an integrated computing platform dedicated to the virtual screening of nanostructured materials activated by X-ray, making it possible to select nano-objects presenting interesting medical properties faster. The main advantage of this in silico design approach is to virtually screen a lot of possible formulations and to rapidly select the most promising ones. The platform can currently handle the accelerated design of radiation therapy enhancing nanoparticles and medical imaging nano-sized contrast agents as well as the comparison between nano-objects and the optimization of existing materials.
 Contact: Thierry Bastogne
 Participant: Thierry Bastogne
7.1.6 HSPOR
 Name: Hidden Smooth Polynomial Regression for Rupture Detection
 Keywords: Polynomial regression, Rupture detection
 Functional Description: Several functions that infer, by different methods, a piecewise polynomial regression model under regularity constraints, namely continuity or differentiability of the link function. The implemented functions are either specific to data with two regimes, or generic for any number of regimes, which can be given by the user or learned by the algorithm.
 News of the Year: This package is a new one.
 URL: https://cran.r-project.org/web/packages/HSPOR/
 Contact: Florine Greciet
 Participants: Florine Greciet, Romain Azais, Anne Gégout-Petit
7.1.7 cvmgof
 Keywords: Regression, Test, Estimators
 Scientific Description: Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are "directional" in that they detect departures from a given assumption of the model. Other tests are "global" (or "omnibus") in that they assess whether a model fits a dataset on all its assumptions. cvmgof focuses on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. It implements 2 nonparametric "directional" tests and one nonparametric "global" test, all based on generalizations of the Cramér–von Mises statistic.
 Functional Description: cvmgof is an R library devoted to Cramér–von Mises goodness-of-fit tests. It implements three nonparametric statistical methods based on Cramér–von Mises statistics to estimate and test a regression model.
 News of the Year: New version available on the CRAN website since Jan 11 2021. Preprint available on HAL since Jan 7 2021.
 URL: https://cran.r-project.org/web/packages/cvmgof/index.html
 Publication: hal-03101612v1
 Contacts: Sandie Ferrigno, Romain Azais
 Participants: Sandie Ferrigno, Marie-José Martinez, Romain Azais
7.1.8 starm R
 Name: Spatio-Temporal Autologistic Regression Model, package R
 Keywords: Spatio-temporal, Autologistic model
 Functional Description: Estimation and model selection of the two-time centered autologistic regression model based on Gegout-Petit A., Guerin-Dubrana L., Li S. "A new centered spatio-temporal autologistic regression model. Application to local spread of plant diseases." 2019 <arXiv:1811.06782>. Application for the spatio-temporal modelling of the spread of a disease on a grid over time.
 Contact: Anne Gégout-Petit
8 New results
8.1 Stochastic modelling
Participants: Anne Gégout-Petit, Ulysse Herbach, Sophie Wantz-Mézières, Pierre Vallois.
8.1.1 Modelling of diffuse low-grade gliomas growth
We are continuing our research on the modelling of the growth of diffuse low-grade gliomas. We propose an original MRI-based method to quantify glioma brain infiltration that is easy for neuro-oncologists to implement and interpret. The aim is to guide the treatment strategy by giving functional information using only anatomical knowledge and conventional MRI sequences. This work has been the subject of a conference paper [15].
A retrospective survival study with 35 years of follow-up has also been carried out [9].
8.1.2 Reconstruction of epigenetic landscapes from single-cell data
The aim is to better understand how living cells make decisions (e.g., differentiation of a stem cell into a particular specialized type), seeing decision-making as an emergent property of an underlying complex molecular network. Indeed, it is now established that cells react probabilistically to their environment: cell types do not correspond to fixed states, but rather to “potential wells” of a certain energy landscape (representing the energy of the possible states of the cell) that we are trying to reconstruct. A first paper proposing a reconstruction method has been submitted [24] in the framework of an international collaboration (USA, Switzerland, France). Another paper is about to be submitted, dealing more specifically with the inference of the underlying networks.
Joint work with Nan Papili Gao (ETH Zurich), Olivier Gandrillon (ENS Lyon), András Páldi (EPHE, Paris), and Rudiyanto Gunawan (University at Buffalo, New York).
8.2 Optimal control of Markov processes
Participants: Bruno Scherrer, Nino Vieillard.
In [13], we adapt the concept of momentum from optimization to reinforcement learning. Seeing the state-action value functions as an analog to the gradients in optimization, we interpret momentum as an average of consecutive q-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI, and evaluate it on Atari games. This work has been published in the AISTATS conference.
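The idea of acting greedily with respect to an average of consecutive q-functions can be illustrated on a tiny tabular problem. The following is only a minimal sketch under stated assumptions: the function name, the toy MDP, the data layout and the uniform averaging weights are all hypothetical choices made for illustration, and the exact scheme analysed in the paper may differ.

```python
def momentum_value_iteration(P, R, gamma, n_iter):
    """Tabular sketch of value iteration with averaged q-functions.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward (hypothetical toy representation, not from the paper).
    """
    n_s, n_a = len(R), len(R[0])
    q = [[0.0] * n_a for _ in range(n_s)]   # current q-function q_k
    h = [[0.0] * n_a for _ in range(n_s)]   # running average of q_0, ..., q_k

    def greedy(values):
        # Index of the best action in each state (ties go to the first action).
        return [max(range(n_a), key=lambda a: values[s][a]) for s in range(n_s)]

    for k in range(n_iter):
        pi = greedy(h)                       # act greedily w.r.t. the average
        # One Bellman backup of q under the policy pi.
        q = [[R[s][a] + gamma * sum(p * q[s2][pi[s2]] for s2, p in P[s][a])
              for a in range(n_a)] for s in range(n_s)]
        # Uniform running average: h_{k+1} = ((k+1) h_k + q_{k+1}) / (k+2).
        h = [[((k + 1) * h[s][a] + q[s][a]) / (k + 2) for a in range(n_a)]
             for s in range(n_s)]
    return greedy(h), h


# Toy MDP (assumption): staying in state 1 pays 1, everything else pays 0.
# Action 0 stays in the current state, action 1 moves to the other state.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, h = momentum_value_iteration(P, R, gamma=0.9, n_iter=50)
```

On this toy problem the returned policy moves from state 0 to state 1 and then stays there. Because the policy is derived from `h` rather than from the latest `q`, an error made in a single backup is diluted by the averaging, which is the intuition described above.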
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, little is understood theoretically so far about why KL regularization helps. In [12], we study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a very strong performance bound, the very first to combine two desirable aspects: a linear dependency on the horizon (instead of quadratic) and an error propagation term involving an averaging effect of the estimation errors (instead of an accumulation effect). We also study the more general case of an additional entropy regularizer. The resulting abstract scheme encompasses many existing RL algorithms. Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study. This work has been accepted to the NeurIPS conference and selected for oral presentation (selection rate: 1.1% of all submissions).
Joint work with Matthieu Geist, Olivier Pietquin, Rémi Munos and Tadashi Kozuno (Google Brain Paris).
8.3 Regression and machine learning
Participants: Thierry Bastogne, Sandie Ferrigno, Anne Gégout-Petit, Clémence Karmann, Benoît Lalloué, Jean-Marie Monnez, Pauline Guyot, Aurélie Gueudin, Sophie Wantz-Mézières.
8.3.1 Cramér–von Mises goodness-of-fit tests in regression models
Many goodness-of-fit tests have been developed to assess the different assumptions of a (possibly heteroscedastic) regression model. Most of them are 'directional' in that they detect departures from a given assumption of the model. Other tests are 'global' (or 'omnibus') in that they assess whether a model fits a dataset on all its assumptions. We focus on the task of choosing the structural part of the regression function because it contains easily interpretable information about the studied relationship. We consider 2 nonparametric 'directional' tests and one nonparametric 'global' test, all based on generalizations of the Cramér–von Mises statistic.
To perform these goodness-of-fit tests, we develop the R package cvmgof (https://
To complete this work, it would be interesting to assess the other assumptions of a regression model, such as the functional form of the variance or the additivity of the random error term. This can already be done using the Ducharme and Ferrigno test implemented in cvmgof, since it is a global test. However, it would be relevant to compare the results obtained from the Ducharme and Ferrigno test with those obtained from other directional tests specifically developed to assess one of these assumptions. The implementation of these directional tests would enrich the cvmgof package and offer a complete, easy-to-use tool for validating regression models. Moreover, the assessment of the overall validity of the model when using several directional tests could be compared with that obtained when using only a global test. In particular, the well-known problem of multiple testing could be discussed by comparing the results obtained from multiple test procedures with those obtained when using a global test strategy. Another perspective of this work would be to develop a similar tool for other statistical models widely used in practice, such as generalized linear models.
Joint work with Romain Azaïs (Inria, ENS Lyon) and Marie-José Martinez (LJK, Université Grenoble Alpes).
8.3.2 The revisited knockoffs method for variable selection in L1-penalized regressions
We consider the problem of variable selection in regression models. In particular, we are interested in selecting explanatory covariates linked with the response variable, and we want to determine which covariates are relevant, that is, which covariates are involved in the model. In this framework, we deal with L1-penalized regression models. To handle the choice of the penalty parameter when performing variable selection, we develop a new method based on the knockoffs idea. This revisited knockoffs method is general and suitable for a wide range of regressions with various types of response variables. Besides, it also works when the number of observations is smaller than the number of covariates, and it gives an order of importance of the covariates. Finally, we provide many experimental results to corroborate our method and compare it with other variable selection methods. This work is published in [5] and is implemented in the package ‘kosel’.
The next subsections are dedicated to online data analysis.
8.3.3 Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods
Accepted in Journal of Multivariate Analysis in October 2020 [8].
We prove the almost sure convergence of processes of Oja type to eigenvectors of the expectation of a random matrix while relaxing the i.i.d. assumptions on the observed random matrices. As an application of this generalization, we can perform the online PCA of a random vector Z when there is a data stream of i.i.d. observations of Z, even when both the metric used M and the expectation of Z are unknown and estimated online. Moreover, in order to update the stochastic approximation process at each step, we are no longer bound to using only a mini-batch of observations of Z, but can use all the previous observations up to the current step without storing them. This is useful not only when dealing with streaming data but also with Big Data, as one can process it sequentially as a data stream. In addition, the general framework of this process, unlike other algorithms in the literature, also covers the case of factorial methods related to PCA.
In collaboration with A. Skiredj.
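To make the idea of an Oja-type process concrete, here is a minimal sketch of a classical normalized Oja update with online centering. It is an illustration under stated assumptions and not the process studied in the article: the metric is taken to be the identity, only the first eigenvector is estimated, and the function name, the step size 1/n and the simulated stream are hypothetical choices.

```python
import math
import random


def oja_streaming_pca(stream, dim, step=lambda n: 1.0 / n):
    """Estimate the leading principal direction of a data stream.

    Each observation is used once and then discarded; only the running
    mean and the current unit vector w are kept in memory.
    """
    w = [1.0 / math.sqrt(dim)] * dim          # arbitrary unit starting vector
    mean = [0.0] * dim                        # expectation of Z, estimated online
    for n, z in enumerate(stream, start=1):
        mean = [m + (zi - m) / n for m, zi in zip(mean, z)]
        zc = [zi - m for zi, m in zip(z, mean)]        # centered observation
        proj = sum(a * b for a, b in zip(zc, w))       # inner product <zc, w>
        # Oja-type step: move w towards (zc zc^T) w, then renormalize.
        w = [wi + step(n) * proj * zi for wi, zi in zip(w, zc)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


# Simulated stream (assumption): independent coordinates with standard
# deviations 3 and 1, so the leading eigenvector is the first axis.
rng = random.Random(0)
stream = ((5.0 + rng.gauss(0.0, 3.0), -2.0 + rng.gauss(0.0, 1.0))
          for _ in range(5000))
w_hat = oja_streaming_pca(stream, dim=2)
```

The estimate `w_hat` is defined up to its sign, so it should be compared with the true direction through the absolute value of its first coordinate.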
8.3.4 Streaming constrained binary logistic regression with online standardized data
Accepted in "Journal of Applied Statistics" in December 2020 [7].
Online learning is a method for analyzing very large datasets ("big data") as well as data streams. In this article, we consider the case of constrained binary logistic regression and show the interest of using processes with an online standardization of the data, in particular to avoid numerical explosions or to allow the use of shrinkage methods. We prove the almost sure convergence of such a process and propose using a piecewise constant step-size such that the latter does not decrease too quickly and does not reduce the speed of convergence. We compare twenty-four stochastic approximation processes with raw or online standardized data on five real or simulated datasets. Results show that, unlike processes with raw data, processes with online standardized data can prevent numerical explosions and yield the best results.
In collaboration with E. Albuisson.
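The combination of online standardization and a piecewise constant step-size can be sketched as follows. This is a simplified illustration and not one of the twenty-four processes compared in the article: the constraints are omitted, there is a single covariate, and the function names, the step-size schedule (constant `c`, block length `block`, exponent `alpha`) and the simulated data are assumptions made for the example.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def online_logistic(stream, block=100, c=0.5, alpha=2.0 / 3.0):
    """Stochastic gradient process on online standardized data."""
    mean, m2, sd = 0.0, 0.0, 1.0   # running mean, sum of squares, std deviation
    theta = [0.0, 0.0]             # [intercept, slope] on the standardized scale
    for n, (x, y) in enumerate(stream, start=1):
        # Welford update of the mean and variance of the covariate.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        sd = math.sqrt(m2 / n) if m2 > 0 else 1.0
        xs = (x - mean) / sd       # standardize with the current estimates
        # Step-size that is constant within each block of observations.
        step = c / (1 + n // block) ** alpha
        err = y - sigmoid(theta[0] + theta[1] * xs)
        theta[0] += step * err
        theta[1] += step * err * xs
    return theta, mean, sd


def make_stream(n, rng):
    # Covariate on a large raw scale (mean 1000, standard deviation 200).
    for _ in range(n):
        x = rng.gauss(1000.0, 200.0)
        p = sigmoid(2.0 * (x - 1000.0) / 200.0)
        yield x, (1 if rng.random() < p else 0)


theta, mean, sd = online_logistic(make_stream(5000, random.Random(1)))
```

With raw values around 1000, the linear predictor would saturate the logistic function after a few large steps. Standardizing each observation with the running mean and standard deviation keeps the updates on a unit scale, and the slope estimate in `theta[1]` should settle near the value 2 used in the simulation.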
8.3.5 Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression
Submitted in Februray 2021 25, 20.
The present aim is to update, upon arrival of new learning data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression in an online setting, without the need to store all of the previously obtained data. Poisson bootstrap and stochastic approximation processes were used with online standardized data to avoid numerical explosions, the convergence of which has been established theoretically. This empirical convergence of online ensemble scores to a reference "batch" score was studied on five different datasets from which data streams were simulated, comparing six different processes to construct the online scores. For each score, 50 replications using a total of $10N$ observations ($N$ being the size of the dataset) were performed to assess the convergence and the stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using $100N$ observations was also performed. The best processes were averaged processes using online standardized data and a piecewise constant stepsize.
8.3.6 Change-point detection thresholds in the sequential context
Our work on change-point thresholds for the score-based CUSUM statistic in a sequential context has been published 11. In this paper, we consider the score-based cumulative sum statistic and evaluate the detection performance of several thresholds on simulated data. Three thresholds come from the literature: the Wald constant, the empirical constant, and the conditional empirical instantaneous threshold. Two new thresholds are built by a simulation-based procedure: the first one is instantaneous, the second is a dynamical version of the first. The thresholds' performance, measured by estimates of the mean time between false alarms (MTBFA) and the average detection delay (ADD), is evaluated on independent and autocorrelated data for several scenarios, according to the detection objective and the real change in the data. The simulations allow us to compare the thresholds' results and show that their performance is robust when a parameter of the pre-change regime is poorly estimated or when the data independence assumption is violated. We also found that the conditional empirical threshold is the best at minimizing the detection delay while maintaining the given false alarm rate. However, on real data, we suggest using the dynamic instantaneous threshold because it is the easiest to build for practical implementation.
Our collaboration with APHP could not proceed because of long delays in data collection. To apply our algorithms to real data, we turned to EMG signal data provided by INRS. The study concerns the development of trapezius muscle myalgia in the workplace. We apply change-point detection to characterise the different computer activities carried out during an experimental day.
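To make the notions of threshold and detection delay concrete, here is a minimal Python sketch of a sequential CUSUM detector. It is an assumption-laden simplification: it uses the classical log-likelihood-ratio CUSUM for a Gaussian mean shift with a constant threshold, not the score-based statistic or the instantaneous and dynamic thresholds studied in the paper, and the function names and numerical values are hypothetical.

```python
def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the first index at which the CUSUM statistic exceeds `threshold`.

    The statistic accumulates the log-likelihood ratio of N(mu1, sigma^2)
    against N(mu0, sigma^2) and is reset to zero whenever it becomes negative.
    Returns None if no alarm is raised.
    """
    stat = 0.0
    for t, x in enumerate(xs):
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


def detection_delay(alarm_time, change_time):
    """Delay of a true detection; None for a missed or false alarm."""
    if alarm_time is None or alarm_time < change_time:
        return None
    return alarm_time - change_time


# Noise-free toy signal: the mean jumps from 0 to 1 at index 50.
signal = [0.0] * 50 + [1.0] * 50
alarm = cusum_alarm(signal, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0)
delay = detection_delay(alarm, change_time=50)
```

Raising the threshold lengthens the time between false alarms but also the detection delay, which is the trade-off that the MTBFA and ADD criteria quantify.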
8.4 Statistical learning and application in health
Participants: Ulysse Herbach, Sandie Ferrigno, Anne Gégout-Petit, Aurélie Gueudin, Pierre Vallois, Benoît Lalloué, Jean-Marie Monnez, Nicolas Thorr.
8.4.1 Estimation of reference curves for fetal weight
In epidemiology, we are working with INSERM to study fetal development in the last two trimesters of pregnancy. Reference or standard curves are required in this kind of biomedical problem. Values that lie outside the limits of these reference curves may indicate the presence of a disorder. Data are from the French EDEN mother-child cohort (INSERM), a study investigating the prenatal and early postnatal determinants of child health and development. A total of 2002 pregnant women were recruited before 24 weeks of amenorrhoea in two maternity clinics from middle-sized French cities (Nancy and Poitiers). From May 2003 to September 2006, 1899 newborns were then included. The main outcomes of interest are fetal (via ultrasound) and postnatal growth, adiposity development, respiratory health, atopy, behaviour and bone, cognitive and motor development. We are studying fetal weight as a function of gestational age in the second and third trimesters of pregnancy. Some classical empirical and parametric methods, such as polynomial regression, are first used to construct these curves. Polynomial regression is one of the most common parametric approaches for modelling growth data, especially during the prenatal period. However, some of these methods require strong assumptions. We therefore propose to work with the semi-parametric LMS method, which modifies the response variable (fetal weight) with a Box-Cox transformation. A first article detailing these methodologies applied to the data is being written.
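In the usual formulation of the LMS method, the Box-Cox power L, the median M and the coefficient of variation S vary smoothly with age, and a measurement is converted to a z-score through the Box-Cox transformation. The Python sketch below shows this conversion and its inverse at a single gestational age. It is an illustration only: the function names and the numerical values of L, M and S are hypothetical, not estimates from the EDEN data.

```python
import math


def lms_zscore(y, L, M, S):
    """z-score of a measurement y > 0 given LMS parameters at a fixed age."""
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:  # limiting case L -> 0 (log transformation)
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Measurement corresponding to z-score z (inverse of lms_zscore).

    Requires 1 + L * S * z > 0 when L is not zero.
    """
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical parameters at one gestational age (weights in grams).
L, M, S = 0.5, 1500.0, 0.12
upper_limit = lms_centile(1.645, L, M, S)  # 95th centile under normality of z
```

Evaluating `lms_centile` over a grid of ages, with age-dependent L, M and S, traces out one reference curve per chosen centile.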
Alternative nonparametric methods such as Nadaraya-Watson kernel estimation, local polynomial estimation, B-splines or cubic splines are also developed in this context to construct these curves. The practical implementation of these methods required working on the smoothing parameters or the choice of knots for the different types of nonparametric estimation. In particular, an optimal choice of these parameters has been proposed. A first version of an R package has then been developed to provide a tool for constructing nonparametric reference curves. It should be submitted to CRAN very soon. In addition, a graphical user interface (GUI) intended for practitioners has been developed to allow intuitive visualization of the results given by the package.
Joint work with Myriam Maumy-Bertrand (IRMA, Université de Strasbourg) and INSERM.
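As an illustration of the simplest of the nonparametric estimators listed above, the following Python sketch implements the Nadaraya-Watson estimator with a Gaussian kernel. It is not the R package mentioned in the text; the function name, the kernel and the fixed bandwidth are assumptions made for the example, and no bandwidth selection is performed.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around x0 (Gaussian kernel)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation close enough to x0 for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

A small bandwidth follows the data closely and gives a rough curve, while a large one gives a smoother but more biased curve, which is why the choice of smoothing parameter matters in practice.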
8.4.2 Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial
Submitted in December 2020 27.
Heart failure (HF) is a major worldwide cause of mortality and morbidity for which many predictive scores have been defined. Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits his or her needs. The methods proposed herein can be reproduced on any set of variables as long as the training dataset comprises a sufficient number of cases. Three methods were compared in an application to construct parsimonious short-term scores in chronic HF patients. The working sample consisted of 11,411 patient-visit dyads from the GISSI-HF database, with 5,595 events and 5,816 non-events. Sixty-two candidate explanatory variables were studied. Focusing on the fastest method, four scores were constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables). These results are slightly better than those obtained by other scores reported in the literature using a similar number of variables.
In collaboration with E. Albuisson and D. Lucci.
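The AUC used above to compare scores can be read as the proportion of (event, non-event) pairs that the score orders correctly. The short Python sketch below computes it directly from this definition. It only illustrates the criterion: the out-of-bag aspect (scoring each observation with the bootstrap classifiers that did not see it) is not reproduced, and the function name and toy values are hypothetical.

```python
def auc(event_scores, nonevent_scores):
    """Proportion of (event, non-event) pairs ranked correctly; ties count one half."""
    if not event_scores or not nonevent_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in event_scores:
        for r in nonevent_scores:
            if s > r:
                wins += 1.0
            elif s == r:
                wins += 0.5
    return wins / (len(event_scores) * len(nonevent_scores))
```

The double loop is quadratic in the sample size; for a sample of the size quoted above, a rank-based computation would be preferable, but the definition is the same.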
8.4.3 Modeling and estimation of circulating tumor DNA (ctDNA) dynamics for detecting resistance to targeted therapies
This work continues the ITMO Cancer project, supervised by Nicolas Champagnat, on the modeling of circulating tumor DNA (ctDNA) to detect the appearance of resistance to targeted therapies (personalized medicine). After a phase of investigation of possible scenarios in collaboration with Alexandre Harlé of the Institute of Cancerology of Lorraine (ICL), a final model was selected. Based on a mathematical analysis, the members of the project then designed a statistical inference algorithm (learning the parameters of the model, including the genealogical tree of mutations for each patient), which is intended to be validated on real data currently being acquired at the Nancy CHRU. The general idea is to exploit a “variational principle” that makes it possible to explore the very large discrete space of genealogical trees through a “pivot” space of continuous parameters that are easy to optimize (and reasonably few in number). An article detailing the model and its inference is currently being written.
In collaboration with N. Champagnat and C. Fritsch.
8.4.4 A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology
We propose a new methodology for selecting and ranking covariates associated with a variable of interest in a context of high-dimensional data under dependence but with few observations. The methodology successively intertwines the clustering of covariates, decorrelation of covariates using Factor Latent Analysis, selection using aggregation of adapted methods, and finally ranking. A simulation study shows the benefit of the decorrelation inside the different clusters of covariates. We first apply our method to transcriptomic data of 37 patients with advanced non-small-cell lung cancer who have received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Secondly, we apply our method to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and associated gene network in order to personalize the treatments. This work is published in 2 and is implemented in the R package ‘ARMADA’.
In collaboration with T. Boukhobza and H. Dumond from CRAN and B. Bastien from the biopharmaceutical company Transgene.
8.4.5 Project linked with the COVID-19 pandemic
Pierre Vallois is the scientific coordinator of the seroprevalence study COVAL Nancy held in Nancy in July 2020 in collaboration with CHRU de Nancy (CIC épidémiologie clinique and Laboratoire de Virologie).
Background. The World Health Organisation recommends monitoring the circulation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We aimed to estimate anti-SARS-CoV-2 total immunoglobulin (IgT) antibody seroprevalence and describe symptom profiles and in vitro seroneutralization in Nancy, France, in spring 2020.
Methods. Individuals were randomly sampled from electoral lists and invited with household members over 5 years old to be tested for anti-SARS-CoV-2 (IgT, i.e. IgA/IgG/IgM) antibodies by ELISA (Bio-Rad). Serum samples were classified according to seroneutralization activity 50 % (NT50) on Vero CCL-81 cells. Age- and sex-adjusted seroprevalence was estimated. Subgroups were compared by chi-square or Fisher exact test and logistic regression.
Results. Among 2006 individuals, 43 were SARS-CoV-2-positive; the raw seroprevalence was 2.1 % (95 % confidence interval 1.5 to 2.9), with adjusted metropolitan and national standardized seroprevalence 2.5 % (1.8 to 3.3) and 2.3 % (1.7 to 3.1). Seroprevalence was highest for 20- to 34-year-old participants (4.7 % [2.3 to 8.4]), higher within than outside socially deprived areas (2.5 % vs 1 %, P=0.02) and higher with than without intra-family infection ($p<10^{-6}$). Moreover, 25 % (23 to 27) of participants presented at least one COVID-19 symptom associated with SARS-CoV-2 positivity ($p<10^{-13}$), with anosmia or ageusia highly discriminant (odds ratio 27.8 [13.9 to 54.5]), associated with dyspnea and fever. Among the SARS-CoV-2-positives, 16.3 % (6.8 to 30.7) were asymptomatic. For 31 of the 43 positive individuals, positive seroneutralization was demonstrated in vitro.
Conclusions. In this population of very low anti-SARS-CoV-2 antibody seroprevalence, a beneficial effect of the lockdown can be assumed, with frequent SARS-CoV-2 seroneutralization among IgT-positive patients.
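The raw seroprevalence and its confidence interval reported above (43 positives among 2006 individuals) can be approximately reproduced with a few lines of Python. The sketch below uses the Wilson score interval, which is an assumption: the study does not state here which interval method was used, so the bounds need not match the reported ones exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95 % level
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


raw = 43 / 2006                         # raw seroprevalence, about 2.1 %
lower, upper = wilson_interval(43, 2006)
```

The resulting interval is roughly 1.6 % to 2.9 %, close to the reported 1.5 % to 2.9 %. The age- and sex-adjusted estimates additionally require the reference population structure, which is not reproduced here.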
9 Bilateral contracts and grants with industry
9.1 Bilateral contracts with industry
 R. Azaïs, A. Gégout-Petit and F. Greciet collaborated with SAFRAN Aircraft Engines (through a 2016-2019 contract). SAFRAN Aircraft Engines designs and produces aircraft engines. For the design of parts, they have to understand the mechanism of crack propagation under different conditions. BIGS models crack propagation with Piecewise Deterministic Markov Processes (PDMP).
 B. Scherrer collaborates with Google Brain on reinforcement learning in the framework of the PhD thesis of Nino Vieillard.
10 Partnerships and cooperations
10.1 International initiatives
10.1.1 Participation in other international programs
In Fall 2020, Bruno Scherrer was invited for 4 months to Berkeley to participate in the Simons Institute programme on the Theory of Reinforcement Learning. Due to the Covid constraints, the semester was eventually held online.
10.2 International research visitors
10.2.1 Visits of international scientists
Juhyun Park (Lancaster University) visited Nancy for one week in the framework of her collaboration with A. Gégout-Petit on statistical tests for paired distributions.
10.3 National initiatives
 FHU CARTAGE (Fédération Hospitalo-Universitaire Cardiac and ARTerial AGEing; leader: Pr Athanase Benetos), Jean-Marie Monnez, Benoît Lalloué, Anne Gégout-Petit.
 RHU Fight HF (Fighting Heart Failure; leader: Pr Patrick Rossignol), located at the University Hospital of Nancy, Jean-Marie Monnez, Benoît Lalloué.
 Project "Handle your heart", team responsible for the creation of drug prescription support software for the treatment of heart failure, head: Jean-Marie Monnez.
 A. Gégout-Petit, N. Sahki, S. Mézières are involved in the learning aspect of the clinical protocol "EOLEVAL" with Assistance Publique des Hôpitaux de Paris (APHP).
 "ITMO Physics, mathematics applied to Cancer" (2017-2019): "Modeling ctDNA dynamics for detecting targeted therapy", Funding organisms: ITMO Cancer, ITMO Technologies pour la santé de l’alliance nationale pour les sciences de la vie et de la santé (AVIESAN), INCa, Leader: N. Champagnat (Inria TOSCA), Participants: A. Gégout-Petit, A. Muller-Gueudin, P. Vallois, U. Herbach.
 PEPS AMIES (2019-2020), Etude Biométrique en foetopathologie et développement de l'enfant, Collaboration between Institut Elie Cartan and the CRESS INSERM, S. Ferrigno.
 Modular, multivalent and multiplexed tools for dual molecular imaging (2017-2020), Funding organism: ANR, Leader: B. Kuhnast (CEA). Participant: T. Bastogne.
 Sophie Mézières belongs to GDR 720 ISIS, Funding organism: CNRS, leader: Laure Blanc-Féraud.
10.4 Regional initiatives
 CHRU de Nancy. We have good collaborations with several researchers from CHRU de Nancy. We are involved in the telomere research axis of LUE Impact Geenage.
 CHRU de Nancy. Joint initiative of the SARS-CoV-2 seroprevalence study COVAL Nancy with CIC épidémiologie. https://clinicaltrials.gov
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Journal
 Ulysse Herbach was a guest editor for the journal “Mathematical Biosciences and Engineering” (special edition “Cells as dynamical systems”).
11.1.2 Invited talks
 Anne Gégout-Petit was invited to give a plenary talk at “Journées de Statistique”, Nice, France.
 Ulysse Herbach was invited to give a plenary talk at the conference “Interplay between Oncology, Mathematics and Numerics”, Paris, France.
11.1.3 Research administration
 Anne Gégout-Petit has been the head of “Institut Élie Cartan de Lorraine” (mathematics laboratory of Université de Lorraine) since September 1st.
11.2 Teaching - Supervision - Juries
11.2.1 Teaching
Except for Bruno Scherrer and Ulysse Herbach, BIGS members have teaching obligations at Université de Lorraine and teach at least 192 hours each year. They teach probability and statistics at different levels (Licence, Master, Engineering school). Many of them have pedagogical responsibilities.
 A. Gégout-Petit: Head of the Master 2 "Ingénierie Mathématique pour la science des données (Mathematical Engineering for data science)", Université de Lorraine
 T. Bastogne is in charge of the research master program "Santé Numérique et Imagerie Médicale" with the Faculty of Medicine, Université de Lorraine, France
 Master: S. Ferrigno, Experimental designs, 4.5h, M1, fourth year of EEIGM, Université de Lorraine, France
 Master: S. Ferrigno, Data analyzing and mining, 63h, M2, third year of Ecole des Mines, Université de Lorraine, France
 Master: S. Ferrigno, Modeling and forecasting, 43h, M1, second year of Ecole des Mines, Université de Lorraine, France
 Master: S. Ferrigno, Training projects, 18h, M1/M2, second and third year of Ecole des Mines, Université de Lorraine, France
 Master: A. Muller-Gueudin, Probability and Statistics, 160h, second year of ENSEM and ENSAIA, Université de Lorraine, France
 Master: A. Muller-Gueudin, Scientific calculation with Matlab, 20h, second year of ENSAIA, Université de Lorraine, France
 Master: A. Gégout-Petit, Statistics, modeling, 15h, future teacher, Université de Lorraine, France
 Master: A. Gégout-Petit, Statistics, modeling, data analysis, 80h, master in applied mathematics, Université de Lorraine, France
 Master: S. Wantz-Mézières, Learning and analysis of medical data, 36h, with J.-M. Moureaux, Master SNIM, Université de Lorraine, France
 Licence: S. Wantz-Mézières, Applied mathematics for management, financial mathematics, Probability and Statistics, 160h, I.U.T. (L1/L2/L3)
 Licence: S. Wantz-Mézières, Probability, 100h, first year in Télécom Nancy engineering school (initial and apprenticeship tracks)
 Licence: A. Muller-Gueudin, Statistics, 60h, first year of ENSAIA, Université de Lorraine, France
 Licence: S. Ferrigno, Descriptive and inferential statistics, 60h, L2, second year of EEIGM, Université de Lorraine, France
 Licence: S. Ferrigno, Statistical modeling, 60h, L2, second year of EEIGM, Université de Lorraine, France
 Licence: S. Ferrigno, Mathematical and computational tools, 20h, L3, third year of EEIGM, Université de Lorraine, France
 Licence: S. Ferrigno, Training projects, 20h, L1/L3, first, second and third year of EEIGM, Université de Lorraine, France
11.2.2 Supervision
Defended PhD thesis
 PhD: Florine Greciet, "Modèles markoviens déterministes par morceaux cachés pour la propagation de fissures", CIFRE grant with SAFRAN AIRCRAFT ENGINES, Advisors: R. Azaïs, A. Gégout-Petit, Université de Lorraine, defended in January 2020.
PhD thesis
 PhD: Pauline Guyot, "Modélisation et Simulation de l’Electrocardiogramme d’un Patient Numérique", Grant: CIFRE-Cybernano. Advisors: T. Bastogne, E. H. Djermoune.
 PhD: Nassim Sahki, "Détection de rupture dans des signaux multivariés pour la prédiction d’événement redouté à partir de paramètres physiologiques recueillis par capteurs connectés après greffe pulmonaire", grant Inria-Cordis. Advisors: A. Gégout-Petit, S. Wantz-Mézières, M. d'Ortho.
 PhD: Nino Vieillard, "Deep Reinforcement Learning", CIFRE grant with Google Brain Paris. Advisors: B. Scherrer, M. Geist.
Postdoctoral positions
 Benoît Lalloué, contract research engineer for two years, RHU Fight HF, supervised by Jean-Marie Monnez.
 Postdoc: Emma Horton, Telomere Modelling, grant LUE GEENAGE. Advisors: A. Gégout-Petit, D. Villemonais. Emma was hired as an Inria researcher (CR) at Bordeaux Sud-Ouest (ASTRAL team).
Other
 Master: all BIGS members regularly supervise projects and internships of master IMOI students.
 Engineering school: all BIGS members regularly supervise projects of “École des Mines”, ENSEM, EEIGM or Télécom Nancy students.
11.2.3 Juries
 Anne Gégout-Petit wrote the report for and sat on the jury of the PhD defense of Titin Agustin NENGSIH, Strasbourg University, March 16th.
 Anne Gégout-Petit wrote the report for and sat on the jury of the HDR defense of Maud Delattre, Paris-Saclay University, November 6th.
 Anne Gégout-Petit is a member of the “Jury du prix de thèse AMIES”.
 Bruno Scherrer sat on the jury of the PhD defense of Matthieu Guillot, G-SCOP lab, Grenoble INP, July 3rd.
 Bruno Scherrer sat on the jury of the PhD defense of Rituraj Kaushik, July 23rd.
11.3 Popularization
11.3.1 Education
 Sandie Ferrigno: Advisor of a group of students (EEIGM), "La main à la Pâte" project, elementary schools, Nancy, January-June 2020.
 Sandie Ferrigno: Advisor of a group of students (EEIGM), "Energies renouvelables", "La main à la Pâte" project, Institut médico-éducatif (IME), Commercy, January 2020.
 Sandie Ferrigno: Advisor of a group of students (EEIGM), "L'Astronomie", Cgénial project, Collège Paul Verlaine, Malzéville, January 2020.
 Sandie Ferrigno: Advisor of a group of students (EEIGM), "Le Chocolat", Cgénial project, Collège de la Craffe, Nancy, January 2020.
11.3.2 Interventions
 Sophie Wantz-Mézières took part in the organization of a thematic, multidisciplinary week “Neurosciences, Neuro-oncologie et Numérique” for students from Télécom Nancy and Faculté de Médecine de Nancy, January 2020.
 Bruno Scherrer carried out detailed simulations of the retirement system reform considered by the Philippe government in France 28.
12 Scientific production
12.1 Publications of the year
International journals
 1 article 'Short-term effects of ocular 2% dorzolamide, 0.5% timolol or 0.005% latanoprost on the anterior segment architecture in healthy cats: a prospective study'. Open Veterinary Journal 2020
 2 article 'A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology'. Journal of Applied Statistics 2021, 23
 3 article 'Accuracy of Several Lung Ultrasound Methods for the Diagnosis of Acute Heart Failure in the ED: A Multicenter Prospective Study'. Chest 157 1 January 2020, 99-110
 4 article 'How to Design a Remote Patient Monitoring System? A French Case Study'. BMC Health Services Research 20 1 December 2020
 5 article 'The revisited knockoffs method for variable selection in L1-penalized regressions'. Communications in Statistics - Simulation and Computation July 2020
 6 article 'A signal demodulation-based method for the early detection of Cheyne-Stokes respiration'. PLoS ONE 15 3 March 2020, e0221191
 7 article 'Streaming constrained binary logistic regression with online standardized data'. Journal of Applied Statistics 2021
 8 article 'Widening the scope of an eigenvector stochastic approximation process and application to streaming PCA and related methods'. Journal of Multivariate Analysis 182 March 2021, 19
 9 article 'Adult diffuse low-grade gliomas: 35-year experience at the Nancy France neuro-oncology unit'. Frontiers in Oncology 10 October 2020, 574679
 10 article 'Cardiovascular risk associated with serum potassium in the context of mineralocorticoid receptor antagonist use in patients with heart failure and left ventricular dysfunction'. European Journal of Heart Failure January 2020
 11 article 'Performance study of change-point detection thresholds for cumulative sum statistic in a sequential context'. Quality and Reliability Engineering International 121 July 2020, 21
International peerreviewed conferences
 12 inproceedings 'Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning'. NeurIPS - 34th Conference on Neural Information Processing Systems Vancouver / Online, Canada December 2020
 13 inproceedings 'Momentum in Reinforcement Learning'. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, PMLR Volume 108, Palermo / Virtual, Italy 2020
Conferences without proceedings
 14 inproceedings 'A data-driven classification solution for the timed-up and go test in risk falling assessment'. EMBC 2020 - 42nd Engineering in Medicine and Biology Conference Montréal, Canada July 2020
 15 inproceedings 'An original MRI-based method to quantify the diffuse low-grade glioma brain infiltration'. 10th International Conference on Image Processing Theory, Tools and Applications, IPTA’20 Paris, France November 2020
 16 inproceedings 'A dendrogram clustering of lipid nanoparticles'. 15th annual event of the ETPN - European Technology Platform on Nanomedicine, ETPN2020 Heraklion, Greece October 2020
 17 inproceedings 'Approche bayésienne du Quality-by-Design appliquée à un bioprocédé d’extraction de principe actif'. 5th Bioproduction Congress Lyon, France September 2020
 18 inproceedings 'Quality-by-design-engineered pBFT consensus configuration for medical device development'. EMBC 2020 - 42nd Engineering in Medicine and Biology Conference Montréal, Canada July 2020
 19 inproceedings 'Quality-by-design development of a patient mobility e-monitoring system'. 2nd EAI International Conference on Wearables in Healthcare, EAI HealthWear 2020 Virtual, France 2020
 20 inproceedings 'Convergence d'un score d'ensemble en ligne : étude empirique'. 52e Journées de Statistique Nice, France https://jds2020.sciencesconf.org/ July 2020
Doctoral dissertations and habilitation theses
 21 thesis 'Piecewise polynomial regression for crack propagation'. Université de Lorraine January 2020
Reports & preprints
 22 misc 'cvmgof: an R package for Cramér-von Mises goodness-of-fit tests in regression models'. January 2021
 23 misc 'Supplementary material iQbD: a TRL-indexed Quality-by-Design Paradigm for Medical Device Development'. September 2020
 24 misc 'Universality of cell differentiation trajectories revealed by a reconstruction of transcriptional uncertainty landscapes from single-cell transcriptomic data'. February 2021
 25 misc 'Construction and update of an online ensemble score involving linear discriminant analysis and logistic regression'. February 2021
 26 misc 'Ensemble methods and online learning for creation and update of prognostic scores in HF patients'. November 2020
 27 misc 'Construction of parsimonious event risk scores by an ensemble method. An illustration for short-term predictions in chronic heart failure patients from the GISSI-HF trial'. December 2020
 28 report 'Simulations de carrières et retraites à points dans 3 cadres macroéconomiques: modèle du gouvernement Philippe (âge-pivot bloqué), modèle du gouvernement Philippe corrigé (âge-pivot glissant), modèle Destinie2 (avec revalorisation de la fonction publique)'. INRIA March 2020