Section: New Results
Systems biology: analysing data and modeling interactions
This axis aims to build dynamical systems that model the interactions involved in biological processes such as metabolism, development and differentiation, and signaling. It addresses both medium-scale modeling with differential equations and large-scale modeling using model reduction techniques and logical constraints.
Algorithms for the analysis of large-scale models
The analysis of static, heterogeneous, large-scale data is a central question of integrative biology. We further developed an automatic reasoning approach that confronts observations with the network topology given by interaction graphs. In collaboration with Potsdam, we use Answer Set Programming (ASP) for this purpose.
Curating a Large-scale Regulatory Network by Evaluating its Consistency with Expression Datasets
Regulatory networks are generally analysed by in silico simulation of the fluctuations of network components under perturbations. Several difficulties must be considered: the incomplete state of the art of regulatory knowledge, the large scale of regulatory models, the heterogeneity of the available data, and the sometimes violated assumption that mRNA expression is correlated with protein activity. We have proposed a method based on a simple consistency rule that allows large-scale network analysis using small but reliable expression datasets.
We have developed BioQuali, a plugin for the Cytoscape environment designed to facilitate automatic reasoning on regulatory networks. The BioQuali plugin provides user-friendly conversion of regulatory networks (including reference databases) into signed directed graphs  . Its basic functions are highlighting inconsistent regions in the network and predicting which products in the network need to be up- or down-regulated (active or inactive) to globally explain experimental data. We tested this approach on the transcriptional network of E. coli (1763 products and 4491 interactions) extracted from the RegulonDB database  . We showed that confronting our predictions with mRNA expression experiments makes it possible to identify missing post-transcriptional interactions in the model. After correcting the model, we computed 502 gene-expression predictions (starting from 0.5% of observations), corresponding to nearly 30% of the network products, that are predicted to change considerably under the analysed condition. These predictions were validated against microarray outputs, with an agreement of 80%. This percentage is comparable to those obtained by other methods working on E. coli data  ,  ,  , and is notable since we used only a transcriptional model, without metabolic regulations.
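To make the consistency rule concrete, the sketch below assumes a simplified version of it: in a signed directed graph, the variation (+1 or -1) of every regulated node must be explained by at least one regulator whose variation, multiplied by the edge sign, has the same sign. Node names and edges are hypothetical, and exhaustive enumeration stands in for the Answer Set Programming approach mentioned above, so it only scales to toy networks.

```python
from itertools import product

# Hypothetical toy signed network: (source, target, sign) with sign in {+1, -1}.
EDGES = [
    ("tf1", "g1", +1),
    ("tf1", "g2", -1),
    ("tf2", "g2", +1),
    ("g1", "g3", +1),
]


def predecessors(edges):
    """Map each target node to its list of (source, sign) regulators."""
    preds = {}
    for src, tgt, sign in edges:
        preds.setdefault(tgt, []).append((src, sign))
    return preds


def inconsistent_nodes(labeling, edges):
    """Regulated nodes whose variation sign is explained by no regulator.

    Simplified rule: a node with at least one regulator must have some
    regulator (src, sign) with labeling[src] * sign == its own label.
    Nodes without regulators are treated as free inputs.
    """
    preds = predecessors(edges)
    return sorted(
        node
        for node, value in labeling.items()
        if node in preds
        and not any(labeling[src] * s == value for src, s in preds[node])
    )


def predict(observed, edges):
    """Signs shared by every consistent completion of the observations.

    Brute force over all {+1, -1} labelings of unobserved nodes; returns None
    when no completion is consistent (observations contradict the network).
    """
    nodes = sorted({n for src, tgt, _ in edges for n in (src, tgt)})
    free = [n for n in nodes if n not in observed]
    consistent = []
    for values in product((+1, -1), repeat=len(free)):
        labeling = {**observed, **dict(zip(free, values))}
        if not inconsistent_nodes(labeling, edges):
            consistent.append(labeling)
    if not consistent:
        return None
    first = consistent[0]
    return {n: first[n] for n in free
            if all(lab[n] == first[n] for lab in consistent)}


print(predict({"g3": +1, "g2": -1}, EDGES))  # tf1 and g1 are forced to +1
print(predict({"g3": +1, "g1": -1}, EDGES))  # contradiction: None
```

A node is predicted only when all consistent completions agree on its sign, which is why tf2 stays unpredicted in the first call.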
Relating inter-patient gene copy number variations with gene expression via gene influence networks
During tumorigenesis, genetic aberrations arise and may deeply affect tumor cell physiology. It has been partially demonstrated that an increase in gene copy number induces higher expression, but this effect is less clear for small genetic modifications. To study it, we used the BioQuali approach to integrate CGH and expression data with an influence graph derived from biological knowledge  . Inter-individual variations in gene copy number and expression make it possible to address tumor variability and, ultimately, the problem of individual-centered therapeutics. We tested this approach on Ewing tumor data. It led to new biological hypotheses, which were validated by comparison with random permutations of the initial data sets.
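The validation by random permutation can be illustrated with a generic permutation test. In this sketch, the statistic (the fraction of genes whose copy-number change and expression change share a sign) and the input vectors are hypothetical stand-ins; the actual analysis integrates CGH and expression data through the influence graph.

```python
import random


def agreement(cnv_signs, expr_signs):
    """Fraction of genes whose copy-number and expression changes share a sign."""
    return sum(c == e for c, e in zip(cnv_signs, expr_signs)) / len(cnv_signs)


def permutation_pvalue(cnv_signs, expr_signs, n_perm=2000, seed=0):
    """Compare the observed statistic with its distribution under shuffling."""
    rng = random.Random(seed)
    observed = agreement(cnv_signs, expr_signs)
    shuffled = list(expr_signs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the gene-wise pairing of the two data sets
        if agreement(cnv_signs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


cnv = [1] * 10 + [-1] * 10          # hypothetical gains then losses
print(permutation_pvalue(cnv, cnv))  # perfect agreement, small p-value
```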
Knowledge based identification of essential signaling from genome-scale siRNA experiments
A systems biology interpretation of genome-scale RNA interference (RNAi) experiments is complicated by experimental variability and the robustness of network signaling. Over-representation approaches (ORA), such as the hypergeometric test or the z-score, are an established statistical framework for associating RNAi effectors with biologically annotated gene sets or pathways. These methods, however, have known limitations: they can miss partial pathway activation, and they cannot take advantage of interactome knowledge. In  we present a novel ORA, protein interaction permutation analysis (PIPA), that uses canonical pathways and established protein interactions to identify pathways enriched for protein interactions connecting RNAi hits. As a result, we identify pathways and signaling hypotheses that are statistically enriched for effects on cell growth in human cell lines. We used PIPA to analyze genome-scale siRNA cell growth screens performed in HeLa and TOV cell lines, showing that siRNA hits on interacting gene pairs are more reproducible than single-gene hits. Using protein interactions, PIPA identifies enriched pathways not found by the standard hypergeometric analysis, including the FAK cytoskeletal remodeling pathway.
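The contrast between a classical ORA and an interaction-aware permutation test can be sketched as follows. `hypergeom_sf` computes the standard hypergeometric upper tail; `interaction_permutation_pvalue` counts protein interactions inside a pathway that connect two hits and compares that count with random hit sets of the same size. This is written in the spirit of PIPA under our own simplifying assumptions and is not its published implementation; gene names and interactions are hypothetical.

```python
import random
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def interacting_hit_pairs(hits, pathway, interactions):
    """Number of interactions inside the pathway that link two hits."""
    return sum(1 for a, b in interactions
               if a in pathway and b in pathway and a in hits and b in hits)


def interaction_permutation_pvalue(genes, hits, pathway, interactions,
                                   n_perm=1000, seed=0):
    """Empirical p-value of the interaction count against random hit sets."""
    rng = random.Random(seed)
    observed = interacting_hit_pairs(hits, pathway, interactions)
    genes = list(genes)
    count = 0
    for _ in range(n_perm):
        fake_hits = set(rng.sample(genes, len(hits)))
        if interacting_hit_pairs(fake_hits, pathway, interactions) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical screen: 20 genes, a 5-gene pathway, 4 hits along a chain.
genes = [f"g{i}" for i in range(20)]
pathway = {"g0", "g1", "g2", "g3", "g4"}
interactions = [("g0", "g1"), ("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g5", "g6")]
hits = {"g0", "g1", "g2", "g3"}
print(hypergeom_sf(4, 20, 5, 4))  # classical ORA on the overlap only
print(interaction_permutation_pvalue(genes, hits, pathway, interactions))
```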
Construction and analysis of signalling and metabolic pathways
The previous section tackles the analysis of large-scale, high-throughput static data. For the analysis of time-series data, we refined our automatic reasoning strategy by developing abstract differential models. Our main goal is not to build fully parametrized differential models (which would require a much larger amount of data), but to reason over classes of models in order to understand which conclusions about active pathways can be deduced from the available data. We assume that the dynamics of biological networks are hierarchical, involving many separated time scales, and we have developed a dedicated mathematical methodology that relies on model reduction and comparison techniques, within and between various levels of description of biological networks  . We coupled this hierarchical approach with sensitivity analysis and constrained fitting to draw our conclusions.
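Model reduction by time-scale separation can be illustrated on a hypothetical two-variable linear system, a toy case rather than the dedicated methodology described above: a slow variable x is consumed by a fast variable y that relaxes towards x on a time scale eps. Replacing y by its quasi-steady-state value (y ≈ x) gives the reduced model dx/dt = k - x, whose solution should stay within O(eps) of the full one.

```python
import math


def simulate_full(k=1.0, eps=0.01, t_end=5.0, dt=1e-4):
    """Forward-Euler integration of the hypothetical two-time-scale model."""
    x = y = 0.0
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        dx = k - y          # slow variable, consumed by y
        dy = (x - y) / eps  # fast variable, relaxes towards x
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
    return xs


def reduced(t, k=1.0, x0=0.0):
    """Closed-form solution of the reduced model dx/dt = k - x (y replaced by x)."""
    return k + (x0 - k) * math.exp(-t)


dt = 1e-4
full = simulate_full(dt=dt)
max_gap = max(abs(x - reduced(i * dt)) for i, x in enumerate(full))
print(f"max |x_full - x_reduced| = {max_gap:.4f}")
```

Shrinking eps makes the gap shrink proportionally, which is what justifies reasoning on the reduced model when the time scales are well separated.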
Canalization of gene expression in the Drosophila blastoderm by gap gene cross regulation
In recent years, quantitative gene expression data have become available for the segment determination process in the Drosophila blastoderm, revealing a specific instance of canalization. We used a predictive dynamical model of gene regulation to study the effect of Bicoid variation on the downstream gap genes. The model correctly predicts the reduced variation of the gap gene expression patterns and allows the characterization of the canalizing mechanism. We show that the canalization is the result of specific regulatory interactions among the zygotic gap genes. We demonstrate the validity of this explanation by showing that variation is increased in embryos mutant for two gap genes, Krüppel and knirps, disproving competing proposals that canalization is due to an undiscovered morphogen, or that it does not take place at all  . In an accompanying article in PLoS Computational Biology  ,  , we show that cross regulation between the gap genes causes their expression to approach dynamical attractors, reducing initial variation and providing a robust output. These results demonstrate that the Bicoid gradient is not sufficient to produce gap gene borders having the low variance observed, and instead this low variance is generated by gap gene cross regulation.
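The attractor argument can be illustrated with a generic two-gene mutual-repression circuit, which is a hypothetical toy and not the gap gene circuit model: initial states scattered within one basin of attraction all converge to the same stable steady state, so cross regulation shrinks the initial variation.

```python
import random
import statistics


def toggle_step(x, y, a=4.0, n=2, dt=0.01):
    """One Euler step of a mutual-repression toggle (hypothetical parameters)."""
    dx = a / (1 + y ** n) - x  # x is produced unless repressed by y
    dy = a / (1 + x ** n) - y  # y is produced unless repressed by x
    return x + dt * dx, y + dt * dy


def final_levels(initials, t_end=20.0, dt=0.01):
    """Level of gene x after integrating each initial state up to t_end."""
    out = []
    for x, y in initials:
        for _ in range(int(round(t_end / dt))):
            x, y = toggle_step(x, y, dt=dt)
        out.append(x)
    return out


rng = random.Random(1)
initial = [(3.0 + rng.uniform(-0.3, 0.3), 0.5 + rng.uniform(-0.3, 0.3))
           for _ in range(50)]
before = statistics.pstdev([x for x, _ in initial])
after = statistics.pstdev(final_levels(initial))
print(f"std of x: initial {before:.3f}, final {after:.2e}")
```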
In silico investigation of ADAM12 effect on TGF-beta receptors trafficking
The transforming growth factor beta (TGF-beta) is known to have multiple effects, including differentiation, proliferation and apoptosis. However, the underlying mechanisms remain poorly understood. The regulation and effects of TGF-beta signaling are complex and depend strongly on the specific protein context. In collaboration with INSERM and supported by a co-tutored PhD thesis (J. Gruel)  , we have recently shown that the disintegrin and metalloproteinase ADAM12 interacts in the liver with TGF-beta receptors and modulates their trafficking among membranes, a crucial point in TGF-beta signaling and in the development of fibrosis. In  , we aimed to better understand how ADAM12 affects TGF-beta receptor trafficking and TGF-beta signaling. We extracted qualitative biological observations from experimental data and defined a family of models producing a behavior compatible with the presence of ADAM12. We computationally explored the properties of this family of models, which allowed us to make novel predictions: ADAM12 increases the internalization rate of TGF-beta receptors between the cell surface and the endosomal membrane, and modifies the shape of TGF-beta signaling, favoring a permanent response. Altogether, by confronting differential models with qualitative biological observations, we obtained predictions that give new insights into the role of ADAM12 in TGF-beta signaling and in the hepatic fibrosis process.
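The idea of exploring a family of models constrained by qualitative observations can be sketched on a hypothetical two-compartment receptor model (surface and endosome, with internalization rate k_in and recycling rate k_rec). For illustration only, ADAM12 is assumed to rescale both rates by unknown factors a and r, and only parameter sets reproducing a hypothetical qualitative observation (fewer surface receptors with ADAM12) are kept. The actual models also describe TGF-beta signaling and are considerably richer.

```python
import random


def surface_fraction(k_in, k_rec):
    """Steady-state surface fraction for dRs/dt = -k_in*Rs + k_rec*Re,
    with total receptors Rs + Re conserved (hypothetical toy model)."""
    return k_rec / (k_in + k_rec)


def explore_family(n_samples=5000, seed=0):
    """Sample the model family and keep members matching the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        k_in, k_rec = rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)
        # Hypothetical ADAM12 effect: unknown multiplicative factors on each rate.
        a, r = rng.uniform(0.2, 5.0), rng.uniform(0.2, 5.0)
        control = surface_fraction(k_in, k_rec)
        with_adam12 = surface_fraction(a * k_in, r * k_rec)
        # Hypothetical qualitative observation: fewer surface receptors with ADAM12.
        if with_adam12 < control:
            accepted.append((a, r))
    return accepted


accepted = explore_family()
print(len(accepted), all(a > r for a, r in accepted))
```

Since the surface fraction equals 1 / (1 + k_in / k_rec), the constraint holds exactly when a > r: every accepted model predicts that ADAM12 speeds up internalization relative to recycling. Such a conclusion holds across the whole family rather than for a single fitted parameter set.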
Regulation of fatty acid metabolism
In collaboration with INRA laboratories and supported by a co-tutored PhD thesis (ASC Inra program, P. Blavy), we continued our investigations on the regulation of fatty acid metabolism in hepatic cells. In  our purpose was to identify the hierarchy of importance among the pathways involved in fatty acid (FA) metabolism, and among their regulators, in the control of hepatic FA composition. We used a step-by-step procedure in which a very simple model was completed with additional pathways until it correctly fitted the measured quantities of FA in the liver of PPAR-knockout (KO) and wild-type mice during fasting. The resulting model included FA uptake by the liver, FA oxidation, and elongation and desaturation of FA.
From the model analysis we concluded that PPAR had a strong effect on FA oxidation. In PPAR-knockout mice, FA uptake was identified as the main pathway responsible for FA variation in the liver. The models showed that FA were oxidized at a small, constant rate, and that desaturation of FA also occurred during fasting. The latter observation was rather unexpected, but was confirmed experimentally by measuring delta-6-desaturase mRNA using real-time quantitative PCR (QPCR). These results confirm that mathematical models can be a useful tool for identifying new biological hypotheses and nutritional routes in metabolism.
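The stopping logic of the step-by-step procedure described above can be sketched with a linear least-squares model standing in for the kinetic FA model: candidate pathway terms are added in a fixed order until the fit error drops below a tolerance. Pathway names, the time grid and the data are synthetic and hypothetical.

```python
import numpy as np


def stepwise_completion(candidates, observed, tol):
    """Add candidate terms in order until the least-squares RMSE is below tol.

    `candidates` is a list of (name, column) pairs; each column is the
    hypothetical contribution profile of one pathway over the sampling times.
    """
    included, coef, rmse = [], None, float("inf")
    for name, column in candidates:
        included.append((name, column))
        design = np.column_stack([col for _, col in included])
        coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
        rmse = float(np.sqrt(np.mean((design @ coef - observed) ** 2)))
        if rmse < tol:
            break
    return [name for name, _ in included], coef, rmse


# Synthetic fasting time course (hours) and synthetic "measured" FA level.
t = np.linspace(0.0, 48.0, 25)
observed = 5.0 - 0.05 * t
candidates = [
    ("uptake", np.ones_like(t)),
    ("oxidation", t),
    ("elongation", t ** 2),
    ("desaturation", np.exp(-t / 24.0)),
]
names, coef, rmse = stepwise_completion(candidates, observed, tol=1e-6)
print(names, coef)
```

The order of candidates encodes the modeller's prior ranking of pathways; a real application would also penalize model complexity rather than rely on a fixed tolerance.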
Metabolic Flexibility of the Mammary Gland in Lactating Dairy Cows
In 2002, Van Milgen proposed a stoichiometric model to study the metabolism of the ruminant mammary gland. It includes reactions for the synthesis of lactose, milk protein, fatty acids and the glycerol of triglycerides. A total of 10 metabolites involved in intermediary metabolism were used to describe 92 reactions, including those yielding or using ATP, cofactors, CO2, O2 and NH3. The model was applied to data from mammary gland balance studies carried out in dairy cows. We obtained a complete partitioning of the nutrients measured in the balance studies, including, for each pathway, the ATP production or cost, CO2 production and O2 consumption. The rules applied to find the partitioning were based on the hypotheses that intermediary metabolites do not accumulate in the mammary gland and that there is no deficiency of ATP or cofactors. Finally, we have developed a new automatic tool based on Flux Balance Analysis theory, suited to analyzing data arising from nutrition studies. Preliminary results of this work have been discussed in  .
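Flux Balance Analysis expresses the no-accumulation hypothesis as the steady-state constraint S v = 0 on the stoichiometric matrix S and the flux vector v, and then optimizes a linear objective under flux bounds. The sketch below uses SciPy's `linprog` on a hypothetical five-reaction network, not the Van Milgen model; ATP or cofactor balances would enter as additional rows of S.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: internal metabolites A, B, C; columns: v1..v5).
#   v1: -> A (uptake), v2: A -> B, v3: A -> C, v4: B -> milk output, v5: C -> CO2
S = np.array([
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1,  0],  # B
    [0,  0,  1,  0, -1],  # C
])
bounds = [(0, 10)] + [(0, None)] * 4  # uptake capped at 10 units, fluxes irreversible
c = np.zeros(5)
c[3] = -1.0  # linprog minimizes, so maximize v4 by minimizing -v4

# S v = 0: no accumulation of intermediary metabolites at steady state.
res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
print(res.status, res.x)
```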
Hierarchical models for complex biological systems
In order to better understand the relations between logical/discrete models and continuous models, we investigated the effect of noise and time in models.
Hybrid stochastic simplifications for multiscale gene networks
Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed, so approximate schemes that simulate a reduced number of discrete events are necessary. Moreover, answering important questions about the relation between network topology and the generation and propagation of intrinsic noise should rely on general mathematical results. We proposed a unified framework for the simplification of Markov models of multiscale network dynamics. We discuss several possible hybrid simplifications and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by a partial Kramers-Moyal expansion, which is equivalent to applying the central limit theorem to a sub-model. By averaging and variable aggregation, we drastically reduce simulation time and eliminate non-critical reactions. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittency phenomena, fluctuation amplitudes and stationary distributions. Hybrid simplifications can be used in onion-like (multi-layered) approaches to multiscale biochemical systems, in which different descriptions are used at different scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach  .
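One hybrid simplification of the kind described above can be sketched for a hypothetical single-gene model: the promoter state stays discrete and switches by jumps, while the protein copy number is treated as continuous and follows the chemical Langevin equation obtained by truncating the Kramers-Moyal expansion at second order. Rates are illustrative, and the fixed-step scheme is a simple approximation, not the authors' algorithm.

```python
import math
import random


def hybrid_telegraph(k_on=1.0, k_off=1.0, k=10.0, g=0.1,
                     t_end=4000.0, dt=0.01, burn_in=100.0, seed=0):
    """Mean protein level of a hybrid gene model (hypothetical parameters).

    Gene state s in {0, 1} jumps discretely; protein level p is continuous and
    follows dp = (k*s - g*p) dt + sqrt(k*s + g*p) dW (Euler-Maruyama).
    Gene switching uses a first-order rate*dt approximation.
    """
    rng = random.Random(seed)
    s, p = 1, k * k_on / ((k_on + k_off) * g)  # start near the mean level
    total, count = 0.0, 0
    for i in range(int(round(t_end / dt))):
        # Discrete component: switch with probability rate * dt.
        rate = k_off if s == 1 else k_on
        if rng.random() < rate * dt:
            s = 1 - s
        # Continuous component: drift plus diffusion from the jump intensities.
        drift = k * s - g * p
        diffusion = math.sqrt(max(k * s + g * p, 0.0))
        p += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        p = max(p, 0.0)  # keep the copy number non-negative
        if i * dt >= burn_in:
            total += p
            count += 1
    return total / count


print(hybrid_telegraph())
```

For these parameters the stationary mean protein level should be close to k * k_on / ((k_on + k_off) * g) = 50, which gives a quick sanity check of the hybrid scheme.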
The effect of time parameters in discrete modeling of gene networks
Several extensions of René Thomas' asynchronous logical approach have been proposed to better fit real biological dynamical systems: components may reach different discrete expression levels, depending on the status of other components acting as activators or inhibitors. In contrast, finer-grained approaches model the evolution of chemical concentrations with systems of differential equations. Hybrid paradigms try to escape both the oversimplifications of logical models and the inability of differential models to handle large real networks. In particular, time delays are introduced into logical abstractions to govern the passage from one expression level to the next. Such delays are new unknown parameters added to the model. Hybrid model-checking techniques are then used to exhibit properties of the dynamical behaviour of the network. We have described a complete pipeline that orchestrates the following stages: conversion of a model from a Piece-wise Affine Differential Equation (PADE) modelling scheme into a discretized model with attractors; focus on characterized subgraphs through a graph simplification step based on probabilistic criteria; conversion of the subgraphs into Parametric Hybrid Linear Automata; and inference of dynamical properties through hybrid model-checking techniques. The publication  is the outcome of a methodological investigation launched to cope with the genetic regulatory network involved in Escherichia coli carbon deprivation. We retrieved a remarkable cycle already exhibited by a previous analysis of the PADE model.
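The first stage of the pipeline, discretizing a PADE model, can be sketched with the classical focal-point abstraction: in each regulatory domain, every gene that has not reached its focal level may move one level towards it, asynchronously. The two-gene negative loop below (x activates y, y represses x) is a hypothetical example, not the E. coli carbon deprivation network; the later stages (graph simplification, hybrid automata, model checking) are not shown.

```python
from itertools import product


def focal_levels(state):
    """Hypothetical two-gene negative loop: x activates y, y represses x.

    Returns the discrete level each gene is attracted to in this domain, i.e.
    the position of its focal point (kappa/gamma) relative to one threshold.
    """
    x, y = state
    return (1 if y == 0 else 0, 1 if x == 1 else 0)


def transition_graph(n_genes, focal, max_level=1):
    """Asynchronous state transition graph in Thomas' style."""
    graph = {}
    for state in product(range(max_level + 1), repeat=n_genes):
        target = focal(state)
        succ = []
        for i, (cur, tgt) in enumerate(zip(state, target)):
            if cur != tgt:
                nxt = list(state)
                nxt[i] += 1 if tgt > cur else -1  # move one level towards the focal point
                succ.append(tuple(nxt))
        graph[state] = succ
    return graph


for state, succ in transition_graph(2, focal_levels).items():
    print(state, "->", succ)
```

The resulting graph is the cycle (0,0) → (1,0) → (1,1) → (0,1) → (0,0), the kind of discrete oscillation whose timing the later hybrid stages would then constrain with delay parameters.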
Multiclock discrete models of biological systems
As each signal within a pathway follows its own clock, we have introduced a multiclock technique to model the dynamics of biological interactions. Discrete models do not contain any specification of the order of transitions, which is usually deferred to the simulator. We have proposed a new formalism to include timing specifications in the models. It is inspired by the formal models underlying real-time programming languages such as Esterel, Lustre and Signal. In this approach, the notion of time refers to the logical time used in computer science: it does not correspond to the duration of events but to their relative sequencing. This allows the description of several biological signals with different clocks, i.e., multiclock systems. One main improvement of our formalism is its capacity to support model-checking techniques for properties involving biological entities and reaction times.
We validated our approach on published cell cycle models and studied the influence of the EGF and TGF-beta pathways in controlling cell proliferation and, consequently, tumor progression in the liver. We evaluated the robustness of a hepatocellular carcinoma cell line by using data from RNA interference experiments to constrain our model. This might help identify pathway checkpoints and buffering effects between different paths of the EGF and TGF-beta pathway network, and thus suggest new markers and new therapeutic targets for hepatocellular carcinomas  ,  ,  .
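The clock intuition behind the multiclock formalism can be illustrated in a few lines: a clock is a set of logical instants (an ordering, not a duration), and a Lustre-like `when` samples a signal on a slower clock. Signal names and periods are hypothetical, and this sketch is not the formalism itself.

```python
def ticks(period, horizon, offset=0):
    """Logical clock: the set of instants at which a signal is present."""
    return set(range(offset, horizon, period))


def when(values, clock):
    """Lustre-style sampling: keep a signal's values only at instants of `clock`."""
    return {t: v for t, v in values.items() if t in clock}


def is_subclock(sub, clock):
    """True if every instant of `sub` is also an instant of `clock`."""
    return sub <= clock


HORIZON = 12
base = ticks(1, HORIZON)   # hypothetical upstream signal, present at every instant
slow = ticks(3, HORIZON)   # hypothetical downstream process with its own slower clock
signal_a = {t: (t % 2 == 0) for t in base}  # hypothetical boolean upstream signal
sampled = when(signal_a, slow)              # what the downstream process "sees"
fires = {t for t, v in sampled.items() if v}  # reaction fires when the sample is high

print(sampled, fires, is_subclock(slow, base))
```

Checking that `slow` is a subclock of `base` is a trivial instance of the clock-consistency properties that a model checker would verify over all reachable behaviours.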