Section: Scientific Foundations
Keywords : Networks, evolution, dynamical systems, functional annotation, motifs, search, inference, probabilistic modelling, data analysis, graph algorithms, combinatorics, knowledge bases.
Participants : Vicente Acuña, Bruno Besson, Frédéric Boyer, Eric Coissac, Ludovic Cottret, Marc Deloger, Hidde de Jong, Samuel Druhle, Estelle Dumas, Laurent Duret, Christian Gautier, Philippe Genoud, Manolo Gouy, Laurent Guéguen, Sophie Huet, Daniel Kahn, Corinne Lachaize, Vincent Lacroix, Claire Lemaitre, Pedro Monteiro, Anne Morgat, Dominique Mouchiroud, Michel Page, Guy Perrière, Emmanuel Prestat, François Rechenmann, Adrien Richard, Delphine Ropers, Marie-France Sagot, Paulo Gustavo Soares da Fonseca, Eric Tannier, Raquel Tavares, Patricia Thébault, Jean Thioulouse, Alain Viari.
Functional genomics refers to arriving at an understanding of the different features of a genome such as genes, non-coding RNAs etc. This requires in general understanding how such features are related to one another, that is understanding the network of relations holding among the different elements of the genomic landscape, and between genomes and their cellular and extra-cellular environment.
Computationally speaking, functional genomics therefore requires expertise in graph theory and algorithmics in particular (with tree algorithmics as a special case), but also in dynamical systems and, as for comparative genomics, in general data analysis methods (for proteomic, transcriptomic and other ``omic'' data), knowledge representation, and combinatorics (especially concerning random graph models). Again, these are areas of expertise well covered within HELIX. Functional genomics further requires good visualisation tools, for which HELIX has built solid collaborations with outside experts.
Computational proteomics and transcriptomics
By analogy with the term genomics, referring to the systematic study of genes, proteomics is concerned with the systematic study of proteins. More particularly, proteomics aims at identifying the set of proteins expressed in a cell at a given time under given conditions, the so-called proteome. Recent progress in mass spectrometry (MS) has resulted in efficient techniques for the large-scale analysis of proteomes. In particular, the MS/MS technique allows for the determination of complete or partial sequences of proteins from their fragmentation patterns. State-of-the-art mass spectrometers produce volumes of data too large to be interpreted manually. There is therefore a growing need for computer tools allowing fully automated protein identification from raw MS/MS data. This has motivated a collaboration between HELIX and the ``Laboratoire de Chimie des Proteines'' (LCP) at the CEA in Grenoble. The aim of the collaboration is to develop computer tools for the analysis of data produced by the MS/MS approach. In particular, efficient algorithms have been designed for generating partial sequences (Peptide Sequence Tags, PSTs) from MS/MS spectra, for scanning protein databases in search of sequences matching these PSTs, and for mapping the PSTs on the complete translated genome sequence of an organism. These algorithms have been implemented in a high-throughput software pipeline installed at the LCP in order to provide support to the Genopole proteomic platform.
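The database-scanning step can be illustrated with a deliberately simplified sketch. It assumes that a PST is reduced to a short amino-acid string and that a match is an exact substring occurrence; a real PST search would also take into account the masses measured on either side of the tag, which are ignored here. The function name `scan_database` and the toy sequences are hypothetical and are not taken from the HELIX pipeline.

```python
def scan_database(proteins, tag):
    """Return (protein id, 0-based offset) for every exact occurrence of `tag`.

    Simplifying assumption: the tag is a plain amino-acid string and the
    flanking masses of a real Peptide Sequence Tag are not modelled.
    """
    hits = []
    for protein_id, sequence in proteins.items():
        start = sequence.find(tag)
        while start != -1:
            hits.append((protein_id, start))
            # Continue one position further so overlapping matches are found.
            start = sequence.find(tag, start + 1)
    return hits


# Hypothetical toy database: identifiers and sequences are invented.
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GGTAYIATAYI",
    "P3": "MSSNNE",
}
hits = scan_database(proteins, "TAYI")
```

On this toy input the tag occurs once in `P1` and twice in `P2`. A linear scan of this kind is only meant to make the matching step concrete; a high-throughput setting would normally call for an index over the database.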
The dynamic link between genome, proteome and cellular phenotype is formed by the subset of genes transcribed in a given organism, the so-called transcriptome. The regulation of gene expression is the key process for adaptation to changes in environmental conditions, and thus for survival. Transcriptomics describes this process at the scale of an entire genome. There are two main strategies for transcriptome analysis: i) direct sampling (and quantification) of sequences from source RNA populations or cDNA libraries (the most common techniques of this type are ESTs and SAGE) and ii) hybridization analysis with comprehensive non-redundant collections of DNA sequences immobilised on a solid support (the methods most often used in this case are DNA macroarrays, microarrays, and chips). Members of the HELIX project have worked with SAGE, EST and DNA microarray data in particular, to analyse the transcription pattern of transposable elements, improve the inference of sequence motifs, work towards an automatic inference method of small genetic networks, and provide initial links between genetic information and metabolism, and therefore between genotype and phenotype. Here, genotype means the specific genetic makeup (the specific genome) of an individual, and phenotype means either an individual's total physical appearance and constitution or a specific manifestation of a trait, such as size, eye colour, or behaviour, that varies between individuals.
Modelling and analysis of metabolism: molecular components, regulation, and pathways
Beyond genomic, proteomic and transcriptomic data, a large amount of information is now available on the molecular basis of cellular processes. Such data are quite heterogeneous, including among other things the organisation of a genome into operons and their regulation, and the chemical transformations occurring in the cell (together with their metabolites). The challenge of biology today is to relate and integrate the various types of data so as to answer questions involving the different levels of structural, functional, and spatial organisation of a cell. The data gathered over the past few decades are usually dispersed in the literature and are therefore difficult to exploit for answering precise questions. A major contribution of bioinformatics is therefore the development of databases and knowledge bases allowing biologists to represent, store, and access data. The integration of the information in the different bases requires explicit, formal models of the molecular components of the cell and their organisation. HELIX is involved in the development of such models and their implementation in object-oriented or relational systems. The contribution of HELIX to this field is twofold: on the one hand, some HELIX members are interested in the development of knowledge representation systems; on the other hand, other members are interested in putting these systems to work on biological data. In this context, HELIX collaborates closely with the SwissProt group at SIB in order to set up a database of metabolic pathways (UniPathway).
Another aspect of the activity of HELIX in this field concerns the design of algorithms to reconstruct and analyse metabolic pathways. In contrast to homology-based approaches, we try to tackle the problem of reconstruction in an ab initio fashion. Given a set of biochemical reactions together with their substrates and products, the reactions are considered as transfers of atoms between the chemical compounds. The basic idea is to look for sequences of reactions transferring a maximal (or preset) number of atoms between a given source compound and a given sink compound.
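The search for atom-transferring reaction sequences can be illustrated by a minimal sketch under strong simplifying assumptions. Each reaction is reduced to a single substrate, a single product, and a count of atoms passed from one to the other, and a path is scored by the smallest count along it. That score is only an upper bound on the number of atoms actually carried from source to sink, since tracking which individual atoms are conserved is not modelled. The function `best_transfer_path` and the reaction data are hypothetical and do not reproduce the HELIX algorithm.

```python
def best_transfer_path(reactions, source, sink):
    """Return (score, path) for the simple path with the largest bottleneck.

    `reactions` is a list of (name, substrate, product, atoms_transferred).
    The score of a path is the minimum atom count over its reactions, an
    upper bound on the atoms really transferred (assumption, see lead-in).
    """
    best = (0, None)

    def extend(compound, visited, path, carried):
        nonlocal best
        if compound == sink:
            if carried > best[0]:
                best = (carried, list(path))
            return
        for name, substrate, product, atoms in reactions:
            if substrate == compound and product not in visited:
                path.append(name)
                visited.add(product)
                extend(product, visited, path, min(carried, atoms))
                visited.discard(product)
                path.pop()

    extend(source, {source}, [], float("inf"))
    return best


# Hypothetical reactions: (name, substrate, product, atoms transferred).
reactions = [
    ("R1", "A", "B", 6),
    ("R2", "B", "D", 2),
    ("R3", "A", "C", 4),
    ("R4", "C", "D", 3),
    ("R5", "B", "C", 1),
]
score, path = best_transfer_path(reactions, "A", "D")
```

Here the route through `C` keeps at most 3 atoms at every step, whereas the routes through `B` drop to 2 or 1. The exhaustive enumeration of simple paths is exponential in the worst case and is used only because the example is tiny. To search for a preset number of atoms instead of the maximum, one would keep every path whose score reaches the threshold.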
In the same vein, several related problems (for instance, comparing biochemical networks to genomic organisation) have been put in the form of a graph-theoretical problem (such as finding common connected components in multigraphs) in order to provide a uniform formalisation. This activity in graph theory applied to biological problems is now conducted in a collaboration between Grenoble and Lyon, in particular through the question of searching and inferring modules in metabolic networks by defining ``connected subgraph motifs''. Beyond practical applications, this raises interesting and difficult questions in combinatorics and statistics. The combinatorial aspects are addressed in collaboration with the University of São Paulo, Brazil, and the statistical aspects are studied in collaboration with Sophie Schbath (INRA, Jouy-en-Josas) and Stéphane Robin (InaPG, Paris).
A simple graph model may be enough to conceive and apply methods such as the search or inference of motifs, but it reaches its limits as soon as one wishes to push the analysis of the results further. A natural extension consists in representing a metabolic network with a hypergraph instead, which captures the links between the different metabolites more realistically and therefore makes it possible to detect finer structural properties. Furthermore, performing structural analyses using such a representation enables an interesting parallel with other methods for analysing metabolic networks that are based on a decomposition of the stoichiometric matrix (constraint-based models). A stoichiometric matrix indicates the proportion of each metabolite that participates in a reaction as input or output. HELIX has started working with this hypergraph representation, and with the question of enumerating elementary modes and minimal reaction cut sets in a network. An elementary mode may be seen as a set of reactions that, when used together, perform a given task, while a minimal reaction cut set is a set of reactions one needs to inhibit to prevent a given task, also called the target reaction, from being performed. This work is done in collaboration with Alberto Marchetti-Spaccamela from the University of Rome, Italy, and Leen Stougie from the Eindhoven University of Technology and the CWI at Amsterdam, Netherlands.
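These notions can be made concrete on a toy network. The sketch below stores each reaction as a mapping from metabolites to signed coefficients (negative for inputs, positive for outputs), which is in effect one hyperedge per reaction, derives the stoichiometric matrix from it, and checks that a flux vector leaves every internal metabolite balanced. It then lists minimal cut sets by brute force, using the standard characterisation of minimal cut sets as the minimal sets of reactions that intersect every elementary mode performing the target. The network, the function names, and the assumption that the elementary modes are supplied as input rather than computed are all illustrative.

```python
from itertools import combinations


def stoichiometric_matrix(reactions, metabolites):
    """One row per metabolite, one column per reaction (dict order)."""
    return [[reactions[r].get(m, 0) for r in reactions] for m in metabolites]


def is_steady_state(matrix, flux):
    """True if production and consumption balance for every metabolite."""
    return all(sum(c * v for c, v in zip(row, flux)) == 0 for row in matrix)


def minimal_cut_sets(modes, candidates):
    """Minimal sets of candidate reactions that hit every given mode.

    Brute force over subsets by increasing size; suitable for toy inputs only.
    """
    cuts = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if any(cut <= chosen for cut in cuts):
                continue  # contains a smaller cut, hence not minimal
            if all(chosen & mode for mode in modes):
                cuts.append(chosen)
    return cuts


# Hypothetical network: R1 imports A, two routes lead from A to B,
# and R5 (the target reaction) exports B.
reactions = {
    "R1": {"A": 1},
    "R2": {"A": -1, "B": 1},
    "R3": {"A": -1, "C": 1},
    "R4": {"C": -1, "B": 1},
    "R5": {"B": -1},
}
S = stoichiometric_matrix(reactions, ["A", "B", "C"])

# Elementary modes of this toy network that use R5, given by hand
# (assumption: they are an input here, not computed).
modes = [{"R1", "R2", "R5"}, {"R1", "R3", "R4", "R5"}]
cuts = minimal_cut_sets(modes, ["R1", "R2", "R3", "R4"])
```

In this example, blocking the import `R1` alone is enough to prevent the target, whereas blocking `R2` must be combined with `R3` or `R4` to close the alternative route. Enumerating subsets is exponential and is used here only for readability.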
Modelling and simulation of genetic regulatory networks
All the aforementioned research topics concern, in some way, ``static'' data (i.e. the description of the cellular actors, together with their interactions). Except for evolution (but on a very different time-scale), time is not taken explicitly into account. To achieve a better understanding of the functioning of an organism, the networks of interactions involved in gene regulation, metabolism, signal transduction, and other cellular and intercellular processes need to be represented and analyzed from a dynamical perspective.
Genetic regulatory networks control the spatiotemporal expression of genes in an organism, and thus underlie complex processes like cell differentiation and development. They consist of genes, proteins, small molecules, and their mutual interactions. From the experimental point of view, the study of genetic regulatory networks has taken a qualitative leap through the use of modern genomic techniques, such as the above-mentioned transcriptomics techniques, that allow simultaneous measurement of the expression of all genes of an organism. However, in addition to these experimental tools, mathematical methods supported by computer tools are indispensable for the analysis of genetic regulatory networks. As most networks of interest involve many genes connected through interlocking positive and negative feedback loops, it is difficult to gain an intuitive understanding of their dynamics. Modelling and simulation tools allow the behaviour of large and complex systems to be predicted in a systematic way.
A variety of methods for the modelling and simulation of genetic regulatory networks have been proposed, such as approaches based on differential equations and stochastic master equations. These models provide detailed descriptions of genetic regulatory networks, down to the molecular level. In addition, they can be used to make precise, numerical predictions of the behaviour of regulatory systems. Many excellent examples of the application of these methods to prokaryote and eukaryote networks can be found in the literature. In many situations of biological interest, however, the application of the above models is seriously hampered. In the first place, the biochemical reaction mechanisms underlying regulatory interactions are usually unknown or only partially known. In the second place, quantitative information on kinetic parameters and molecular concentrations is seldom available, even in the case of well-studied model systems.
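To make the differential-equation approach and its dependence on kinetic parameters concrete, here is a minimal sketch of a hypothetical two-gene network in which each protein represses the synthesis of the other. The Hill-type repression function, the forward Euler integration, and every parameter value (`kappa`, `gamma`, `theta`, `n`, `dt`, `steps`) are assumptions chosen for illustration; they do not describe a particular organism or the methods developed in HELIX.

```python
def repression(x, theta, n):
    """Hill-type repression: close to 1 when x << theta, close to 0 when x >> theta."""
    return theta**n / (theta**n + x**n)


def simulate(a, b, kappa=2.0, gamma=1.0, theta=1.0, n=4, dt=0.01, steps=2000):
    """Integrate the two-gene model with the forward Euler method.

    da/dt = kappa * repression(b) - gamma * a
    db/dt = kappa * repression(a) - gamma * b
    All parameter values are hypothetical.
    """
    for _ in range(steps):
        da = kappa * repression(b, theta, n) - gamma * a
        db = kappa * repression(a, theta, n) - gamma * b
        a, b = a + dt * da, b + dt * db
    return a, b


# Start with protein a slightly favoured over protein b.
a_final, b_final = simulate(a=1.5, b=0.5)
```

With these values the system settles into a state where `a` is high and `b` is low, and swapping the initial concentrations gives the mirror outcome. The numerical prediction depends directly on the values of `kappa`, `gamma`, `theta` and `n`, which are exactly the quantities that are seldom measured in practice.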
The aim of the research being carried out in HELIX is to develop methods for the modelling and simulation of genetic regulatory networks that are capable of dealing with the current lack of detailed, quantitative data. In particular, a method for the qualitative simulation of genetic regulatory networks has been developed and implemented in the computer tool Genetic Network Analyzer (GNA). The method and the tool have been applied to the analysis of prokaryote regulatory networks in collaboration with experimental biologists at the Université Joseph Fourier (Grenoble), while several other groups have used GNA for similar purposes. Recently, the scope of the research has been enlarged to the validation and identification of models of genetic regulatory networks.