
Section: Scientific Foundations

RNA and protein structures

Most problems in computational biology become NP-hard as soon as all known and reasonable biological information is taken into account. For instance, structural biology is concerned with the 3D structures of complex molecules. Prediction, comparison and design are three optimisation problems in which these structures are classically represented by graphs, and all three are known to be NP-complete. A fruitful strategy consists in designing models that preserve biological relevance while remaining simple enough to be computationally tractable. The representation chosen determines the data structures and the classes of algorithms to be used. The challenge is to develop formal models, along with efficient algorithms or heuristics to handle them. The various biological problems described above raise different computer science issues. To tackle them, the project members rely on a common methodology in which our group has significant experience. Indeed, many of these problems can be expressed with classical combinatorial objects such as graphs, trees, words and grammars.


Participants: Patrick Amar, Alain Denise, Thomas Moncion, Yann Ponty, Balaji Raman, Mireille Régnier, Cédric Saule, Jean-Marc Steyaert.

Common activity with P. Clote (Boston College and Digiteo).

Recoding events and riboswitches

Recoding covers several unconventional phenomena in the translation of messenger RNA (mRNA) into proteins, including frameshifting, readthrough and hopping, in which a single mRNA sequence allows the synthesis of (at least) two different polypeptides. Recoding is essential to the machinery and viability of many viruses. We are developing two complementary computational methods that aim to find genes subject to recoding events in genomes. The first is based on a model of the recoding site; the second is based on a large-scale comparative genomics approach. In both cases, our predictions are subject to experimental biological validation by our collaborators at Igm (Institut de Génétique et Microbiologie), Paris-Sud University. This work is funded by the ANR (project RNA-RECOD, ANR BLANC 2006-2010). Additionally, we are currently developing a combinatorial approach, based on random generation, to design small, structured RNAs. Our goal is to build these RNAs such that their hybridization with existing mRNAs will be favorable to independent folding, and will therefore affect the stability of some secondary structures involved in recoding events. An application of this methodology to the Gag-Pol HIV-1 frameshifting site will be carried out with our collaborators at Igm. We hope that, by capturing the hybridization energy at the design stage, it will be possible to control the frameshift rate and consequently fine-tune the expression of Gag/Pol.
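To make the notion of a frameshift concrete, the following minimal Python sketch reads one mRNA sequence in its original frame and again with a -1 shift, showing how the same nucleotides yield two different codon series. The function names, the toy sequence and the shift position are all hypothetical illustrations; this is not the recoding-site model or the comparative method mentioned above.

```python
def codons(mrna, start=0):
    """Split an mRNA string into consecutive complete triplets from index `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def codons_with_minus_one_shift(mrna, shift_at):
    """Read frame-0 codons up to index `shift_at`, then slip back one
    nucleotide (-1 frameshift) and keep reading triplets.

    `shift_at` is assumed to be a positive multiple of 3, i.e. a codon boundary.
    """
    if shift_at <= 0 or shift_at % 3 != 0:
        raise ValueError("shift_at must be a positive multiple of 3")
    before = codons(mrna[:shift_at])          # codons read before the slip
    after = codons(mrna, start=shift_at - 1)  # re-read the last base, new frame
    return before + after


# Toy sequence, invented for illustration only.
toy = "AUGAAUUUUUUAGGGAAGAUC"
print(codons(toy))
print(codons_with_minus_one_shift(toy, 12))
```

The two printed codon lists share their first four codons and then diverge, which is how a single mRNA can give rise to two distinct polypeptides.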

It has also been observed, mainly in bacteria, that some mRNA sequences may adopt an alternate fold. Such an event is called a riboswitch. A common feature of recoding events and riboswitches is that some structural elements of the mRNA trigger an unusual action of the ribosome or allow for an alternate fold under some environmental conditions. One challenge is to predict genes that might be subject to riboswitches.

Another mid-term challenge is the design of molecules that enhance or repress such events.

Structural tertiary motifs

Single-stranded RNA folds into a stable and compact structure. This folding yields a secondary structure, an intermediate structural level for RNA between the bare sequence and the full (tertiary) structure. It is based on pairing between complementary bases (A-U and C-G). A recent classification, the Leontis-Westhof classification, distinguishes twelve different kinds of chemical bonds between two nucleotides, according to the way they are linked together within the tertiary structure. Other kinds of interactions are also taken into account, such as stacking and the phosphodiester bonds along the sequence. This knowledge turns out to be crucial for determining molecular stability. Moreover, recent work in RNA biochemistry has shown that RNA molecules are structured by RNA tertiary motifs. These motifs, which are known from 3D structures, can be seen as “small bricks” that play a very important role in structuring RNA. Indeed, it has been shown that taking these motifs into account can significantly improve 3D prediction methods. We develop graph algorithms for extracting tertiary motifs from RNA structures and for predicting the tertiary structure from the sequence [2]. This project, in collaboration with two groups from the University of Strasbourg and the University of Versailles, is funded by the ANR (project AMIS-ARN, ANR BLANC 2009-2012).
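As an illustration of secondary structure as a combinatorial object, the sketch below counts the maximum number of nested base pairs in a sequence using the classical Nussinov-style dynamic programme. This is a textbook technique swapped in for illustration, not the graph algorithms of the project. It only allows the A-U and C-G pairs named above, and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) is an assumption of this sketch.

```python
# Complementary pairs considered in this sketch (A-U and C-G only).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (non-crossing) base pairs in `seq`.

    best[i][j] holds the optimum for the subsequence seq[i..j] inclusive.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence span.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            # case 2: position i pairs with some k, splitting the interval
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

The table has O(n^2) entries and each takes O(n) time to fill, which shows how restricting the model to nested pairs keeps the problem tractable.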


Participants: Jérôme Azé, Julie Bernauer, Thomas Bourquard, Thomas Simonson, Thuong Van Du Tran.

Docking and evolutionary algorithms

The function of many proteins depends on their interaction with one or many partners. Despite the improvements due to structural genomics initiatives, solving the structures of complexes experimentally remains a difficult problem. The prediction of complexes, or docking, proceeds in two steps: a configuration generation phase, or exploration, and an evaluation phase, or scoring. As the verification of a predicted conformation is time-consuming and very expensive, it is a real challenge to reduce the time biologists spend on the analysis of complexes. In a collaboration with A. Poupon, Inra-Tours, a method has been developed that sorts the potential conformations by decreasing probability of being real complexes. It relies on a ranking function that is learnt by an evolutionary algorithm. The learning data are given by a geometric modelling of each conformation obtained by the docking algorithm proposed by the biologists. Objective tests are needed for such predictive approaches. The Critical Assessment of Predicted Interaction (Capri), a community-wide experiment modelled after Casp, was set up in 2001 to achieve this goal. First results achieved for Capri'02 suggested that it is possible to find good conformations by using geometric information for complexes. This approach has been pursued (see section New Results). As this new algorithm will produce a huge number of conformations, the learning step of the ranking function must be adapted to handle them.
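The idea of learning a ranking function with an evolutionary algorithm can be sketched as follows. This is a deliberately small (1+1) evolution strategy over the weights of a linear score, under assumptions that are not in the text: each conformation is reduced to a hypothetical vector of geometric descriptors with a near-native/decoy label, and ranking quality is measured as the fraction of correctly ordered (near-native, decoy) pairs. The actual method, descriptors and fitness function may differ.

```python
import random


def score(weights, features):
    """Linear score of one conformation (hypothetical scoring model)."""
    return sum(w * x for w, x in zip(weights, features))


def ranking_quality(weights, conformations):
    """Fraction of (near-native, decoy) pairs where the near-native
    conformation scores strictly higher. Assumes both classes are present."""
    good = [score(weights, f) for f, native in conformations if native]
    bad = [score(weights, f) for f, native in conformations if not native]
    wins = sum(g > b for g in good for b in bad)
    return wins / (len(good) * len(bad))


def evolve_weights(conformations, n_features, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: mutate the weights with Gaussian noise and
    keep the child whenever it ranks at least as well as the parent."""
    rng = random.Random(seed)
    parent = [0.0] * n_features
    parent_fit = ranking_quality(parent, conformations)
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_fit = ranking_quality(child, conformations)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


def rank(weights, conformations):
    """Sort conformations by decreasing score."""
    return sorted(conformations, key=lambda c: score(weights, c[0]), reverse=True)


# Toy data: (descriptor vector, is_near_native); all values are invented.
toy = [((0.9, 0.1), True), ((0.8, 0.3), True),
       ((0.2, 0.7), False), ((0.1, 0.9), False), ((0.4, 0.5), False)]
weights, fitness = evolve_weights(toy, n_features=2)
print(fitness, [native for _, native in rank(weights, toy)])
```

Because the selection step is elitist, the ranking quality never decreases from one generation to the next. Handling a very large number of conformations would require evaluating the fitness on samples rather than on all pairs, which this sketch does not attempt.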

Computational Protein Design

The amino acid sequence of a protein determines its structure and biological function, but no concise and systematic set of rules has yet been stated to describe the functions associated with a sequence, and experimental methods are time-consuming and costly. Massive genome sequencing has revealed the sequences of millions of proteins, whereas only roughly 55,000 3D protein structures are known so far. Structure prediction in silico attempts to fill the gap. It consists in finding a tentative spatial (3D) conformation that a given nucleotide or amino acid sequence is likely to adopt. A second problem of interest is inverse protein folding, or computational protein design (CPD), that is, the prediction of amino acid sequences that adopt a particular target tertiary structure. This problem has implications for many questions, such as protein folding and stability, structure prediction (fold recognition), and protein evolution. Moreover, it is a mandatory step towards the design of new, artificial proteins. The engineering of protein-ligand interactions also has great biological and technological value. For example, the recent engineering of aminoacyl-tRNA synthetase (aaRS) enzymes has led to organisms with a modified genetic code, expanded to include non-natural amino acids.

Molecular dynamics (MD) simulations use numerical methods to study the motion of atoms, which is far too complex for analytical study. They were used by Bioc for extensive computational engineering of aminoacyl-tRNA synthetases (aaRS). For computational protein design, and for structure prediction as well, one possible model considers the protein backbone and sidechains. The backbone structure may be known from high-resolution methods. High-quality models of sidechain interactions with the solvent have been designed. Sidechains can adopt a finite number of possible positions, which may be stored in a rotamer library. A fitness or energy function that relies on atomistic and physical-chemical criteria is associated with each conformation. One may therefore search the set of possible sequences to optimize stability criteria.
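The search over a rotamer library can be illustrated with a minimal sketch. It assumes, hypothetically, that the energy decomposes into a self term per position and a pairwise term between positions, and it uses exhaustive enumeration, which is only feasible for toy instances. All names and energy values below are invented; a rotamer index may be read as encoding both an amino acid type and a sidechain position.

```python
from itertools import product


def total_energy(choice, self_energy, pair_energy):
    """Energy of one assignment: per-position self terms plus pairwise terms.

    choice[i] is the rotamer index picked at position i.
    pair_energy maps (i, rot_i, j, rot_j) with i < j to an interaction energy;
    missing entries count as 0.
    """
    energy = sum(self_energy[pos][rot] for pos, rot in enumerate(choice))
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            energy += pair_energy.get((i, choice[i], j, choice[j]), 0.0)
    return energy


def best_assignment(self_energy, pair_energy):
    """Exhaustively enumerate all rotamer combinations and return the
    lowest-energy one (exponential in the number of positions)."""
    options = [range(len(rotamers)) for rotamers in self_energy]
    return min(product(*options),
               key=lambda c: total_energy(c, self_energy, pair_energy))


# Toy library: 3 positions with 2 rotamers each (invented values).
self_energy = [[1.0, 0.0], [0.5, 0.2], [0.0, 0.3]]
pair_energy = {(0, 1, 1, 1): 2.0, (1, 1, 2, 0): 1.5}
best = best_assignment(self_energy, pair_energy)
print(best, total_energy(best, self_energy, pair_energy))
```

In this toy instance the rotamer with the lowest self energy at the second position is not chosen, because it clashes with its neighbours. Such couplings are what make the problem combinatorial, and realistic instances need heuristics or smarter exact methods rather than enumeration.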

Another novel ingredient is the use of negative design: the ability to select against sequences that have undesired properties, such as a tendency to fold into alternate, undesired structures. It can be critical for attaining specificity when competing states are close in (stability) structure space. There are also ongoing efforts to extend this thermodynamic point of view with new knowledge drawn from natural proteins with known conformations.

Transmembrane proteins

Our goal is to predict the structure of different classes of barrel proteins. These proteins comprise the two major classes of transmembrane proteins, which carry out important functions. Nevertheless, their structure is still difficult to determine by standard experimental methods such as X-ray crystallography or NMR. Most existing methods only address single-domain protein structures. Therefore, for large proteins, a preprocessing step is needed to determine the protein domains. Then, a suitable model of energy functions needs to be designed for each specific class. We have designed a pseudo-energy minimization method for the prediction of the super-secondary structure of $\beta$-barrel or $\alpha$-helical-barrel proteins, with structural knowledge-based enhancement. The method relies on graph-based modelling and also deals with various topological constraints such as Greek key or jelly roll conformations.

