Inria / Raweb 2004
Project-Team: MODBIO


Section: Application Domains

Molecular biology

Participants: Ernst Althaus, Alexander Bockmayr, Stefan Canzar, Arnaud Courtois, Yannick Darcy, Eric Domenjoud, Damien Eveillard, Emmanuel Gothié, Yann Guermeur, Abdelhalim Larhlimi, Sandrine Schermack-Peyrefitte, Frédéric Sur, Myriam Vezain.

Molecular biology is concerned with the study of three types of biological macromolecules: DNA, RNA, and proteins. Each of these molecules can initially be viewed as a string over a finite alphabet: DNA and RNA are nucleic acids made up of the nucleotides A, C, G, T and A, C, G, U, respectively. Proteins are sequences of amino acids, which may be represented by an alphabet of 20 letters.
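The string view of sequences can be made concrete with a short sketch. The alphabet names (`DNA`, `RNA`, `PROTEIN`) and the helper `is_over` are hypothetical, chosen only for illustration; the protein alphabet assumes the 20 standard amino acids in one-letter code, and sequences are assumed to be upper-case.

```python
# Alphabets for the three kinds of macromolecules (hypothetical names).
DNA = frozenset("ACGT")
RNA = frozenset("ACGU")
# Assumption: the 20 standard amino acids in one-letter code.
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_over(seq: str, alphabet: frozenset) -> bool:
    """Return True if every character of seq belongs to the given alphabet."""
    return set(seq) <= alphabet


print(is_over("GATTACA", DNA))       # True
print(is_over("GAUUACA", DNA))       # False: U is not a DNA letter
print(is_over("MKTAYIAK", PROTEIN))  # True
```

The same string can be valid over one alphabet and invalid over another, which is the only thing that distinguishes, say, an RNA string from a DNA string at this level of description.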

Molecular biology studies the information flow from DNA to RNA, and from RNA to proteins. In the first step, called transcription, a DNA string (a "gene") is transcribed into messenger RNA (mRNA). In the second step, called translation, the mRNA is translated into a protein, where each triplet of nucleotides (a codon) encodes one amino acid or a stop signal (the "genetic code"). Before translation, an intermediate maturation step can occur, mainly in eukaryotic cells. In this so-called splicing process, introns are removed from the pre-messenger RNA (pre-mRNA), and the remaining exons are concatenated, yielding the mature mRNA molecule.
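The transcription, splicing, and translation steps described above can be sketched as plain string operations. This is a simplification under stated assumptions: the gene is given as the coding (non-template) DNA strand, so transcription amounts to replacing T with U; intron positions are supplied by hand as half-open intervals rather than detected; and translation starts at the first nucleotide and uses the standard genetic code (NCBI translation table 1). The function names and the example gene are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: the pre-mRNA equals the coding DNA strand
    with uracil (U) in place of thymine (T)."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as half-open (start, end) intervals, and
    concatenate the remaining exons."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end
    exons.append(pre_mrna[pos:])           # final exon
    return "".join(exons)


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1 (0-5), intron (6-13, starting GU and ending AG
# once transcribed), exon 2 (14-22).
gene = "ATGGCC" + "GTAAGCAG" + "TGGAAATAA"
introns = [(6, 14)]
mature = splice(transcribe(gene), introns)
print(mature)             # AUGGCCUGGAAAUAA
print(translate(mature))  # MAWK
```

If the splicing step is skipped, the intron is read as codons and the same gene yields `MAVSSGN` instead, which illustrates why exon boundaries matter for the resulting protein.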

Biological macromolecules are not just sequences of nucleotides or amino acids; they are complex three-dimensional objects. DNA forms the well-known double helix. RNA and proteins fold into complex three-dimensional structures that depend on the underlying sequence. RNA is a single-stranded chain of nucleotides. However, a nucleotide in one part of the molecule can base-pair with a nucleotide in another part, mostly following the Watson-Crick complementarity rules (A with U, G with C). These base pairings cause the molecule to fold. The secondary structure of RNA is the set of base pairings in the three-dimensional structure of the molecule. This information can be represented by a graph whose vertices are the nucleotides and whose edges join consecutive nucleotides along the chain as well as paired bases.
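One way to build the graph representation of an RNA secondary structure is sketched below. It assumes the structure is given in dot-bracket notation, a common convention not used in the text above: matching parentheses mark paired positions and dots mark unpaired ones, so only nested (pseudoknot-free) structures can be expressed. Vertices are nucleotide positions; edges join backbone neighbours and paired bases. Allowed pairs are assumed to be the Watson-Crick pairs plus the G-U wobble pair. All names are hypothetical.

```python
# Allowed base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Parse dot-bracket notation into sorted (i, j) base pairs with i < j."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)                 # opening position waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # innermost open position pairs with j
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def structure_graph(seq: str, structure: str) -> dict[int, set[int]]:
    """Adjacency sets over positions: backbone edges plus base-pair edges."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = base_pairs(structure)
    for i, j in pairs:
        if (seq[i], seq[j]) not in ALLOWED_PAIRS:
            raise ValueError(f"{seq[i]}-{seq[j]} at ({i}, {j}) is not an allowed pair")
    graph: dict[int, set[int]] = {i: set() for i in range(len(seq))}
    backbone = [(i, i + 1) for i in range(len(seq) - 1)]
    for u, v in backbone + pairs:
        graph[u].add(v)
        graph[v].add(u)
    return graph


seq, structure = "GGGAAAUCC", "(((...)))"
print(base_pairs(structure))  # [(0, 8), (1, 7), (2, 6)]
g = structure_graph(seq, structure)
print(sorted(g[2]))           # [1, 3, 6]
```

In this small hairpin, position 2 (G) is linked to its two backbone neighbours and, through a G-U wobble pair, to position 6.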

Proteins have several levels of structure. Above the primary structure (the amino acid sequence) is the secondary structure, which involves three basic types of elements: α-helices, β-sheets, and elements that are neither helices nor sheets, called loops. A domain of a protein is a combination of secondary structure elements with a specific function. It contains an active site where an interaction with another molecule may take place. A protein may have one or several domains.
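Secondary structure is often written as one symbol per residue, and its elements are then maximal runs of equal symbols. The sketch below assumes a three-state encoding (H for helix, E for an extended β-strand, C for coil or loop), as used by many secondary structure prediction tools; an individual E segment is a β-strand, and several strands pair up to form a β-sheet. The function `elements` and the example string are hypothetical.

```python
from itertools import groupby

# Assumed three-state encoding: H = helix, E = extended beta-strand, C = coil/loop.
ELEMENT_NAMES = {"H": "helix", "E": "strand", "C": "loop"}


def elements(ss: str) -> list[tuple[str, int, int]]:
    """Split a per-residue H/E/C string into (type, start, end) segments, end exclusive."""
    segments = []
    pos = 0
    for state, run in groupby(ss):  # groupby yields maximal runs of equal symbols
        length = sum(1 for _ in run)
        segments.append((ELEMENT_NAMES[state], pos, pos + length))
        pos += length
    return segments


print(elements("CCHHHHHCCEEEECC"))
# [('loop', 0, 2), ('helix', 2, 7), ('loop', 7, 9), ('strand', 9, 13), ('loop', 13, 15)]
```

Representing elements as intervals over the sequence keeps the link between primary and secondary structure explicit, which is convenient when grouping elements into larger units such as domains.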

The ultimate goal of molecular biology is to understand the function of biological macromolecules in the life of the cell. Function results from the interaction between different macromolecules and depends on their structure. The overall challenge is to make the leap from sequence, through structure, to function.