Section: Software

Génolevures On Line: Comparative Genomics of Yeasts

Participants: David James Sherman, Pascal Durrens, Macha Nikolski, Tiphaine Martin [correspondent].

The Génolevures online database provides tools and data for 9 complete and 10 partial genome sequences determined and manually annotated by the Génolevures Consortium, to facilitate comparative genomic studies of hemiascomycetous yeasts. With their relatively small and compact genomes, yeasts offer a unique opportunity for exploring eukaryotic genome evolution. The new version of the Génolevures database provides truly complete (subtelomere to subtelomere) chromosome sequences, 48,000 protein-coding and tRNA genes, and in silico analyses for each gene element. A new feature of the database is a collection of conserved multi-species protein families and their mapping to metabolic pathways, coupled with an advanced search feature. Data are presented with a focus on relations between genes and genomes: conservation of genes and gene families, speciation, chromosomal reorganization and synteny. The Génolevures site includes an area for specific studies by members of its international community.

The focus of the Génolevures database is to describe the relations between genes and genomes. We curate relations of orthology and paralogy between genes, as individuals or as members of protein families, as well as chromosomal map reorganization and the gain and loss of genes and functions. We do not provide detailed annotations of individual genes and proteins of S. cerevisiae, which are already carefully maintained by the MIPS in the CYGD database [48] in Europe and by the SGD [32] in North America, as well as in general-purpose databases such as UniProtKB [30] and EMBL [43].

While extensive chromosomal rearrangements combined with segmental and massive duplications make comparisons of yeast genome sequences difficult [54], relations of homology between protein-coding genes can be identified despite their great diversity at the molecular level [36]. Families of homologous proteins provide a powerful tool for appreciating conservation, gain and loss of function within yeast genomes. Génolevures provides a unique collection of paralogous and orthologous protein families, identified using a novel consensus clustering algorithm [49] applied to a complementary set of homeomorphic (sharing full-length sequence similarity and similar domain architectures, see [60]) and nonhomeomorphic systematic Smith-Waterman [53] and Blast [29] sequence alignments. Similar approaches are developed on a wider scale [60] and are complementary to these yeast-specific families.
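
To make the idea of a consensus over several clusterings concrete, the following minimal Python sketch merges two partitions of the same proteins, for example one derived from Smith-Waterman alignments and one from Blast alignments. It is not the algorithm of [49]. It assumes a simple co-clustering rule: two proteins are linked when at least `min_support` clusterings place them in the same cluster, and families are the connected components of those links. The function name, the parameter `min_support` and the protein identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def consensus_families(partitions, min_support=None):
    """Merge several clusterings of the same proteins into consensus families.

    partitions  -- list of clusterings; each clustering is a list of sets of IDs
    min_support -- how many clusterings must co-cluster a pair of proteins
                   before the pair is linked (assumption: defaults to all)
    """
    if min_support is None:
        min_support = len(partitions)

    # Count, for every pair, the number of clusterings that group it together.
    support = defaultdict(int)
    proteins = set()
    for clustering in partitions:
        for cluster in clustering:
            proteins.update(cluster)
            for a, b in combinations(sorted(cluster), 2):
                support[(a, b)] += 1

    # Union-find: link every sufficiently supported pair.
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), count in support.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    # Collect connected components as families, in a stable order.
    families = defaultdict(set)
    for p in proteins:
        families[find(p)].add(p)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


# Hypothetical identifiers: two clusterings that disagree about protein A3.
sw_clusters = [{"A1", "A2", "A3"}, {"B1", "B2"}]
blast_clusters = [{"A1", "A2"}, {"A3", "B1", "B2"}]
strict = consensus_families([sw_clusters, blast_clusters])
```

With the strict default, a protein on which the two clusterings disagree ends up as a singleton. Lowering `min_support` to 1 merges every pair that any clustering groups together.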

The Génolevures database uses a straightforward object model mapped to a relational database. Flexibility in the design is guaranteed through the use of ontologies and controlled vocabularies: the Sequence Ontology [38] for DNA sequence features and GLO, our own ontology for comparative genomics (D. Sherman, unpublished data). Browsing of genomic maps and sequence features is provided by the Generic Genome Browser [58]. The Blast service is provided by NCBI Blast 2.2.6 [29]. The Génolevures web site uses a REST architecture internally [39] and extensively uses the BioPerl package [57] for manipulation of sequence data.
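
As a rough illustration of an object model mapped to a relational database, the sketch below stores genes and their protein families in an in-memory SQLite database and asks which families are present in at least a given number of species. The schema, table and column names, the `Gene` class and all identifiers are hypothetical and do not describe the actual Génolevures schema. Python and SQLite are used here only for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical object: one gene, its species and its protein family."""
    gene_id: str
    species: str
    family_id: str


# Hypothetical relational mapping: one table per object type.
SCHEMA = """
CREATE TABLE family (family_id TEXT PRIMARY KEY);
CREATE TABLE gene (
    gene_id   TEXT PRIMARY KEY,
    species   TEXT NOT NULL,
    family_id TEXT REFERENCES family(family_id)
);
"""


def build_db(genes):
    """Create an in-memory database and store the given Gene objects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO family VALUES (?)",
        [(g.family_id,) for g in genes],
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [(g.gene_id, g.species, g.family_id) for g in genes],
    )
    conn.commit()
    return conn


def families_conserved_in(conn, n_species):
    """Return IDs of families with members in at least n_species species."""
    rows = conn.execute(
        "SELECT family_id FROM gene GROUP BY family_id "
        "HAVING COUNT(DISTINCT species) >= ? ORDER BY family_id",
        (n_species,),
    )
    return [row[0] for row in rows]


# Hypothetical data: family F1 spans two species, family F2 only one.
genes = [
    Gene("g1", "species_1", "F1"),
    Gene("g2", "species_2", "F1"),
    Gene("g3", "species_1", "F2"),
]
conn = build_db(genes)
conserved = families_conserved_in(conn, 2)
```

Keeping family membership in a foreign-key column makes a question about conservation across species a single grouped query.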

See also the web page .

