Protea – coding sequence identification
Protea is a software tool for identifying evolutionarily conserved coding sequences through comparative analysis of genomic sequences. The rationale behind our method is that protein-coding DNA sequences should carry mutations that are consistent with the genetic code and that tend to preserve the function of the translated amino acid sequence. The algorithm exploits the characteristic substitution pattern of coding sequences together with the consistency of the reading frames that show the best sequence similarity. This idea is original and offers a view complementary to most gene finders, which rely on sequence composition bias or homology. The implementation uses graph-theoretical models to combine pairwise alignments and estimates the significance of reading-frame conservation. This work appeared in  . Protea is distributed under the CeCILL license.
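To make the reading-frame idea concrete, here is a minimal Python sketch. It assumes an ungapped pairwise alignment of two equal-length DNA sequences and considers only the three forward frames. The function names (`translate`, `frame_identity`, `best_frame`, `mismatches_by_codon_position`) are hypothetical, and this is not Protea's implementation. It only illustrates that in the true coding frame, synonymous substitutions concentrated at third codon positions leave the translated sequences identical, whereas shifted frames do not. The graph-based combination of alignments and the significance estimate are not covered.

```python
"""Hypothetical sketch: score the three forward reading frames of an
ungapped pairwise DNA alignment by amino-acid identity, and tally where
nucleotide mismatches fall within the codon. Not Protea's actual code."""

from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate seq from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)  # full codons only
    )


def frame_identity(a: str, b: str, frame: int) -> float:
    """Fraction of identical amino acids when a and b are translated in `frame`."""
    pa, pb = translate(a, frame), translate(b, frame)
    pairs = [(x, y) for x, y in zip(pa, pb) if x != "X" and y != "X"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def best_frame(a: str, b: str) -> tuple[int, list[float]]:
    """Return the frame with the highest amino-acid identity, plus all three scores."""
    if len(a) != len(b):
        raise ValueError("expected an ungapped alignment of equal-length sequences")
    scores = [frame_identity(a, b, f) for f in range(3)]
    return max(range(3), key=lambda f: scores[f]), scores


def mismatches_by_codon_position(a: str, b: str, frame: int) -> list[int]:
    """Count nucleotide mismatches at codon positions 1, 2 and 3 for a given frame."""
    counts = [0, 0, 0]
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if i >= frame and x != y:
            counts[(i - frame) % 3] += 1
    return counts


if __name__ == "__main__":
    a = "ATGGCTGCAAAACTG"
    b = "ATGGCCGCGAAGCTT"  # four synonymous third-position changes
    frame, scores = best_frame(a, b)
    print(frame, scores)  # 0 [1.0, 0.25, 0.25]
    print(mismatches_by_codon_position(a, b, frame))  # [0, 0, 4]
```

In this toy example, frame 0 translates both sequences to `MAAKL`, and every mismatch falls at the third codon position, the substitution pattern expected of coding DNA. A practical tool would also need to handle gaps, reverse-strand frames and a statistical test on the frame scores; these are deliberately left out of the sketch.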