Inria / Raweb 2004
Project-Team: MODBIO

Section: New Results

Keywords: non-coding RNA, support vector machine, pattern discovery.

Search for non-coding RNA genes

Participants: Emmanuel Gothié, Sandrine Schermack-Peyrefitte.

While traditional genome analysis focuses on protein-coding sequences, there is a growing demand for tools to analyse non-coding RNA. In collaboration with UMR 7567 MAEM, we have been particularly interested in small nucleolar RNAs (snoRNAs), which are involved in two types of post-transcriptional modification of ribosomal RNA (2'-O-methylation and pseudouridylation). A tool to search for snoRNAs has been developed, based on the multi-class support vector machines studied in our group. Preliminary results of this approach were reported last year [37]. Additional experiments performed this year revealed specificity problems in a number of new test cases. To overcome these problems, we propose to develop problem-specific kernel functions. A possible starting point is the family of marginalized kernels [40], which measure the similarity of two RNA sequences while taking into account their secondary structure, estimated using stochastic context-free grammars (SCFGs). For these sequences it is indeed very important to consider not only the primary structure but also the secondary structure, which plays a crucial role in their function.
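
To illustrate how a kernel can combine primary and secondary structure, the following is a minimal sketch in the spirit of a first-order marginalized count kernel. It is not the tool described above. It assumes that per-position posterior probabilities over structural states are already available (in the setting discussed here they would come from an SCFG; in the sketch they are simply passed in). The state labels "paired" and "unpaired" and the function names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def expected_counts(seq, posteriors):
    """Expected (nucleotide, state) counts, normalized by sequence length.

    posteriors[i] is assumed to be a dict mapping each structural state
    (hypothetical labels, e.g. "paired" / "unpaired") to its posterior
    probability at position i. Computing these with an SCFG is outside
    the scope of this sketch.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    if len(seq) != len(posteriors):
        raise ValueError("need one posterior distribution per position")
    counts = defaultdict(float)
    for base, post in zip(seq, posteriors):
        for state, prob in post.items():
            counts[(base, state)] += prob
    return {key: value / len(seq) for key, value in counts.items()}


def marginalized_count_kernel(seq_a, post_a, seq_b, post_b):
    """Dot product of the expected-count feature vectors of two sequences."""
    feat_a = expected_counts(seq_a, post_a)
    feat_b = expected_counts(seq_b, post_b)
    return sum(value * feat_b.get(key, 0.0) for key, value in feat_a.items())


# Two sequences with the same primary structure but different
# (made-up) structural posteriors.
structured = [{"paired": 1.0, "unpaired": 0.0}, {"paired": 0.5, "unpaired": 0.5}]
unstructured = [{"paired": 0.0, "unpaired": 1.0}, {"paired": 0.0, "unpaired": 1.0}]

k_same = marginalized_count_kernel("GC", structured, "GC", structured)
k_diff = marginalized_count_kernel("GC", structured, "GC", unstructured)
```

With these made-up posteriors, the two sequences are identical at the primary level, yet `k_diff` is smaller than `k_same`, because the kernel also compares the structural context of each nucleotide. A kernel matrix built this way could in principle be supplied to any SVM implementation that accepts precomputed kernels.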