Inria / Raweb 2004
Project-Team: MODBIO


Section: New Results

Keywords: Statistical learning theory, support vector machine, SELEX, pattern discovery.

SELEX data processing

Participants: Stéphanie Bonne-Billaut, Damien Eveillard, Abdelhalim Larhlimi, Sandrine Schermack-Peyrefitte.

Nucleic acid-protein interactions play an important role in the cell, and recent work shows the importance of nucleic acid motifs in these interactions. SELEX experiments [46] can automatically characterize the potential ligands of a given target protein, starting from a random oligonucleotide library. As shown in [11], processing SELEX data is a non-trivial task: the biological motifs generally cannot be identified directly from the experimental data. In particular, this holds for the binding sites of SR proteins, a protein family that is important in the regulation of alternative splicing (see Sect. 6.7).

A new method to localise protein binding motifs, based on statistical learning, has been developed in the team. We optimised a kernel method (M-SVM) dedicated to the recognition of SR motifs and trained it on experimental SELEX data. To analyse the M-SVM results biologically, the graphical interface KOALAB (see Sect. 5.2) was developed. By combining data analysis with graphical interpretation, we can now predict SR binding sites in the HIV-1 genome. A complete analysis of the M-SVM results for two SR proteins, SC35 and 9G8, has been performed [19]. The study concentrated on well-documented splicing regulatory sites in the HIV-1 genome, the A2, A3 and A7 acceptor sites, in order to validate the approach against a maximum of experimental data. We also compared our method to the classical global consensus approach, using the grappe tool, which is also available in KOALAB, and to the ESEfinder software [30], which is based on hidden Markov models (HMMs). Our results show that, for SC35, the M-SVM gives the best results in both selectivity and specificity compared to the other available methods. For 9G8, it suggests some potential sites; these have to be tested experimentally, since only limited experimental results are currently available for this protein.
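
The general idea of training a kernel machine on selected sequences and then scanning a longer sequence for candidate sites can be illustrated with a minimal sketch. It is not the team's M-SVM and says nothing about its actual kernel or training procedure. It assumes a k-mer "spectrum" kernel and, as a deliberately simple stand-in for an SVM trainer, a binary kernel perceptron. All function names, parameters and sequences below are hypothetical toy values, not SELEX or HIV-1 data.

```python
from collections import Counter


def spectrum(seq, k=2):
    """Count every overlapping k-mer of seq (the 'spectrum' feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the k-mer count vectors of two sequences."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def train_kernel_perceptron(seqs, labels, k=2, epochs=10):
    """Kernel perceptron (assumed stand-in for an SVM): returns one weight per example."""
    n = len(seqs)
    gram = [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]
    alphas = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alphas[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score <= 0:  # misclassified or on the boundary
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alphas


def decision(window, seqs, labels, alphas, k=2):
    """Signed score of a window; positive means 'looks like a selected sequence'."""
    return sum(a * y * spectrum_kernel(s, window, k)
               for a, y, s in zip(alphas, labels, seqs) if a)


def scan(target, width, seqs, labels, alphas, k=2):
    """Score every window of the given width along the target sequence."""
    return [(i, decision(target[i:i + width], seqs, labels, alphas, k))
            for i in range(len(target) - width + 1)]


# Hypothetical toy data: +1 for "selected" sequences, -1 for background.
train_seqs = ["GAGAGA", "AGAGAG", "CTCTCT", "TCTCTC"]
train_labels = [1, 1, -1, -1]

alphas = train_kernel_perceptron(train_seqs, train_labels)
hits = scan("TTCTCTGAGAGATT", 6, train_seqs, train_labels, alphas)
best_pos, best_score = max(hits, key=lambda hit: hit[1])
print(best_pos, best_score)
```

In a sketch of this kind, the windows with the highest scores are only candidate sites. As in the study above, such candidates would still have to be compared with experimentally documented sites.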