Section: New Results
Classification in genomics
Participants : Gilles Celeux, Cathy Maugis.
Following Cathy Maugis's 2008 thesis, we decided to apply her work, in collaboration with biologists from URGV (INRA, Evry Genopole) and Marie-Laure Martin-Magniette (INRA), to improve the functional annotation of Arabidopsis thaliana genes. This joint work with URGV is part of the SONATA project, which will be pursued in 2010.
This year, the variable selection procedures conceived by Cathy Maugis were applied in particular to genomics, in collaboration with researchers from URGV (Evry Genopole). Biologists are interested in predicting the functions of genes in sequenced organisms from microarray transcriptome data. Microarray technology makes it possible to study the whole genome under different experimental conditions. This abundance of information may seem to be an advantage for gene clustering. However, the structure of interest is often contained in a subset of the available variables. In [17], the variable selection algorithm SelvarClust was used to extract groups of coexpressed Arabidopsis thaliana genes; it improved the clustering and made the biological interpretation easier. In [26], the value of the new variable selection algorithm SelvarClustIndep for discovering the function of orphan genes is highlighted on a transcriptome dataset for Arabidopsis thaliana.
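As a rough illustration of why the clustering structure may live in only a subset of the variables, the following Python sketch compares, for each variable taken alone, the BIC of a single Gaussian with the BIC of a two-component Gaussian mixture. It is not the SelvarClust or SelvarClustIndep algorithm: the per-variable test, the synthetic data, and the names cond_A and cond_B are all assumptions made for this example only.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gaussian_loglik(xs):
    """Maximised log-likelihood of xs under a single Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def mixture2_loglik(xs, n_iter=200):
    """Log-likelihood of xs under a two-component Gaussian mixture fitted by EM."""
    n = len(xs)
    ordered = sorted(xs)
    mu_all = sum(xs) / n
    var_all = sum((x - mu_all) ** 2 for x in xs) / n
    floor = 1e-3 * var_all  # variance floor, avoids degenerate components
    mus = [ordered[n // 4], ordered[3 * n // 4]]  # start at the quartiles
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)
            resp.append((p[0] / total, p[1] / total))
        # M-step: update proportions, means and variances
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(floor, var_k)
    return sum(
        math.log(max(sum(weights[k] * normal_pdf(x, mus[k], variances[k])
                         for k in range(2)), 1e-300))
        for x in xs
    )


def bic(loglik, n_params, n):
    """BIC in the 'higher is better' convention: 2 log L - (number of parameters) log n."""
    return 2.0 * loglik - n_params * math.log(n)


# Hypothetical expression data for 200 genes under two conditions:
# cond_A separates two groups of genes, cond_B is pure noise.
rng = random.Random(0)
data = {
    "cond_A": [rng.gauss(0.0, 1.0) for _ in range(100)]
              + [rng.gauss(5.0, 1.0) for _ in range(100)],
    "cond_B": [rng.gauss(0.0, 1.0) for _ in range(200)],
}

# Keep a variable when the mixture (5 parameters) beats the single Gaussian (2 parameters).
relevant = [
    name for name, xs in data.items()
    if bic(mixture2_loglik(xs), 5, len(xs)) > bic(gaussian_loglik(xs), 2, len(xs))
]
print(relevant)
```

In this toy setting only the variable carrying group structure should be retained; the noise variable adds parameters without improving the fit, which is the intuition behind selecting variables before clustering.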