Section: Other Grants and Activities
ANR Defis Codex (2009-2011): Efficiency, Dynamicity and Composition for XML Models, Algorithms, and Systems
The Codex project seeks to push the frontier of XML technology in three interconnected directions. First, efficient algorithms and prototypes for massively distributed XML repositories are studied. Second, models are developed for describing, controlling, and reacting to the dynamic behavior of XML collections and XML schemas over time. Third, methods and prototypes are developed for composing XML programs into richer interactions, and XML schemas into rich, expressive, yet formally grounded type descriptions.
Coordinated by Manolescu (GEMO, INRIA Saclay), with Geneves (WAM, INRIA Grenoble), Colazzo (LRI, Orsay), Castagna (PPS, Paris 7), and Halfeld (Blois).
ANR Blanc Enumeration (2007-2010): Complexity and Algorithms for Answer Enumeration
We propose to study algorithmic and complexity questions of answer enumeration, the task of generating all solutions of a given problem. Answer enumeration requires innovative, efficient algorithms that can quickly serve large numbers of answers on demand. The prime application is query answering in databases, where huge answer sets arise naturally.
Mostrare proposes to contribute answer enumeration algorithms for XML database queries. We want to distinguish classes of XQuery transformations that allow for efficient enumeration algorithms. We start from tractable fragments of XPath dialects with variables, and from n-ary queries defined by tree automata.
Our partners are: Arnaud Durand (coordinator, Paris VII), Etienne Grandjean (Caen), Nadia Creignou (Marseille). 2008–2010. More information about the project can be found at http://enumeration.gforge.inria.fr.
ARA MDCO CROTAL (2008-2009): Conditional Random Fields for Natural Language Processing
The CRoTal project aims at exploring and developing new techniques to access huge textual banks. The project will especially focus on an innovative technique: Conditional Random Fields (CRFs), a family of graphical models developed for computational linguistic applications. CRFs make it possible to learn to annotate data from annotated examples. They are at the state of the art in many domains, including extracting and structuring knowledge. But they also require refinement and optimisation to be applied efficiently to large datasets or to structured data. More precisely, our aims are twofold: first, to develop new algorithms to process large amounts of data; second, to apply these algorithms to texts and treebanks, so that we are able to annotate, extract knowledge, and fill knowledge banks from texts. The general purpose is to enrich textual data by learning to annotate it. We plan to work on both English and French corpora.
Mostrare proposes to use CRFs for trees and to apply them to corpora in collaboration with teams experienced in the field of Natural Language Processing.
The coordinator of the project is I. Tellier. Our partners are: R. Marin, A. Balvet (linguistics, Lille3), T. Poibeau, A. Rozenknopf (Paris 13), F. Yvon (Limsi-CNRS, Paris 11). 2008-2009. More information about the project can be found at http://crotal.gforge.inria.fr/pmwiki-2.1.27/.
ANR Jeune BioSpace (2008-2010): A Uniform Approach for Stochastic Modeling with Spatial Aspects in Systems Biology
Participant: Joachim Niehren [correspondent].
Stochastic modeling and simulation seek to improve the understanding of genetic networks in systems biology. BioSpace proposes to design, develop, and implement a novel and generic stochastic modeling language that is able to cope with all kinds of spatial phenomena in molecular networks with space-dependent concurrent control. We hope to find a unifying framework accessible to biologists that extends existing rule-based approaches while providing, in particular, for compartments with variable volumes. The development of our new language will be accompanied by its application to cellular biology. Our modeling studies will focus on spatial aspects of eukaryotic gene regulation: positioning of chromosomes in the nucleus, establishment and maintenance of nuclear compartments, and cross-talk between chromosomes. This project is led by Cédric Lhoussaine from the BioComputing activity at the LIFL in Lille.
ARA MDCO Marmota (2006-2008): Stochastic Tree Models and Stochastic Tree Transformations
We propose to study computational issues at the intersection of three domains: formal tree languages, machine learning, and probabilistic models. Our study is mainly motivated by XML data manipulation: data integration on the Internet from heterogeneous and distributed sources; XML annotation and transformation; XML document classification and clustering. However, the intended fundamental results should also have an important impact in many other application domains. For instance, in bioinformatics and music retrieval, it is relevant to model data using probabilistic trees. Therefore, this project is also concerned with the specific problems of these two application domains, and we will use large data sets from these areas. We will consider generative models for tree-structured data, non-generative models for tree-structured data, and models for probabilistic tree pattern matching and probabilistic tree transformations: tree pattern matching algorithms, learning pattern languages, and induction of tree transformations.
The coordinator of the project is M. Tommasi. Our partners are: P. Gallinari (LIP6), F. Denis (LIF), and M. Sebban (Saint Etienne). 2006–2008. More information about the project can be found at http://marmota.gforge.inria.fr/.