Section: Other Grants and Activities
ANR Lampada (2009-2013)
Participants: Édouard Gilbert, Rémi Gilleron, Aurélien Lemay, Marc Tommasi [correspondent], Fabien Torre.
The Lampada project on “Learning Algorithms, Models and sPArse representations for structured DAta” is coordinated by M. Tommasi from Mostrare. Our partners are the Sequel project of Inria Lille Nord Europe, the LIF (Marseille), the Hubert Curien laboratory (Saint-Étienne), and LIP6 (Paris). More information on the project can be found at http://lampada.gforge.inria.fr/.
Lampada is a fundamental research project on machine learning and structured data. It focuses on scaling learning algorithms to handle large sets of complex data. The main challenges are 1) high-dimensional learning problems, 2) large data sets and 3) the dynamics of data. The complex data we consider evolve over time and are composed of interrelated parts. Representations of these data embed both structure and content information and are typically large sequences, trees and graphs. The main application domains are the Web 2.0, social networks and biological data.
The project proposes to study formal representations of such data together with incremental or sequential machine learning methods and similarity learning methods.
The representation research topic includes condensed data representation, sampling, prototype selection and representation of streams of data. Machine learning methods include edit distance learning, reinforcement learning and incremental methods, density estimation of structured data and learning on streams.
ANR Defis Codex (2009-2011)
Participants: Joachim Niehren [correspondent], Sławek Staworko, Aurélien Lemay, Sophie Tison, Anne-Cécile Caron, Olivier Gauwin, Jérôme Champavère.
The Codex project on “Efficiency, Dynamicity and Composition for XML Models, Algorithms, and Systems” is coordinated by Manolescu (GEMO, INRIA Saclay). Besides Mostrare, the other partners are Geneves (WAM, INRIA Grenoble), Colazzo (LRI, Orsay), Castagna (PPS, Paris 7), and Halfeld (Blois).
The Codex project seeks to push the frontier of XML technology in three interconnected directions. First, efficient algorithms and prototypes for massively distributed XML repositories are studied. Second, models are developed for describing, controlling, and reacting to the dynamic behavior of XML collections and XML schemas over time. Third, methods and prototypes are developed for composing XML programs for richer interactions, and for composing XML schemas into rich, expressive, yet formally grounded type descriptions.
The PhD project of Gauwin (directed by Niehren and Tison), as described above, contributes to the Mostrare part of Codex on XML streaming. Mostrare also contributes learning algorithms for XML transformations, as needed for schema adaptation. These topics are studied in the PhD projects of J. Champavère (directed by Lemay, Niehren, and Gilleron) and G. Laurence (directed by Tommasi, Staworko and Niehren). In this context, our former postdoc S. Staworko was hired as an assistant professor by the University of Lille 3 in 2009. See above for the contributions of these project members on this topic.
ANR Blanc Enum (2007-2010)
Participants: Guillaume Bagan, Olivier Gauwin, Joachim Niehren [correspondent], Sophie Tison.
The Enum project on “Complexity and Algorithms for Answer Enumeration” is coordinated by A. Durand (Paris VII). The other partners are E. Grandjean (Caen) and N. Creignou (Marseille). Duration: 2008–2010. More information about the project can be found at http://enumeration.gforge.inria.fr. Enum studies algorithmic and complexity questions of answer enumeration, the task of generating all solutions of a given problem. Answer enumeration requires innovative, efficient algorithms that can quickly serve large numbers of answers on demand. The prime application is query answering in databases, where huge answer sets arise naturally.
Mostrare proposes to contribute answer enumeration algorithms for XML database queries. We want to distinguish classes of XQuery transformations that allow for efficient enumeration algorithms. We start from tractable fragments of XPath dialects with variables, and from n-ary queries defined by tree automata.
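As a rough illustration of what enumerating answers on demand means for a binary query on a tree, the following Python sketch lazily yields all (ancestor, descendant) pairs of nodes with given labels. Everything in it is hypothetical and chosen for illustration only: the `Node` class, the `enumerate_pairs` function and the sample document are not taken from the Enum project. It uses a naive traversal and gives none of the delay guarantees that dedicated enumeration algorithms aim for.

```python
from itertools import islice


class Node:
    """Hypothetical unranked tree node: a label plus an ordered list of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def descendants(node):
    """Lazily yield all proper descendants of `node` in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def nodes(root):
    """Lazily yield `root` followed by all of its descendants."""
    yield root
    yield from descendants(root)


def enumerate_pairs(root, anc_label, desc_label):
    """Lazily enumerate answers (a, d) of a binary query:
    a is labeled anc_label, d is labeled desc_label, and d is a proper descendant of a.
    Naive sketch: the time between two consecutive answers is not bounded.
    """
    for a in nodes(root):
        if a.label != anc_label:
            continue
        for d in descendants(a):
            if d.label == desc_label:
                yield (a, d)


# Illustrative document (an assumption, not data from the project).
root = Node("lib", [
    Node("book", [Node("title"), Node("author")]),
    Node("book", [Node("author"), Node("author")]),
])

# A consumer can ask for only the first answers; the rest are never computed.
first_two = list(islice(enumerate_pairs(root, "book", "author"), 2))
for a, d in first_two:
    print(a.label, d.label)
```

Because the answers come from a generator, a client that only needs a few of them pays only for those, which is the on-demand behavior described above.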
Gauwin is working on enumeration algorithms for conjunctive queries in cooperation with A. Durand (Paris 7) and F. Filiot (Brussels). In 2009, we succeeded in hiring G. Bagan from Caen as a postdoc to work on efficient answer enumeration for queries defined in Conditional XPath with variables. The topic was prepared by an internship of A. Venant from ENS Cachan-Bretagne, supervised by Niehren.
ARA MDCO Marmota (2006-2009)
Participants: Rémi Gilleron, Aurélien Lemay, Joachim Niehren, Marc Tommasi [correspondent].
The Marmota project on “Stochastic Tree Models and Stochastic Tree Transformations” is coordinated by M. Tommasi from Mostrare. Our partners are P. Gallinari (LIP6), F. Denis (LIF), and M. Sebban (Saint-Étienne). Duration: 2006–2008. More information about the project can be found at http://marmota.gforge.inria.fr/.
Marmota proposes to study computational issues at the intersection of three domains: formal tree languages, machine learning and probabilistic models. Our study is mainly motivated by XML data manipulation: data integration on the Internet from heterogeneous and distributed sources; XML annotation and transformation; XML document classification and clustering. However, the intended fundamental results should have an impact in many other application domains. For instance, in bioinformatics and music retrieval, it is relevant to model data with probabilistic trees. Therefore, this project is also concerned with the specific problems of these two application domains, and we will use large data sets from these areas. We will consider generative models for tree-structured data, non-generative models for tree-structured data, and models for probabilistic tree pattern matching and probabilistic tree transformations: tree pattern matching algorithms, learning pattern languages, induction of tree transformations.
ARA MDCO CROTAL (2008-2009)
Participants: Rémi Gilleron, Marc Tommasi.
The CRoTal project on “Conditional Random Fields for Natural Language Processing” is coordinated by I. Tellier, a former member of Mostrare (until 2008). Our partners are R. Marin, A. Balvet (linguistics, Lille 3), T. Poibeau, A. Rozenknopf (Paris 13), and F. Yvon (Limsi-CNRS, Paris 11). Duration: 2008-2009. More information about the project can be found at http://crotal.gforge.inria.fr/pmwiki-2.1.27/.
CRoTal aims at exploring and developing new techniques to access huge textual banks. The project focuses especially on an innovative technique: Conditional Random Fields (CRFs), a family of graphical models developed for computational linguistics applications. CRFs make it possible to annotate data by learning from examples of annotated data. They are at the state of the art in many domains, including extracting and structuring knowledge. However, they require refinements and optimization to be applied efficiently to large datasets or to structured data. Our aims are twofold: first, to develop new algorithms to process large amounts of data; second, to apply these algorithms to texts and treebanks, so that we are able to annotate texts, extract knowledge from them and fill knowledge banks. The general purpose is to enrich textual data by learning to annotate them. We plan to work on both English and French corpora.
Mostrare proposes to use CRFs for trees and to have them applied to corpora by teams experienced in the field of Natural Language Processing.