National Initiatives

ANR project PASSAGE (2006 – 2008)

Participants : Éric Villemonte de La Clergerie, Benoît Sagot, Pierre Boullier, François Guérin, Caroline Benoît, Marie-Laure Guénot.

PASSAGE Homepage:

EASy homepage:

PASSAGE is an action in the ANR MDCA program (Masse de Données Connaissance Ambiantes), started in 2007 and extended until mid-2010. The participants are Alpage (coordinator), LIR (LIMSI, Orsay), “Langue & Dialogue” (LORIA, Nancy), LI2CM (CEA-LIST), plus several contractors (ELDA, TAGMATICA and several providers of parsing systems).

PASSAGE stands for “Large Scale Production of Syntactic Annotations to move forward”. Its main objectives are to parse a large corpus (100 to 200 million words) with several parsers (around 10 systems), to combine the results provided by these parsers, and to use the resulting annotations to acquire new linguistic knowledge (semantic classes, subcategorization frames, disambiguation probabilities, etc.). A small part of the corpus (around 400,000 words) will be manually validated and used as a reference treebank. Two evaluation campaigns, based on the work done during the Technolangue action EASy, will be conducted during PASSAGE to assess the performance of the parsing systems. The annotations and derived linguistic resources will be made available.

This year's work consisted essentially of the participation of two Alpage parsers in the second parsing evaluation campaign organized by PASSAGE (Fall 2009).

ANR project Sequoia (2009 – 2011)

Participants : Benoît Sagot, Pierre Boullier, Marie Candito, Benoit Crabbé, Pascal Denis, Éric Villemonte de La Clergerie, Djamé Seddah.

Alpage plays a major role in the ANR-funded project Sequoia, led by Alexis Nasr (LIF, University of Marseille-Provence, former member of the Talana team at University Paris 7). This project aims at developing or adapting probabilistic parsing techniques in order to release a high-performance parser for French based on Syntax. It brings together specialists in NLP and specialists in Machine Learning, in a very fruitful way.

ANR project EDyLex (Nov. 2009 – Oct. 2011)

Participants : Benoît Sagot [principal investigator], Gaëlle Recourcé, Rosa Stern, Laurence Danlos, Pascal Denis.

EDyLex is an ANR project (STIC/CONTINT) headed by Benoît Sagot. The focus of the project is the dynamic acquisition of new entries in existing lexical resources that are used in syntactic and semantic parsing systems: how can an unknown word or a new named entity be detected and qualified in a text? How can it be associated with phonetic, morphosyntactic, syntactic and semantic properties and information? Various complementary techniques will be explored and combined (probabilistic and symbolic, corpus-based and rule-based, etc.). Their application to the content produced by the AFP news agency (Agence France-Presse) provides a context that is representative of the problems of incompleteness and lexical creativity: indexing and the creation and maintenance of ontologies (location and person names, topics) are both necessary for handling and organizing a massive information flow (over 4,000 news wires per day).

The participants in the project, besides Alpage, are the LIF (Université de la Méditerranée), the LIMSI (CNRS team), two small companies, Syllabs and Vecsys Research, and the AFP.

ANR project Rhapsodie (2008 – 2010)

Participants : Sylvain Kahane, Éric Villemonte de La Clergerie, Marie Candito, Benoit Crabbé, Benoît Sagot.

Rhapsodie is an ANR project headed by Anne Lacheret (University Paris X). The aim of the project is to study the matching of prosody and syntax on a 30-hour corpus of spoken French by providing prosodic and syntactic annotations. Alpage participates in the project at two different levels: the specification of the transcription and syntactic annotation framework, and the use of parsers for preparing the manually validated syntactic corpus annotation.

Action Scribo (2007 – 2009, extended until 2010)

Participants : Éric Villemonte de La Clergerie, Benoît Sagot, Rosa Stern, Pascal Denis, Gaëlle Recourcé, Victor Mignot.

Scribo Homepage:

Scribo aims to develop algorithms and collaborative free software for the automatic extraction of knowledge from texts and images, and for the semi-automatic annotation of digital documents. Scribo has a total budget of 4.3M euros and is funded by the French “Pôle de compétitivité” Systematic from mid-2008 until the end of 2010. It brings 9 participants together: AFP, CEA LIST, INRIA, LRDE (Epita), Mandriva, Nuxeo, Proxem, Tagmatica and XWiki.


