Section: New Results

Linguistic Resources

Large Scale Grammatical Resources

Guy Perrier and Bruno Guillaume continued to develop FRIGRAM, a large-coverage French grammar written in the formalism of Interaction Grammars [16].

A major challenge in this task is to guarantee and maintain the consistency of the grammar while aiming at the largest possible coverage. To this end, they relied on an original property that follows from the polarization of the elementary structures of an interaction grammar: the companion property. Using the system of polarities, it is possible to determine, by a static computation on the whole non-anchored grammar, all the elementary structures (the companions) that are able to interact with a given elementary structure. Knowing the companions of every elementary structure is very useful for checking the linguistic consistency of a grammar.
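To make the idea of a static companion computation more concrete, here is a minimal sketch under strongly simplifying assumptions: each elementary structure is flattened into a set of (category, polarity) features, its tree shape is ignored, and two structures are taken to be companions as soon as one of their features can combine with a feature of the other carrying the same category. The `Structure` class, the polarity symbols and the toy grammar are hypothetical illustrations, not FRIGRAM's actual encoding or the authors' algorithm.

```python
from dataclasses import dataclass
from itertools import product

# Polarities in this sketch (simplified assumption): "+" positive (available
# resource), "-" negative (expected resource), "=" neutral, "~" virtual.
DUAL = {"+": "-", "-": "+"}


@dataclass(frozen=True)
class Structure:
    """Hypothetical flattened elementary structure: a name and a set of
    (category, polarity) features; the tree shape is ignored."""
    name: str
    features: frozenset


def polarities_combine(p: str, q: str) -> bool:
    """Positive and negative neutralize each other; a virtual feature can
    only be realized by a non-virtual one."""
    if DUAL.get(p) == q:
        return True
    return (p == "~") != (q == "~")


def can_interact(a: Structure, b: Structure) -> bool:
    """True if some feature of `a` can combine with a feature of `b`
    carrying the same category."""
    return any(
        cat_a == cat_b and polarities_combine(pol_a, pol_b)
        for (cat_a, pol_a), (cat_b, pol_b) in product(a.features, b.features)
    )


def companions(grammar: list[Structure]) -> dict[str, set[str]]:
    """Static computation over the whole non-anchored grammar: for every
    structure, the names of the structures it may interact with."""
    return {s.name: {t.name for t in grammar if can_interact(s, t)} for s in grammar}


grammar = [
    Structure("transitive_verb", frozenset({("s", "+"), ("np", "-")})),
    Structure("proper_noun", frozenset({("np", "+")})),
    Structure("adverb", frozenset({("s", "~")})),
    Structure("common_noun", frozenset({("n", "+")})),  # nothing expects "n"
]
result = companions(grammar)
# Structures without any companion can never take part in a saturated parse.
print({name for name, comp in result.items() if not comp})  # prints {'common_noun'}
```

In this toy grammar, `common_noun` has no companion, which is the kind of signal that points a grammar writer to a likely inconsistency.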

Guy Perrier wrote detailed documentation of FRIGRAM, illustrated with many examples [26].

Deep Syntax Annotation of the Sequoia French Treebank

Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Eric de la Clergerie started a project to annotate the Sequoia French Treebank with deep syntax dependencies.

The Sequoia French Treebank [33] is a 3,200-sentence treebank covering several domains (news, medical texts, Europarl and French Wikipedia). It is freely available and has already been annotated with surface dependency representations.

The project participants have defined a deep syntactic representation scheme for French that abstracts away from surface syntactic variation and diathesis alternations. The goal is to obtain a freely available corpus that will be useful both for corpus linguistics studies and for training deep syntactic parsers as a step towards semantic analysis.
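As an illustration of what "abstracting away from diathesis alternations" means, the following sketch maps a simplified passive clause onto the same deep argument structure as its active counterpart. The dependency labels (`subj`, `obj`, `agent`, `aux_pass`) and the normalization rule are hypothetical and deliberately minimal; they are not the Sequoia annotation scheme, and the tense auxiliary and the preposition "par" are left out.

```python
# Dependencies as (governor, label, dependent) triples. The label names are
# hypothetical, chosen for the illustration only.
Dep = tuple[str, str, str]


def deep_from_passive(surface: list[Dep]) -> list[Dep]:
    """Map a passive clause onto its canonical (active) argument structure:
    surface subject -> deep object, agent -> deep subject; the passive
    auxiliary is dropped."""
    passive_verbs = {gov for gov, label, _ in surface if label == "aux_pass"}
    deep = []
    for gov, label, dep in surface:
        if gov in passive_verbs:
            if label == "aux_pass":
                continue
            label = {"subj": "obj", "agent": "subj"}.get(label, label)
        deep.append((gov, label, dep))
    return deep


# "Le texte a été adopté par le Parlement" (passive, simplified)
passive = [
    ("adopté", "aux_pass", "été"),
    ("adopté", "subj", "texte"),
    ("adopté", "agent", "Parlement"),
]
# "Le Parlement a adopté le texte" (active, simplified)
active = [
    ("adopté", "subj", "Parlement"),
    ("adopté", "obj", "texte"),
]
print(sorted(deep_from_passive(passive)))
# prints [('adopté', 'obj', 'texte'), ('adopté', 'subj', 'Parlement')]
```

Both sentences end up with the same deep dependencies, which is what makes such a corpus useful for training parsers that feed a semantic analysis.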

The different steps of the annotation process were conducted collaboratively. As the project members are located in two different French cities (Paris and Nancy), they decided to produce a complete annotation of the treebank in each city and then to adjudicate the two results collaboratively. In Nancy, Line Heckler, Mathilde Huguin and Alice Kneip produced a double annotation of the corpus, and Guy Perrier was in charge of the adjudication.
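A double annotation is typically prepared for adjudication by comparing the two versions and listing their disagreements. The sketch below does this for one sentence, with each annotation represented as a set of dependency triples. The `compare` function, the `Report` fields and the overlap score (the ratio of shared to total distinct dependencies, not a chance-corrected agreement measure such as kappa) are assumptions for illustration, not the tooling used in the project.

```python
from typing import NamedTuple

# A dependency is a (governor, label, dependent) triple; labels are hypothetical.
Dep = tuple[str, str, str]


class Report(NamedTuple):
    agreed: set
    only_a: list
    only_b: list
    overlap: float  # |A ∩ B| / |A ∪ B|, a simple overlap score (not a kappa)


def compare(ann_a: set[Dep], ann_b: set[Dep]) -> Report:
    """Compare two independent annotations of the same sentence and list
    the disagreements to be submitted to the adjudicator."""
    union = ann_a | ann_b
    agreed = ann_a & ann_b
    overlap = len(agreed) / len(union) if union else 1.0
    return Report(agreed, sorted(ann_a - ann_b), sorted(ann_b - ann_a), overlap)


annotator_1 = {("mange", "subj", "chat"), ("mange", "obj", "souris")}
annotator_2 = {("mange", "subj", "chat"), ("mange", "mod", "souris")}
report = compare(annotator_1, annotator_2)
print(report.only_a, report.only_b, round(report.overlap, 2))
# prints [('mange', 'obj', 'souris')] [('mange', 'mod', 'souris')] 0.33
```

Only the two conflicting attachments of "souris" would then need a decision from the adjudicator.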

At the beginning of the project, a mini-reference of 250 sentences was randomly selected from the Sequoia corpus. It was annotated in parallel with the writing of the annotation guide, in order to provide feedback for the guide. Each team separately produced an initial annotated version of the mini-reference. The final version, resulting from several iterations and adjudications, is already available.

The full version of the Sequoia French Treebank with deep syntax dependencies and its annotation guide will be released during Spring 2014.

Agile Annotation

In [19], Bruno Guillaume and Karën Fort present a methodology, inspired by the agile development paradigm, that helps prepare an annotation campaign. The idea is to formalize the instructions given in the guidelines as much as possible, so that the consistency of the corpus being annotated with the guidelines can be checked automatically while the guidelines are still being written. To formalize the guidelines, the authors use a graph rewriting tool that offers a rich language for describing the instructions. This formalization makes it possible to spot the correctly annotated constructions and, by contrast, those that are not consistent with the guidelines. In case of inconsistency, an expert can either correct the annotation or update the guidelines, and then rerun the process.
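The check-and-revise loop can be sketched as follows. In the described work the guidelines are expressed in the rule language of a graph rewriting tool; this sketch instead assumes plain Python predicates over a toy dependency graph, and the two guideline rules, the `Token`/`Sentence` classes and the `check` function are all hypothetical. Each rule returns the constructions that violate it; after an expert fixes either the annotation or the rule, `check` is simply run again.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    pos: str


@dataclass
class Sentence:
    tokens: dict  # token id -> Token
    deps: list    # (governor id, label, dependent id) triples


def single_subject(s: Sentence) -> list[str]:
    """Hypothetical guideline: a verb has at most one subject."""
    counts = Counter(gov for gov, label, _ in s.deps if label == "subj")
    return [f"{s.tokens[g].form} has {n} subjects" for g, n in counts.items() if n > 1]


def det_on_noun(s: Sentence) -> list[str]:
    """Hypothetical guideline: a determiner depends on a noun."""
    return [
        f"determiner {s.tokens[d].form} attached to {s.tokens[g].form}"
        for g, label, d in s.deps
        if label == "det" and s.tokens[g].pos != "NOUN"
    ]


RULES = [single_subject, det_on_noun]


def check(corpus: list[Sentence]) -> list[tuple[int, str, str]]:
    """Run every formalized guideline over the corpus and collect the
    non-conforming constructions as (sentence index, rule, message)."""
    return [
        (i, rule.__name__, msg)
        for i, s in enumerate(corpus)
        for rule in RULES
        for msg in rule(s)
    ]


tokens = {1: Token("le", "DET"), 2: Token("chat", "NOUN"), 3: Token("dort", "VERB")}
good = Sentence(tokens, [(2, "det", 1), (3, "subj", 2)])
bad = Sentence(tokens, [(3, "det", 1), (3, "subj", 2)])  # det attached to the verb
print(check([good, bad]))
# prints [(1, 'det_on_noun', 'determiner le attached to dort')]
```

Keeping each guideline as a separate, named rule means a report entry points directly to the instruction that needs either a corrected annotation or a revised wording.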

Integration of Multiple Constraints in ACG

In [14], Jiri Marsik and Maxime Amblard present a first step toward the integration of multiple constraints in ACG. Many formal treatments of discourse phenomena exist; however, all of them only consider tiny fragments of language. We are interested in building a wide-coverage grammar that integrates and reconciles the existing formal treatments of discourse, allows us to study their interactions, and builds discourse representations automatically.

This proposal is a first step towards a wide-coverage Abstract Categorial Grammar (ACG) that could be used to automatically build discourse-level representations. We focus on the challenge of integrating the treatment of disparate linguistic constraints in a single ACG and propose a generalization of the formalism: Graphical Abstract Categorial Grammars.
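For readers unfamiliar with ACGs, the sketch below illustrates the standard (non-graphical) ingredients the proposal builds on: an abstract signature that types abstract terms, and lexicons acting as homomorphisms from the abstract vocabulary to object vocabularies. Graphical ACGs are not shown. The toy fragment, the constant names and the use of Python strings and lambdas as object terms (instead of the usual linear λ-terms, with strings encoded as functions of type o → o) are simplifying assumptions.

```python
from dataclasses import dataclass


def arrow(a, b):
    """Arrow (function) type a -> b; atomic types are plain strings."""
    return ("->", a, b)


# Hypothetical toy abstract signature: constant -> abstract type.
ABSTRACT = {
    "JOHN": "NP",
    "MARY": "NP",
    "LOVES": arrow("NP", arrow("NP", "S")),
}


def type_of(term):
    """Type-check an abstract term: a constant name or an application
    (function_term, argument_term)."""
    if isinstance(term, str):
        return ABSTRACT[term]
    fun, arg = term
    fun_type, arg_type = type_of(fun), type_of(arg)
    if not (isinstance(fun_type, tuple) and fun_type[1] == arg_type):
        raise TypeError(f"cannot apply {fun!r} to {arg!r}")
    return fun_type[2]


@dataclass
class Lexicon:
    """Homomorphism from the abstract vocabulary to an object vocabulary."""
    types: dict   # abstract atomic type -> object type
    consts: dict  # abstract constant -> object term (Python value or function)

    def map_type(self, t):
        if isinstance(t, str):
            return self.types[t]
        _, a, b = t
        return arrow(self.map_type(a), self.map_type(b))

    def interpret(self, term):
        # The image of an application is the application of the images.
        if isinstance(term, str):
            return self.consts[term]
        fun, arg = term
        return self.interpret(fun)(self.interpret(arg))


# Two lexicons interpreting the same abstract structure.
STRINGS = Lexicon(
    types={"NP": "string", "S": "string"},
    consts={"JOHN": "John", "MARY": "Mary",
            "LOVES": lambda obj: lambda subj: f"{subj} loves {obj}"},
)
LOGIC = Lexicon(
    types={"NP": "e", "S": "t"},
    consts={"JOHN": "john", "MARY": "mary",
            "LOVES": lambda obj: lambda subj: f"love({subj}, {obj})"},
)

term = (("LOVES", "MARY"), "JOHN")  # abstract term LOVES MARY JOHN
print(type_of(term))                # prints S
print(STRINGS.interpret(term))      # prints John loves Mary
print(LOGIC.interpret(term))        # prints love(john, mary)
```

Because a single abstract term is interpreted by several lexicons, one analysis yields both a surface string and a logical representation; this is the property that makes ACGs a candidate for building discourse representations automatically.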