Section: New Results
Probabilistic TIG-based dependency parsing
Participants : Pierre Boullier, Benoît Sagot.
- PCFG (Probabilistic Context-Free Grammar): a context-free grammar (CFG) in which a probability is associated with each production.
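To make the definition concrete, here is a minimal sketch of a PCFG in Python. The toy grammar, its probabilities and the function names are illustrative assumptions and are not taken from MICA or its extracted grammar. The sketch checks that the probabilities of the productions sharing a left-hand side sum to 1, and computes the probability of a derivation as the product of the probabilities of the productions it uses.

```python
from math import isclose, prod

# Hypothetical toy PCFG: each left-hand side maps its right-hand sides
# to the probability of the corresponding production.
PCFG = {
    "S": {("NP", "VP"): 1.0},
    "NP": {("she",): 0.6, ("parsers",): 0.4},
    "VP": {("V", "NP"): 0.7, ("V",): 0.3},
    "V": {("builds",): 1.0},
}


def is_proper(grammar):
    """Return True if, for every left-hand side, the production
    probabilities sum to 1."""
    return all(isclose(sum(rules.values()), 1.0) for rules in grammar.values())


def derivation_probability(grammar, derivation):
    """Multiply the probabilities of the productions used in a derivation,
    given as a list of (left-hand side, right-hand side) pairs."""
    return prod(grammar[lhs][rhs] for lhs, rhs in derivation)


# Derivation of "she builds parsers" with the toy grammar.
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("she",)),
    ("VP", ("V", "NP")),
    ("V", ("builds",)),
    ("NP", ("parsers",)),
]

if __name__ == "__main__":
    print(is_proper(PCFG))
    print(derivation_probability(PCFG, derivation))
```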
Collaboration with Alexis Nasr (LIF, Université de Marseille-Provence), Owen Rambow (Columbia University, New York, USA) and Srinivas Bangalore (AT&T Labs, USA).
Two members of Alpage, in collaboration with other teams in France and the USA, developed a state-of-the-art dependency parser for English named MICA; the acronym recalls the four affiliations of its developers: (University of) Marseille, Inria, Columbia University and AT&T [16]. It relies on a Tree Insertion Grammar (TIG) extraction algorithm initially developed by [75] and applied to the Penn Treebank. The grammar extraction step makes it possible to train a supertagger, which performs the first step of the full parsing process. The output of the supertagger, partially pruned, is then given as input to a parser generated by Syntax from the extracted grammar.
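The pruning step between the supertagger and the parser can be illustrated with a small sketch. The strategy shown (keep the supertags whose probability is within a fixed fraction of the best one, capped at a maximum count), the function name `prune_supertags`, the parameters `beam` and `max_keep`, and the supertag labels are all assumptions made for illustration. The text above does not specify how MICA prunes the supertagger output.

```python
def prune_supertags(candidates, beam=0.1, max_keep=3):
    """Prune the candidate supertags of one token.

    candidates: list of (supertag, probability) pairs.
    Keeps the candidates whose probability is at least `beam` times the
    best probability, and at most `max_keep` of them, best first.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    kept = [c for c in ranked if c[1] >= beam * best]
    return kept[:max_keep]


# Hypothetical supertagger output: one candidate list per token.
sentence = [
    ("she", [("t_NP", 0.9), ("t_N", 0.1)]),
    ("builds", [("t_V_trans", 0.55), ("t_V_intrans", 0.4), ("t_N_plural", 0.05)]),
]

# The pruned lattice is what a downstream parser would receive as input.
pruned = [(word, prune_supertags(cands, beam=0.2)) for word, cands in sentence]

if __name__ == "__main__":
    for word, tags in pruned:
        print(word, tags)
```

A wider beam keeps more ambiguity for the parser to resolve, at the cost of parsing speed; a narrower one is faster but may discard the correct supertag.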
Results are approximately state-of-the-art as far as precision and recall are concerned, and significantly better in terms of parsing speed. The work on MICA will directly benefit the SEQUOIA project (see 8.1.2) as soon as all the underlying techniques are transferred to French.
The MICA parser is freely distributed (http://mica.lif.univ-mrs.fr/).