Project Team Alpage

Overall Objectives
Scientific Foundations
Application Domains
New Results
Contracts and Grants with Industry
Partnerships and Cooperations
PDF e-pub XML

Section: Software

The Bonsai PCFG-LA parser

Participants : Benoît Crabbé [correspondant] , Marie Candito, Pascal Denis, Djamé Seddah.

Web page:

Alpage has developped as support of the research papers [81] , [70] , [71] , [12] a statistical parser for French, named Bonsai, trained on the French Treebank. This parser provides both a phrase structure and a projective dependency structure specified in [4] as output. This parser operates sequentially: (1) it first outputs a phrase structure analysis of sentences reusing the Berkeley implementation of a PCFG-LA trained on French by Alpage (2) it applies on the resulting phrase structure trees a process of conversion to dependency parses using a combination of heuristics and classifiers trained on the French treebank. The parser currently outputs several well known formats such as Penn treebank phrase structure trees, Xerox like triples and CONLL-like format for dependencies. The parsers also comes with basic preprocessing facilities allowing to perform elementary sentence segmentation and word tokenisation, allowing in theory to process unrestricted text. However it is believed to perform better on newspaper-like text. The parser is available under a GPL license.