The Bonzai PCFG-LA parser
Participants: Benoit Crabbé [contact], Marie Candito, François Guérin, Pascal Denis, Djamé Seddah.
Alpage has developed, in support of several research papers, a statistical parser for French named Bonzai, trained on the French Treebank. The parser outputs both a phrase structure and a projective dependency structure. It operates sequentially: (1) it first produces a phrase structure analysis of each sentence, reusing the Berkeley implementation of a PCFG-LA, trained on French by Alpage; (2) it then converts the resulting phrase structure trees to dependency parses, using a combination of heuristics and classifiers trained on the French Treebank. The parser currently outputs several well-known formats, such as Penn Treebank phrase structure trees, Xerox-like triples, and a CoNLL-like format for dependencies. It also comes with basic preprocessing facilities for elementary sentence segmentation and word tokenisation, which in theory allow it to process unrestricted text. However, it is believed to perform better on newspaper-like text. The parser is to be released in 2010 under a GPL license.
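To illustrate the second step, the following is a minimal sketch of a conversion from a Penn Treebank-style bracketed tree to projective dependencies by head percolation. It is not Bonzai's code: the head-rule table `HEAD_RULES`, the function names, the example tags, and the simplified four-column output (ID, FORM, POS, HEAD) are all assumptions made for illustration. Bonzai itself combines heuristics with trained classifiers, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (POS) nodes
    index: int = 0              # 1-based token position, preterminals only


def parse_ptb(text):
    """Read one bracketed tree such as '(SENT (NP (DET Le) (NC chat)) ...)'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    n_words = 0

    def read():
        nonlocal pos, n_words
        assert tokens[pos] == "("
        node = Node(tokens[pos + 1])
        pos += 2
        if tokens[pos] != "(":          # preterminal: (TAG word)
            n_words += 1
            node.word, node.index = tokens[pos], n_words
            pos += 1
        else:                           # internal node: read all children
            while tokens[pos] == "(":
                node.children.append(read())
        assert tokens[pos] == ")"
        pos += 1
        return node

    return read()


# Hypothetical toy head rules: for each phrase label, child labels to try
# in priority order. Real head tables are far more detailed.
HEAD_RULES = {"SENT": ["VN"], "VN": ["V"], "NP": ["NC", "NPP"], "PP": ["P"]}


def head_position(node):
    """Return the position of the head child (leftmost child as fallback)."""
    for label in HEAD_RULES.get(node.label, []):
        for i, child in enumerate(node.children):
            if child.label == label:
                return i
    return 0


def percolate(node, deps):
    """Fill deps[index] = [word, tag, head_index]; return the lexical head."""
    if node.word is not None:
        deps[node.index] = [node.word, node.label, 0]  # 0 marks the root
        return node
    heads = [percolate(child, deps) for child in node.children]
    lexical_head = heads[head_position(node)]
    for other in heads:
        if other is not lexical_head:
            # Each non-head child's lexical head attaches to the phrase head.
            deps[other.index][2] = lexical_head.index
    return lexical_head


def to_conll(tree):
    """Return simplified CoNLL-like lines: ID, FORM, POS, HEAD."""
    deps = {}
    percolate(tree, deps)
    return ["\t".join([str(i), w, t, str(h)])
            for i, (w, t, h) in sorted(deps.items())]


if __name__ == "__main__":
    tree = parse_ptb("(SENT (NP (DET Le) (NC chat)) (VN (V dort)))")
    print("\n".join(to_conll(tree)))
```

Because every dependency links the lexical heads of sibling constituents, and constituents cover contiguous spans, a conversion of this kind can only produce projective structures.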