Section: New Results
Designing efficient parsers using Meta-Grammars and DyALog
Participants : Éric Villemonte de la Clergerie, Marie-Laure Guénot.
In the context of the PASSAGE action and of the last parsing evaluation campaign, we have tried to improve the coverage and quality of FRMG by exploring several approaches. Beyond the use of error mining techniques on large raw corpora, we have tried more supervised approaches that rely on repeatedly processing the 4000-sentence EASy reference treebank. Each run provides detailed information about the errors of the analysers, in particular through confusion matrices for chunks and dependencies. It is also possible to build confusion matrices that trace the changes between two runs, which is useful to quickly detect unexpected and unwanted consequences of modifications in FRMG or its companion modules. A more linguistic evaluation of the phenomena highlighted by the matrices, through the examination of the corresponding sentences (with the help of logs and EasyRef), helped detect all kinds of problems in the processing chain. Some of them turned out to be errors in the treebank itself, which were then corrected using EasyRef. Iterated over a few months, this process increased the quality of FRMG by several points (+2% to reach 87.7% for chunks and +4.5% to reach 64.1% for dependencies), showing the importance of good methodologies and good tools (feedback, visualization, query, ...) for improving a linguistic processing chain.
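The two kinds of confusion matrices mentioned above can be illustrated with a minimal Python sketch. It is not the tooling actually used with FRMG: the function names, the flat label sequences and the chunk labels are assumptions made for illustration, and real runs would first need to align parser output with the reference annotations.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference label, predicted label) pairs over aligned items."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must be aligned")
    return Counter(zip(reference, predicted))


def run_diff(reference, run_a, run_b):
    """Trace label changes from run_a to run_b.

    Returns two counters keyed by (label in run_a, label in run_b):
    changes that fix an item and changes that break a previously
    correct item. Changes between two wrong labels are ignored here.
    """
    fixed, broken = Counter(), Counter()
    for ref, a, b in zip(reference, run_a, run_b):
        if a == b:
            continue
        if b == ref:
            fixed[(a, b)] += 1
        elif a == ref:
            broken[(a, b)] += 1
    return fixed, broken


# Hypothetical data: one label per chunk, already aligned with the reference.
reference = ["GN", "GP", "NV", "GN", "GA"]
run_a = ["GN", "GN", "NV", "GP", "GA"]
run_b = ["GN", "GP", "NV", "GP", "GN"]

matrix_b = confusion_matrix(reference, run_b)
fixed, broken = run_diff(reference, run_a, run_b)
print("confusions in run B:", dict(matrix_b))
print("fixed by the change:", dict(fixed))
print("broken by the change:", dict(broken))
```

In this toy data, the second run repairs one GN/GP confusion but introduces a new GA/GN error. This is the kind of unwanted side effect that a matrix comparing two runs is meant to expose quickly.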
We have increased the coverage of FRMG by adding new classes to the underlying meta-grammar, in particular to handle causative constructions, more cases of superlative constructions, and adjective subcategorization, to name a few. Such extensions tend to slow down parsing, in particular because ambiguity increases. Various optimizations have been tried at different levels of the processing chain in order to contain this effect. The disambiguation algorithm over the shared dependency forests has been revised to be more efficient, and better weights for the disambiguation rules have been sought by trial and error, using the above-mentioned feedback techniques. More automatic, machine-learning-based techniques have also been tried, but have not yet led to better results.
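To make the idea of weighted disambiguation rules concrete, the following Python sketch scores competing analyses by summing the weights of the rules they trigger, then adjusts the weights by an automated trial-and-error loop (a simple coordinate-wise hill climb). This is a toy analogue and not the FRMG algorithm: the rule names, the data layout and the search strategy are all assumptions, and the real disambiguation operates on shared dependency forests, not on flat candidate lists.

```python
def disambiguate(candidates, weights):
    """For each token, keep the candidate whose triggered rules sum highest."""
    chosen = {}
    for token, options in candidates.items():
        label, _rules = max(options, key=lambda opt: sum(weights[r] for r in opt[1]))
        chosen[token] = label
    return chosen


def accuracy(chosen, reference):
    """Fraction of tokens whose chosen label matches the reference."""
    return sum(chosen[t] == reference[t] for t in reference) / len(reference)


def tune(candidates, reference, weights, step=1.5, rounds=3):
    """Trial and error: nudge one weight at a time, keep strict improvements."""
    best = dict(weights)
    best_score = accuracy(disambiguate(candidates, best), reference)
    for _ in range(rounds):
        for rule in list(best):
            for delta in (step, -step):
                trial = dict(best)
                trial[rule] += delta
                score = accuracy(disambiguate(candidates, trial), reference)
                if score > best_score:
                    best, best_score = trial, score
    return best, best_score


# Hypothetical rules and candidates: (label, rules triggered by that analysis).
candidates = {
    "t1": [("obj", ("prefer_argument",)), ("mod", ("prefer_close",))],
    "t2": [("mod", ("prefer_close",)), ("obj", ("prefer_argument", "penalize_long"))],
    "t3": [("obj", ("prefer_argument",)), ("mod", ("prefer_close",))],
}
reference = {"t1": "obj", "t2": "mod", "t3": "obj"}
initial = {"prefer_argument": 1.0, "prefer_close": 2.0, "penalize_long": 0.0}

tuned, score = tune(candidates, reference, initial)
print(tuned, score)
```

Each trial is judged only by its effect on the score against the reference, which mirrors the feedback-driven workflow described above. A greedy loop of this kind can get stuck in local optima, which is one reason manual inspection of the confusion matrices remains useful.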