Inria / Raweb 2004


Project-Team : calligramme

Section: Software


Leopar is a parser for natural languages based on the formalism of Interaction Grammars (IG) [49]. The first release was developed by G. Bonfante, B. Guillaume and G. Perrier; G. Bonfante, B. Guillaume, G. Perrier and S. Pogodalla have worked on the current version. It uses a parsing principle called "electrostatic parsing", which is based on neutralizing opposite polarities: a positive polarity corresponds to an available linguistic constituent and a negative one to an expected constituent.

Parsing a sentence with an Interaction Grammar consists of first selecting a lexical entry for each of its words, then merging all the selected descriptions (tree descriptions à la Vijay-Shanker) into a single one that represents a syntactic description of the sentence. Parsing succeeds when this final description is a neutral tree description. Because IGs are based on underspecified trees, Leopar relies on specific, nontrivial data structures and algorithms.

The electrostatic principle has been exploited intensively in Leopar. In theory, parsing with IGs is NP-complete; the nondeterminism usually associated with NP-completeness appears at two levels: when a description is selected from the lexicon for each word, and when the nodes to merge are chosen. Polarities have proved effective in pruning the search tree at both steps.

For the first step (tagging the words of the sentence), we ignore the structure of the descriptions and keep only the feature structures. Parsing within the formalism is then greatly simplified, because the composition rules reduce to the neutralization of two labels +l and -l. Consequently, parsing reduces to counting the positive and negative polarities present in the selected tagging for every label l: each positive label counts for +1 and each negative label for -1, and the sum must be 0.
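The counting criterion for the first step can be illustrated with a minimal Python sketch. It is not taken from Leopar: the representation of a lexical entry as a tuple of (label, sign) pairs, the function names, the toy lexicon and the pseudo-entry `root` that expects a sentence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical representation: a lexical entry is a tuple of (label, sign)
# pairs, where sign is +1 (available constituent) or -1 (expected constituent).

def polarity_balance(tagging):
    """Sum the signed polarities per label over one candidate tagging."""
    balance = Counter()
    for entry in tagging:
        for label, sign in entry:
            balance[label] += sign
    return balance

def is_neutral(tagging):
    """A tagging passes the filter when every label sums to 0."""
    return all(total == 0 for total in polarity_balance(tagging).values())

def neutral_taggings(candidates_per_word):
    """Enumerate every choice of one entry per word and keep the neutral ones."""
    return [t for t in product(*candidates_per_word) if is_neutral(t)]

# Toy lexicon (invented for this sketch) for the sentence "Jean dort".
jean = [(("np", +1),)]
dort = [
    (("np", -1), ("s", +1)),  # verb reading: expects an np, provides an s
    (("n", +1),),             # invented competing reading, for contrast
]
root = [(("s", -1),)]         # assumed pseudo-entry expecting a sentence

result = neutral_taggings([jean, dort, root])
```

In this toy case only the verb reading of "dort" survives, because the other reading leaves np, n and s unbalanced. The sketch enumerates all taggings exhaustively; it shows the neutrality test only and says nothing about how Leopar organizes the search.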

For the second step (the node-merging phase), polarities are used to cut off parsing branches whose trees contain too many uncancelled polarities.
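One possible reading of "too many uncancelled polarities" is sketched below in Python. The text does not define the threshold, so the criterion used here is an assumption: a branch is abandoned when some label has a surplus in the partially merged description that the polarities still to be merged cannot cancel. The function names and the flat (label, sign) representation are hypothetical as well.

```python
from collections import Counter

def uncancelled(polarities):
    """Net polarity per label, keeping only labels that are not yet neutral."""
    net = Counter()
    for label, sign in polarities:
        net[label] += sign
    return {label: total for label, total in net.items() if total != 0}

def should_prune(merged, remaining):
    """Assumed criterion: prune when the surplus of some label in `merged`
    exceeds the number of opposite polarities left in `remaining`."""
    available = Counter(remaining)  # counts each (label, sign) pair
    for label, surplus in uncancelled(merged).items():
        opposite = -1 if surplus > 0 else +1
        if abs(surplus) > available[(label, opposite)]:
            return True
    return False

# Two surplus +np polarities, but only one -np is still to come: prune.
merged = [("np", +1), ("np", +1), ("s", -1)]
prune_a = should_prune(merged, [("np", -1), ("s", +1)])

# With two -np polarities remaining, the branch is kept.
prune_b = should_prune(merged, [("np", -1), ("np", -1), ("s", +1)])
```

The check is cheap compared with attempting the merges themselves, which is why a test of this kind can be applied at every branch of the search.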

Leopar's first release is available on the web under the GNU General Public License. The current implementation comes with a small grammar for French and a lexicon suited to this grammar. The grammar contains 31 descriptions. Despite its small size, it covers several nontrivial linguistic phenomena (relative clauses, negation, pied-piping in relative clauses, etc.). The lexicon contains 78 entries.