Extracting a Syntactical Lexicon for French from the LADL Tables

Participants : Claire Gardent, Ingrid Falk, Guy Perrier, Bruno Guillaume.

Maurice Gross' grammar lexicon contains rich and exhaustive information about the morphosyntactic and semantic properties of French syntactic functors (verbs, adjectives, nouns). Yet its use within natural language processing systems is hampered both by its non-standard encoding and by a structure that is partly implicit and partly underspecified.

Together with Calligramme, we developed a method for extracting an NLP-oriented syntactic lexicon from the digitised version of Gross' grammar lexicon, namely the LADL tables. In essence, this method aims at making the table structure explicit and at translating the headings into standard feature structure notation. Specifically, it consists of the following three steps:

  1. For each table, a SynLex-graph is (manually) produced which represents our interpretation of the table. This graph makes the table structure explicit and translates the headings into path equations. A SynLex-graph and a LADL table are the input of the next step.

  2. A graph traversal algorithm is specified: given a SynLex-graph and a table, it produces, for each entry in that table, the set of subcategorisation frames that the table associates with that entry. The resulting lexicon is called a LADL-lexicon and closely reflects the content of the LADL table. Some of the information obtained in this way is superfluous for most current NLP tools, in particular for parsers and surface realisers. Hence, a third step is required.

  3. A simplification algorithm is specified: given a LADL-lexicon, it produces an NLP-lexicon. The NLP-lexicon is a simplified version of the LADL-lexicon in which only the features relevant for parsing and generation are preserved; it therefore only partially reflects the content of the LADL table. This is the lexicon that NLP systems are expected to use.
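
Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the algorithm developed in the project: the `Node` class, the column headings, the feature paths and the `+`/`-` cell values are all assumptions made for illustration. The sketch assumes that a SynLex-graph is a tree whose nodes each test one table column and contribute some path equations, and that a frame is the set of equations collected along a path on which every tested column is marked `+`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical SynLex-graph node (the actual graph format is not given here)."""
    column: Optional[str]        # table heading tested by this node; None = no test
    equations: Dict[str, str]    # path equations contributed when the node is active
    children: List["Node"] = field(default_factory=list)


def frames_for_entry(node, row, inherited=None):
    """Step 2 (sketch): depth-first traversal yielding one frame per active leaf."""
    inherited = inherited or {}
    # A node is inactive when the entry does not have '+' in the tested column.
    if node.column is not None and row.get(node.column) != "+":
        return []
    current = {**inherited, **node.equations}
    if not node.children:
        return [current]
    frames = []
    for child in node.children:
        frames.extend(frames_for_entry(child, row, current))
    return frames


def simplify(frames, relevant: Set[str]):
    """Step 3 (sketch): keep only the relevant paths and drop duplicate frames."""
    seen, result = set(), []
    for frame in frames:
        reduced = {path: value for path, value in frame.items() if path in relevant}
        key = tuple(sorted(reduced.items()))
        if key not in seen:
            seen.add(key)
            result.append(reduced)
    return result


# Invented graph: headings and feature paths are illustrative assumptions only.
graph = Node(None, {"subj.cat": "NP"}, [
    Node("N0 V N1", {"obj.cat": "NP"}, [
        Node("N1 =: Nhum", {"obj.sem": "human"}),
    ]),
    Node("N0 V", {}),
])

# Invented table row for a single entry.
row = {"N0 V N1": "+", "N1 =: Nhum": "+", "N0 V": "+"}

ladl_frames = frames_for_entry(graph, row)                 # LADL-lexicon frames
nlp_frames = simplify(ladl_frames, {"subj.cat", "obj.cat"})  # NLP-lexicon frames
```

In this sketch the entry receives two LADL-lexicon frames, a transitive one carrying the semantic restriction on the object and an intransitive one. The simplification step then drops the `obj.sem` feature, which is assumed here to be irrelevant for parsing and generation. One design choice worth noting is that an inner node whose children are all inactive yields no frame; a real implementation might well decide otherwise.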


