Team SIGNES

Section: New Results

Morphology and Syntax

Automated grammar checking

Participant : Lionel Clément.

Lionel Clément and Renaud Marlet, with Kim Gerdes (U. Paris 3), defined the central algorithm of an open system for grammar checking based on deep parsing. The grammatical specification is a context-free grammar with flat feature structures (Datalog). After a shared-forest analysis in which feature agreement constraints are relaxed, error detection globally minimizes the number of corrections, and alternative correct sentences are proposed automatically [23], [22].
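The idea of minimizing the number of corrections under relaxed agreement can be illustrated on a single phrase. The following is only a toy sketch, not the shared-forest algorithm of [23], [22]: it assumes a phrase is a list of words carrying flat features, enumerates every candidate agreement target, and keeps the one that requires the fewest words to be changed. The function name `min_corrections` and the feature inventory are hypothetical.

```python
from itertools import product


def min_corrections(words, features):
    """Return (cost, target, words_to_fix) for the cheapest agreement target.

    words    -- list of (form, {feature: value}); a missing feature means the
                word is underspecified and agrees with any value.
    features -- {feature: [possible values]} (hypothetical inventory).
    """
    names = sorted(features)
    best = None
    # Try every combination of feature values as the agreement target.
    for combo in product(*(features[n] for n in names)):
        target = dict(zip(names, combo))
        # A word must be corrected if any specified feature clashes.
        wrong = [
            form
            for form, fs in words
            if any(fs.get(n, target[n]) != target[n] for n in names)
        ]
        if best is None or len(wrong) < best[0]:
            best = (len(wrong), target, wrong)
    return best


# "les petit chats": the adjective does not agree in number.
phrase = [
    ("les", {"number": "pl"}),
    ("petit", {"gender": "m", "number": "sg"}),
    ("chats", {"gender": "m", "number": "pl"}),
]
inventory = {"gender": ["m", "f"], "number": ["sg", "pl"]}
cost, target, to_fix = min_corrections(phrase, inventory)
print(cost, target, to_fix)
```

Here the cheapest repair changes only "petit" (towards masculine plural), rather than changing both "les" and "chats" to the singular. A real checker would perform this minimization over all parses in the shared forest rather than over one flat phrase.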

The Grail parser/theorem prover

Participants : Richard Moot [ correspondent ] , Natalia Vinogradova.

Richard Moot has improved the Grail parser/theorem prover in several respects. First, an online tutorial (http://www.labri.fr/perso/moot/tutorial/ ) has been added; it introduces Grail, explains how to write grammars and use the parser interactively, and covers the more advanced features that help users make their grammars as efficient as possible. Second, the semantic component has seen several additions, such as a type checker for the lambda terms in the lexicon and an interface to database queries. Finally, Grail has been updated to integrate more tightly with the supertagger, and a user interface has been added to facilitate the use of Grail in combination with a part-of-speech tagger and supertagger.
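To give an idea of what type checking a lexical lambda term involves, here is a minimal sketch of a checker for simply typed lambda terms. It is not Grail's implementation, and Grail's actual term syntax and type system are not described here; the classes `Var`, `Lam`, `App` and the function `type_of` are hypothetical names, atomic types are written as strings such as "e" and "t", and a function type is written as a pair (argument, result).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    var_type: object
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def type_of(term, env):
    """Compute the type of `term` under `env` ({name: type}) or raise TypeError."""
    if isinstance(term, Var):
        if term.name not in env:
            raise TypeError(f"unbound variable or constant: {term.name}")
        return env[term.name]
    if isinstance(term, Lam):
        # Type the body with the bound variable added to the environment.
        body_type = type_of(term.body, {**env, term.var: term.var_type})
        return (term.var_type, body_type)
    if isinstance(term, App):
        fun_type = type_of(term.fun, env)
        arg_type = type_of(term.arg, env)
        # The function must have a function type whose argument type matches.
        if not isinstance(fun_type, tuple) or fun_type[0] != arg_type:
            raise TypeError(f"cannot apply {fun_type} to {arg_type}")
        return fun_type[1]
    raise TypeError(f"not a lambda term: {term!r}")


# Hypothetical lexical entry for a transitive verb:
# \y:e. \x:e. ((love x) y), with the constant love : e -> (e -> t).
constants = {"love": ("e", ("e", "t"))}
entry = Lam("y", "e", Lam("x", "e", App(App(Var("love"), Var("x")), Var("y"))))
print(type_of(entry, constants))
```

The entry above is assigned the type e -> (e -> t), whereas an ill-formed entry such as `App(Var("love"), Var("love"))` is rejected with a `TypeError`, which is the kind of lexicon error such a checker is meant to catch early.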

Towards a wide coverage categorial grammar for French

Participant : Richard Moot [ correspondent ] .

Richard Moot has started the development of a wide-coverage categorial grammar for French, which is automatically extracted from the Paris VII treebank and then corrected manually. A very early version of this categorial grammar, though still containing noise and artifacts, already gives promising first results. The lexicon is large and assigns several hundred different lexical formulas to many frequent words (forms of the auxiliary verbs "être" and "avoir" and conjunctions like "et" and "ou", for example, have between 300 and 500 entries in the current lexicon), so the lexical ambiguity makes parsing with the extracted grammar prohibitively expensive. A supertagger, which assigns only the most probable formulas to a word based on its immediate context (preceding and succeeding words, formulas assigned to previous words), addresses this problem: trained on over 250,000 words of the treebank, it assigns the correct formula to between 80.81% and 95.67% of the words in unseen sentences (the lower figure corresponds to exactly one formula per word, whereas the higher figure is obtained when assigning, on average, 6.98 formulas per word). Work to clean up the extracted grammar continues, with each correction both reducing the number of different formulas extracted (currently 2,531, though only 1,052 occur more than once) and improving the performance of the supertagger.
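The trade-off between one formula per word and several formulas per word can be sketched as follows. How the supertagger selects multiple formulas is not stated above; this sketch assumes a common cut-off scheme in which a formula is kept when its probability is at least a fraction `beta` of the best formula's probability. The function names, the parameter `beta`, and the probabilities are all hypothetical.

```python
def select_formulas(scores, beta=1.0):
    """Keep formulas whose probability is at least beta times the best one.

    scores -- {formula: probability} for one word in context (made-up values).
    beta   -- 1.0 keeps only the best formula (barring ties); smaller values
              keep more formulas per word.
    """
    best = max(scores.values())
    kept = [f for f, p in scores.items() if p >= beta * best]
    # Most probable formula first.
    return sorted(kept, key=lambda f: -scores[f])


def coverage(assigned, gold):
    """Fraction of words whose correct formula is among those assigned."""
    hits = sum(1 for formulas, g in zip(assigned, gold) if g in formulas)
    return hits / len(gold)


# Hypothetical scores for one word.
scores = {"np": 0.6, "n": 0.3, "(np\\s)/np": 0.1}
print(select_formulas(scores, beta=1.0))
print(select_formulas(scores, beta=0.4))
```

Lowering `beta` increases the chance that the correct formula is among those passed to the parser, at the cost of more lexical ambiguity for the parser to resolve.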

Sanskrit Processing

Participant : Gérard Huet [ correspondent ] .

The cooperation with the Sanskrit Studies Department of Hyderabad University continued successfully. In January 2009, the 3rd Sanskrit Computational Linguistics Symposium was organized in Hyderabad by Amba Kulkarni. Amba Kulkarni and Gérard Huet are joint editors of the Proceedings, published in the Springer Verlag Lecture Notes series. In November 2009, Amba Kulkarni spent a month at INRIA Paris-Rocquencourt and visited various European sites with a view to forming a Euro-India Consortium on Sanskrit computational linguistics. A Memorandum of Understanding between INRIA and Hyderabad University for the continuation of the joint team effort in a bilateral framework has been drafted and should be signed in 2010.
