Sapiens: visualizing quotations in news wires

Participants: Benoît Sagot, Éric Villemonte de La Clergerie, Rosa Stern, Pascal Denis, Victor Mignot, Laurence Danlos, Gaëlle Recourcé.

In relation to the Scribo project (see 8.1.5), several Alpage members were involved in the development of a demonstration environment for linguistic processing. This environment, named Sapiens, is a platform for visualizing quotations in news wires, each associated with its author and context [50]. It has been applied to a corpus provided by the Agence France-Presse (AFP). Sapiens demonstrates how named entities can be related to events, here to quotations in AFP news wires. It was demonstrated during the annual System@tic meeting, in front of a large audience including the State Secretary for Research.

The originality of this environment is that it relies on a deep linguistic processing chain comprising the SxPipe processing chain (which includes named entity recognition), the FRMG parser and a coreference resolution module. This allows for extracting quotations with a wide coverage and an extended definition, including quotations that are only partially quote-delimited verbatim transcripts. It is an example of an application based on information extraction that can be useful to end users such as journalists looking for relevant information in news archives. The resulting information is stored in a database and can thus be reused, for instance in the development of an ontology.
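As a purely illustrative sketch of the last step (storing extracted quotations so that they can be queried and reused), the following Python fragment shows one possible record layout. It is not the Sapiens implementation: the `Quotation` fields, the `quotations` table and the helper functions are hypothetical names chosen for this example, and the linguistic analysis itself (SxPipe, FRMG, coreference resolution) is assumed to have been run beforehand.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Quotation:
    """Hypothetical record for one extracted quotation."""
    text: str       # the quoted span
    author: str     # named entity resolved as the speaker
    verb: str       # quotation verb introducing the quotation
    verbatim: bool  # False when only partially quote-delimited


def store(conn: sqlite3.Connection, q: Quotation) -> None:
    """Insert one quotation, creating the (hypothetical) table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quotations "
        "(text TEXT, author TEXT, verb TEXT, verbatim INTEGER)"
    )
    conn.execute(
        "INSERT INTO quotations VALUES (?, ?, ?, ?)",
        (q.text, q.author, q.verb, int(q.verbatim)),
    )
    conn.commit()


def by_author(conn: sqlite3.Connection, author: str) -> list:
    """Return the texts of all stored quotations attributed to `author`."""
    rows = conn.execute(
        "SELECT text FROM quotations WHERE author = ? ORDER BY rowid",
        (author,),
    )
    return [row[0] for row in rows]
```

With such a layout, a query by author (for instance a journalist looking up everything attributed to a given person) reduces to a single call to `by_author`.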

From a more linguistic perspective, this work led us to study the syntactic and discursive properties of so-called quotation verbs (such as “say”, “laugh” or “conclude”) that can head constructions such as “It is a wonderful idea”, laughed Peter. This raises important NLP issues, since such constructions contradict many assumptions made by most parsers, although they are very common and very interesting from an applicative point of view. This linguistic study has been described in two submitted publications, one that focuses on the syntactic level [38] and another that also covers the discursive level.
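To make the construction concrete, here is a deliberately naive surface-pattern sketch in Python that recognizes a quoted span followed by a quotation verb and a postposed subject. It is only a toy approximation based on a regular expression, not the deep-parsing approach described above; the verb list, the pattern and the function name are assumptions made for this example.

```python
import re

# Hypothetical toy list of inflected quotation verbs.
QUOTATION_VERBS = {"said", "laughed", "concluded"}

# Quoted span, a comma, a single verb token, then a capitalized subject.
PATTERN = re.compile(
    r'[“"](?P<quote>[^”"]+)[”"],\s+'
    r'(?P<verb>\w+)\s+'
    r'(?P<speaker>[A-Z]\w*(?:\s+[A-Z]\w*)*)'
)


def find_inverted_quotations(sentence: str) -> list:
    """Return (quote, verb, speaker) triples for 'QUOTE, VERB SUBJECT' patterns."""
    results = []
    for match in PATTERN.finditer(sentence):
        # Keep the match only if the token after the comma is a known quotation verb.
        if match.group("verb").lower() in QUOTATION_VERBS:
            results.append(
                (match.group("quote"), match.group("verb"), match.group("speaker"))
            )
    return results
```

Such a pattern fails as soon as the quotation is only partially delimited by quotes or the subject is not a capitalized name, which is one reason a full syntactic analysis is needed for wide coverage.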


