Section: New Results

Discourse Synchronous TAGs: a formalism for discourse analysis

Participant: Laurence Danlos.

d-stag is a new formalism for the automatic analysis of the discourse structure of texts [4]. The analyses computed by d-stag are hierarchical discourse structures annotated with discourse relations, and they are compatible with the discourse structures computed in SDRT [59]. The discourse analysis extends the sentential analysis without modifying it, which simplifies the implementation of the system. More precisely, it is based on the following three-module architecture:

  1. the sentential analysis, which gives for each sentence of the input discourse a syntactic and semantic analysis;

  2. the sentence–discourse interface, a module that is required in order to leave the sentential analysis unmodified, which is our aim;

  3. the discourse analysis, which computes discourse structure.
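
The three-module architecture above can be pictured with a minimal sketch. This is not the d-stag implementation: every name below (SententialAnalysis, DiscourseNode, sentential_analysis, sentence_discourse_interface, discourse_analysis, analyze) is hypothetical, the comma-based clause splitter is only a stand-in for a real parser, and the single "UNSPECIFIED" relation is a placeholder. The only point illustrated is that the interface reads the sentential analyses and builds new objects, so the sentential analysis itself is never modified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SententialAnalysis:
    """Hypothetical output of module 1 for one sentence (immutable)."""
    text: str
    clauses: Tuple[str, ...]


@dataclass(frozen=True)
class DiscourseNode:
    """Hypothetical node of a hierarchical discourse structure."""
    relation: str
    children: Tuple[object, ...]


def sentential_analysis(sentence: str) -> SententialAnalysis:
    # Module 1 (stand-in): fake clause segmentation by splitting on commas.
    parts = sentence.rstrip(".").split(",")
    clauses = tuple(p.strip() for p in parts if p.strip())
    return SententialAnalysis(text=sentence, clauses=clauses)


def sentence_discourse_interface(analyses: List[SententialAnalysis]) -> List[str]:
    # Module 2: read the sentential analyses and build a new flat sequence
    # of clause-level units; the input objects are left untouched.
    return [clause for analysis in analyses for clause in analysis.clauses]


def discourse_analysis(units: List[str]) -> DiscourseNode:
    # Module 3 (placeholder): attach all units under one unspecified relation.
    return DiscourseNode(relation="UNSPECIFIED", children=tuple(units))


def analyze(sentences: List[str]):
    # Chain the three modules and return both the untouched sentential
    # analyses and the discourse structure built on top of them.
    analyses = [sentential_analysis(s) for s in sentences]
    units = sentence_discourse_interface(analyses)
    return analyses, discourse_analysis(units)
```

Freezing the dataclasses is one simple way to make the "extend without modifying" constraint explicit in code: any attempt by a later module to alter a sentential analysis raises an error.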

The second step consists in deriving a “normalized form for discourse” (DNF) from the syntactic analysis of a sequence of sentences. It turns out that the results of the (French) syntactic analyzers are not good enough to obtain satisfactory DNFs. This negative finding can be explained as follows: in the evaluation campaigns for French syntactic analyzers, namely EASy and then PASSAGE, the metrics used give the same importance to short-range and long-range relations (dependencies). The former are much more numerous than the latter and therefore weigh most heavily in the rankings. Moreover, they are much easier to compute. As a result, long-range relations are somewhat neglected. Unfortunately, the DNF for a discourse can only be obtained with a high-quality tool for segmenting sentences into clauses, which requires detecting long-range dependencies.
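
The effect of weighting all dependencies equally can be illustrated with a small arithmetic sketch. The counts below are invented for illustration only; they are not EASy or PASSAGE results, and the function names are hypothetical. The sketch assumes a pooled accuracy-style score in which each dependency counts once, whatever its range.

```python
def pooled_score(correct, total):
    """Pooled score: every dependency counts the same, whatever its range."""
    return sum(correct.values()) / sum(total.values())


def per_range_scores(correct, total):
    """Score computed separately for each range class."""
    return {key: correct[key] / total[key] for key in total}


# Invented counts: short-range dependencies vastly outnumber long-range ones.
total = {"short": 900, "long": 100}
correct = {"short": 855, "long": 40}

overall = pooled_score(correct, total)        # 895 / 1000
by_range = per_range_scores(correct, total)   # short: 855 / 900, long: 40 / 100
```

With these invented figures the pooled score is 0.895 even though only 40% of the long-range dependencies are recovered, which is the situation described above: an analyzer can rank well while being weak precisely on the relations that clause segmentation depends on.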

For this reason, we are postponing the implementation of d-stag until French syntactic analyzers produce better results. In the meantime, we are extending the coverage of d-stag by studying how to handle quotations and the reporting clauses that introduce them. This work was initiated in the Scribo project (see 6.23 and 8.1.5). It led to an inventory of “quotation verbs” extracted from an AFP corpus, half of which are not reported-speech verbs. We have started to explore the structure of discourses containing quotations, which may call into question some basic principles of SDRT.


