Section: New Results
Lexconn : French Lexicon of Discourse Connectives
Participants : Laurence Danlos, Charlotte Roze, Philippe Muller.
Lexconn is a French lexicon of 330 discourse connectives, each recorded with its syntactic category and the discourse relation(s) it expresses. Such resources already exist for English, Spanish and German, but Lexconn is the first for French. The lexicon aims to be exhaustive. It was built manually, by applying systematic criteria for identifying connectives and by associating each connective with an SDRT relation and with the type of that relation (coordinating or subordinating). This work prompts a reflection on the set of relations defined in SDRT and on the distinction between implicit relations (i.e. not marked by a connective) and explicit relations (i.e. marked by a connective).
Building a French lexicon of discourse connectives yielded several results. It required a systematic methodology for identifying discourse connectives and associating discourse relations with them, drawing on various existing studies of connectives. It also shows which connectives remain to be studied in detail, especially those marked as “unknown”, with which we could not associate any discourse relation. A statistical analysis of the resulting lexicon made it possible to quantify several properties, such as the weight of each discourse relation in terms of number of connectives, and the number of ambiguous connectives (i.e. connectives that can establish more than one relation). Lexconn contains 330 connectives. About 70% are non-ambiguous, which is an encouraging result, and only 3% establish more than one relation. For ambiguous connectives, we distinguish two cases: a connective may establish relations of the same type (coordinating or subordinating), or it may establish relations of both types. The first case seems less problematic than the second from an NLP perspective, because it does not imply structural ambiguity. Only 22 connectives fall into the second case.
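The entry structure and the ambiguity classes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the connective forms (`conn_a`, ...), the category tags, the relation assignments and the names `Entry`, `classify` and `summarize` are all hypothetical and do not reproduce the actual Lexconn format or data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    COORDINATING = "coordinating"
    SUBORDINATING = "subordinating"


@dataclass(frozen=True)
class Entry:
    connective: str   # placeholder form, not a real Lexconn entry
    category: str     # syntactic category (hypothetical tag set)
    relations: tuple  # pairs (relation name, RelationType); empty means "unknown"


def classify(entry):
    """Return 'unknown', 'non-ambiguous', 'same-type' or 'mixed-type'."""
    if not entry.relations:
        return "unknown"
    names = {name for name, _ in entry.relations}
    if len(names) == 1:
        return "non-ambiguous"
    types = {rtype for _, rtype in entry.relations}
    # Relations of both types imply structural ambiguity (the harder case).
    return "same-type" if len(types) == 1 else "mixed-type"


def summarize(lexicon):
    """Count entries per ambiguity class and connectives per relation."""
    classes = Counter(classify(e) for e in lexicon)
    per_relation = Counter(name for e in lexicon for name, _ in e.relations)
    return classes, per_relation


COORD, SUBORD = RelationType.COORDINATING, RelationType.SUBORDINATING

# Toy lexicon: invented entries, for illustration only.
toy_lexicon = [
    Entry("conn_a", "adv", (("Narration", COORD),)),
    Entry("conn_b", "adv", (("Result", COORD), ("Narration", COORD))),
    Entry("conn_c", "csub", (("Explanation", SUBORD), ("Result", COORD))),
    Entry("conn_d", "adv", ()),
    Entry("conn_e", "csub", (("Explanation", SUBORD),)),
]

classes, per_relation = summarize(toy_lexicon)
for label, count in sorted(classes.items()):
    print(f"{label}: {count} ({100 * count / len(toy_lexicon):.0f}%)")
```

Storing the relation type alongside each relation, rather than on the connective, is what makes the second case (relations of both types) detectable with a simple set comparison.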
Despite these results, Lexconn still needs to be improved, and some information has to be added. For example, it should record the ambiguity between discourse and non-discourse uses of a form. This improvement will draw on further linguistic analysis, but also on automatic analysis of the ANNODIS corpus: for adverbials, we could examine the link between position in the host clause and discursive or non-discursive role. Lexconn nevertheless already constitutes a valuable resource for NLP. It could help with the annotation of discourse markers in the ANNODIS project, in which connectives are not yet marked. A corpus-based statistical analysis of connectives could also be useful, for example regarding connective frequency. Such an analysis could help answer the following question: are ambiguous connectives the most frequent ones?