Inria / Raweb 2004
Project-Team: MODBIO

Section: New Results


Keywords: locally coherent discourse.

Computing locally coherent discourses (ATIPE)

Participant: Ernst Althaus.

One central problem in discourse generation and summarisation is to structure the discourse in a way that maximises coherence. Coherence is the property of a good human-authored text that makes it easier to read and understand than a randomly ordered collection of sentences. Several recent papers have focused on defining local coherence, which evaluates the quality of sentence-to-sentence transitions. Measures of local coherence specify which ordering of the sentences makes for the most coherent discourse, and can be based, for example, on Centering Theory or on statistical models. While formal models of local coherence have made substantial progress over the past few years, the question of how to efficiently compute an ordering of the sentences in a discourse that maximises local coherence is still largely unsolved.

In [16], we present the first algorithm that computes optimal locally coherent discourses, and we establish the complexity of the discourse ordering problem. We first prove that the discourse ordering problem for local coherence measures is equivalent to the Travelling Salesman Problem (TSP). This result implies that the problem is NP-hard and, in general, not approximable. Despite this negative result, we show that by applying modern algorithms for TSP, the discourse ordering problem can be solved efficiently enough for practical applications. We define a branch-and-cut algorithm based on linear programming, and evaluate it on discourse ordering problems based on the GNOME corpus and the BLLIP corpus. If the local coherence measure depends only on the adjacent pairs of sentences in the discourse, we can order discourses of up to 50 sentences in under a second. If the measure is also allowed to depend on the left-hand context of the sentence pair, computation is often still efficient but can become expensive.
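To make the connection to TSP concrete, the following is a minimal sketch of the simplest case described above, where the coherence measure depends only on adjacent sentence pairs. It is not the branch-and-cut algorithm of [16]: it swaps in a Held-Karp style dynamic program over subsets, which is exact but exponential in the number of sentences and therefore only suitable for very short discourses. The function name `order_sentences` and the matrix `cost` are hypothetical; `cost[i][j]` is assumed to be the incoherence cost of sentence `j` directly following sentence `i`, and it need not be symmetric.

```python
from itertools import combinations


def order_sentences(cost):
    """Return (total_cost, ordering) minimising the summed transition costs.

    cost[i][j] is an assumed, illustrative cost of placing sentence j
    immediately after sentence i. The ordering is an open path: there is
    no transition from the last sentence back to the first.
    """
    n = len(cost)
    if n == 0:
        return 0, []

    # best[(mask, last)] = (cheapest cost of ordering exactly the sentences
    # in `mask` so that `last` comes last, predecessor of `last`).
    best = {(1 << i, i): (0, None) for i in range(n)}

    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = sum(1 << i for i in subset)
            for last in subset:
                prev_mask = mask ^ (1 << last)
                best[(mask, last)] = min(
                    (best[(prev_mask, p)][0] + cost[p][last], p)
                    for p in subset
                    if p != last
                )

    full = (1 << n) - 1
    total, last = min((best[(full, i)][0], i) for i in range(n))

    # Walk the predecessor links backwards to recover the ordering.
    order = []
    mask = full
    while last is not None:
        order.append(last)
        prev = best[(mask, last)][1]
        mask ^= 1 << last
        last = prev
    order.reverse()
    return total, order


# Toy example: only the transitions 0->1, 1->2 and 2->3 are cheap.
toy_cost = [
    [0, 1, 5, 5],
    [5, 0, 1, 5],
    [5, 5, 0, 1],
    [5, 5, 5, 0],
]
toy_total, toy_order = order_sentences(toy_cost)
```

In this toy instance the only ordering that uses three cheap transitions is 0, 1, 2, 3, so the sketch returns that ordering with total cost 3. The table of subsets grows exponentially with the number of sentences, which is why an approach of this kind does not scale to the 50-sentence discourses reported above.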

