## Project-Team: Calligramme

## Section: Scientific Foundations

**Keywords:** *categorial grammar*, *Montague semantics*, *syntactic inference*, *syntactic analysis of natural languages*, *semantics of natural languages*, *dependency grammar*, *tree description*.

## Categorial Grammars and Dependency Grammars

**Participants:** Jérôme Besombes, Guillaume Bonfante, Denys Duchier, Philippe de Groote, Bruno Guillaume, François Lamarche, Joseph Leroux, Jean-Yves Marion, Guy Perrier, Sylvain Pogodalla, Sylvain Salvati, Lutz Straßburger.

Lambek's syntactic calculus, which plays a central part in the theory of categorial grammars, can be seen *a posteriori* as a fragment of linear logic. Linear logic thus provides a mathematical framework that enables extensions of Lambek's original calculus, as well as extensions of categorial grammars in general. The aim of this work is the development of a model, in the sense of computational linguistics, that is more flexible and efficient than the presently existing categorial models.

The relevance of linear logic for natural language processing is due to the notion of resource sensitivity. A language (natural or formal) can indeed be interpreted as a system of resources. For example, a sentence like *The man that Mary saw Peter slept* is incorrect because it violates an underlying principle of natural languages, according to which verbal valencies must be realized once and only once. Categorial grammars formalize this idea by specifying that a verb such as *saw* is a resource which yields a sentence S in the presence of exactly one subject noun phrase NP and exactly one direct object NP. This gives rise to the following type assignment:

| Word | Type |
|------|------|
| Mary, Peter | NP |
| saw | (NP\S)/NP |

where the slash (/) and the backslash (\) are interpreted as fraction-like operators that simplify to the right and to the left, respectively. However, one soon notices that this simplification scheme, which is the basis of Bar-Hillel grammars [32], is not sufficient.
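The Bar-Hillel simplification scheme just described can be sketched in a few lines of code; the tuple encoding of types and the brute-force reduction strategy are illustrative choices, not part of any particular implementation:

```python
# A toy Bar-Hillel (AB-grammar) reducer. Types are either atomic strings
# or tuples mirroring the written notation:
#   ("/", X, Y)  encodes X/Y  (seeks a Y on its right, yields X)
#   ("\\", X, Y) encodes X\Y  (seeks an X on its left, yields Y)

def reduce_pair(a, b):
    """Apply forward or backward application to two adjacent types."""
    if isinstance(a, tuple) and a[0] == "/" and a[2] == b:
        return a[1]                 # X/Y , Y  =>  X
    if isinstance(b, tuple) and b[0] == "\\" and b[1] == a:
        return b[2]                 # X , X\Y  =>  Y
    return None

def derives(types, goal):
    """Brute-force search over adjacent reductions (fine for tiny inputs)."""
    if types == [goal]:
        return True
    for i in range(len(types) - 1):
        r = reduce_pair(types[i], types[i + 1])
        if r is not None and derives(types[:i] + [r] + types[i + 2:], goal):
            return True
    return False

NP, S = "NP", "S"
saw = ("/", ("\\", NP, S), NP)      # (NP\S)/NP

assert derives([NP, saw, NP], S)    # "Mary saw Peter" reduces to S
assert not derives([NP, saw], S)    # the object valency is unrealized
```

The failing second case is exactly the resource-sensitivity point: the unconsumed object argument of *saw* blocks any reduction to S.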

Lambek solves this problem by suggesting the interpretation of slashes and backslashes as implicative connectives [43][44]. They then obey not only the *modus ponens* law, which turns out to be Bar-Hillel's simplification scheme:

$$\frac{\Gamma \vdash B/A \qquad \Delta \vdash A}{\Gamma, \Delta \vdash B} \qquad\qquad \frac{\Delta \vdash A \qquad \Gamma \vdash A\backslash B}{\Delta, \Gamma \vdash B}$$

but also the introduction rules:

$$\frac{\Gamma, A \vdash B}{\Gamma \vdash B/A} \qquad\qquad \frac{A, \Gamma \vdash B}{\Gamma \vdash A\backslash B}$$

The Lambek calculus does have its own limitations. Among other things, it cannot treat syntactic phenomena like medial extraction and crossed dependencies. Thus the question arises: how can we extend the Lambek calculus to treat these and related problems? This is where linear logic comes into play, by offering an adequate mathematical framework for attacking this question. In particular, proof nets appear as the approach best adapted to syntactic structure in the categorial framework.

Proof nets offer a geometrical interpretation of proof construction. Premises are represented by proof-net fragments with inputs and outputs, which model needed and offered resources respectively. These fragments must then be combined by pairing inputs with outputs according to their types. This process can also be interpreted in a model-theoretic fashion, where fragments are regarded as descriptions of a certain class of models: the intuitionistic multiplicative fragment of linear logic can be interpreted on directed acyclic graphs, while for the implicative fragment, trees suffice [48].
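One concrete reflection of this resource accounting is van Benthem's count invariant: in any derivable sequent, each atomic type must occur as often positively as negatively. The sketch below implements this check only (a necessary condition for derivability, far weaker than a full proof-net correctness criterion); the tuple encoding of types is an illustrative assumption:

```python
from collections import Counter

def count(t, sign=1):
    """Polarity counts of the atoms in a type t; the sign flips on the
    argument side of a slash. ("/", X, Y) encodes X/Y, ("\\", X, Y)
    encodes X\\Y."""
    c = Counter()
    if isinstance(t, str):
        c[t] += sign
    else:
        op, x, y = t
        if op == "/":                    # result X positive, argument Y negative
            c.update(count(x, sign))
            c.update(count(y, -sign))
        else:                            # argument X negative, result Y positive
            c.update(count(x, -sign))
            c.update(count(y, sign))
    return c

def may_derive(types, goal):
    """Necessary condition: all atom counts must neutralize to zero."""
    total = Counter()
    for t in types:
        total.update(count(t, 1))
    total.update(count(goal, -1))        # the goal contributes negatively
    return all(v == 0 for v in total.values())

NP, S = "NP", "S"
saw = ("/", ("\\", NP, S), NP)           # (NP\S)/NP

assert may_derive([NP, saw, NP], S)      # counts balance
assert not may_derive([NP, saw], S)      # one NP resource is missing
```

The check is the numerical shadow of pairing proof-net inputs with outputs: every negative occurrence of an atom must find a matching positive one.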

This perspective shift from proof theory to model theory remains founded on the notion of resource sensitivity (e.g. in the form of polarities and their neutralization) but affords us the freedom to interpret these ideas in richer classes of models, and leads to the formalism of Interaction Grammars. For example:

- where previously we only considered simple categories with polarities, we can now consider complex categories with polarized features;
- we can also adopt more expressive tree description languages that allow us to speak about dominance and precedence relations between nodes; in this fashion we subsume and generalize the monotonic version of Tree Adjoining Grammars (TAG) proposed by Vijay-Shanker [53];
- contrary to TAG, where tree fragments can only be inserted, Interaction Grammars admit models where the interpretations of description fragments may overlap.
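As a rough illustration of how polarized features neutralize, here is a hypothetical toy model; the feature representation, the unification rule, and the matching strategy are invented for exposition and are not the actual Interaction Grammar formalism:

```python
# A saturated model requires every "-" (needed) feature to be cancelled
# by a "+" (offered) feature of the same name with a unifiable value.
# None stands for an underspecified value.

def unify(v1, v2):
    """Return the unified value, or False on a clash."""
    if v1 is None:
        return v2
    if v2 is None:
        return v1
    return v1 if v1 == v2 else False

def neutralize(features):
    """features: list of (polarity, name, value) triples.
    Search for a perfect matching of + against - features."""
    pos = [(n, v) for p, n, v in features if p == "+"]
    neg = [(n, v) for p, n, v in features if p == "-"]
    if len(pos) != len(neg):
        return False
    used = [False] * len(neg)

    def match(i):
        if i == len(pos):
            return True
        n, v = pos[i]
        for j, (m, w) in enumerate(neg):
            if not used[j] and n == m and unify(v, w) is not False:
                used[j] = True
                if match(i + 1):
                    return True
                used[j] = False
        return False

    return match(0)

assert neutralize([("+", "cat", "NP"), ("-", "cat", None)])      # unifies
assert not neutralize([("+", "cat", "NP"), ("-", "cat", "S")])   # clash
```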

Another grammatical framework which embraces both the notion of resource sensitivity and the interpretational perspective of model theory is dependency grammar.

Dependency grammar is predicated on the notion of an asymmetrical relation of (syntactic or semantic) dependency. This analytical idea has a very long history dating back at least to Panini (450 BC) and the ancient logicians and philosophers, and made its way into European medieval linguistics under the spreading influence of the Arabic linguistic tradition. The modern notion of dependency grammar is usually attributed to Tesnière [52] and has been further developed in such stratificational formalizations as Functional Generative Description (FGD) [51] and Meaning-Text Theory [46].

The main formal notions of dependency grammar that reflect and embody its sensitivity to resources are subcategorization (sensitivity to syntactic resources) and valency (sensitivity to semantic resources). It should be noted that these core DG concepts of head/dependent asymmetry and subcategorization/valency have been adopted by other grammatical formalisms. In the categorial grammar tradition, these notions appear respectively as directional functional application and categorial types. In HPSG, they give rise to the notion of headed structures, head daughters, and SUBCAT lists.
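Valency checking in the DG sense can be sketched as follows, with a hypothetical two-word lexicon of valency frames (the role names and lexicon entries are illustrative only):

```python
# Each lexical entry lists the complement roles its head requires; a
# dependency node is well-formed when every required role is realized
# exactly once and no unlicensed complement appears.

LEXICON = {
    "saw":   {"subj", "obj"},   # transitive: both valencies must be filled
    "slept": {"subj"},          # intransitive: a subject only
}

def check_valency(head, dependents):
    """dependents: list of (role, word) pairs attached to head."""
    required = LEXICON.get(head, set())
    roles = [r for r, _ in dependents]
    return sorted(roles) == sorted(required)

assert check_valency("saw", [("subj", "Mary"), ("obj", "Peter")])
assert not check_valency("saw", [("subj", "Mary")])               # obj missing
assert not check_valency("slept", [("subj", "Mary"), ("obj", "Peter")])
```

The two failing cases mirror the two ways *The man that Mary saw Peter slept* goes wrong: an unrealized valency and a spurious one.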

Dependency grammar permits non-projective analyses, i.e. analyses where branches may cross. For this reason, it holds special appeal for languages with freer word order than French or English, such as German, Russian, or Czech, and this is certainly one reason for the strong renewal of interest in DG in recent years.
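Projectivity itself is easy to state algorithmically: an analysis is projective iff no two dependency edges cross. A minimal sketch, with the encoding of a tree as a head array assumed for illustration:

```python
def is_projective(heads):
    """heads[i] is the head index of word i (-1 for the root), with words
    numbered left to right. Returns True iff no two edges cross."""
    edges = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    for a, b in edges:
        for c, d in edges:
            if a < c < b < d:        # spans (a,b) and (c,d) interleave
                return False
    return True

assert is_projective([1, -1, 1])         # a simple chain: no crossing
assert not is_projective([2, 3, -1, 2])  # edges (0,2) and (1,3) cross
```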

Duchier [39] proposed a new formulation of DG with a model-theoretic interpretation that has a natural reading as a concurrent constraint program. This approach, based on set constraints, offers an efficient treatment of both lexical and structural ambiguity, and produces parsers that take full advantage of constraint propagation to achieve very good practical performance.
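The flavor of such constraint propagation can be conveyed by a toy example in which each word carries a finite domain of candidate heads that constraints prune; the constraints below are invented stand-ins, not Duchier's actual set-constraint formulation:

```python
# Each word starts with the full domain of candidate heads (-1 = root);
# constraints shrink the domains until, ideally, each is a singleton.

words = ["Mary", "saw", "Peter"]
verb = words.index("saw")

domains = {i: set(range(-1, len(words))) - {i} for i in range(len(words))}

# Constraint: the finite verb is the root of the sentence.
domains[verb] &= {-1}

# Constraint: every other word attaches to the verb (a crude stand-in
# for real subcategorization constraints).
for i in range(len(words)):
    if i != verb:
        domains[i] &= {verb}

# Propagation has reduced every domain to a singleton: the parse is fixed.
heads = {i: next(iter(d)) for i, d in domains.items()}
assert heads == {0: 1, 1: -1, 2: 1}
```

In a real constraint parser, of course, propagation interleaves with search and handles genuine ambiguity; the point here is only how domain pruning determines structure without enumerating trees.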

Extending this approach, Duchier and Debusmann [38] proposed Topological Dependency Grammar (TDG) as a means to equip DG with a theory of word order. TDG adopts the ID/LP (Immediate Dependence / Linear Precedence) perspective and explains word-order phenomena as arising through the interaction of an unordered ID tree of syntactic dependencies and an ordered, projective LP tree of topological dependencies. They provided a detailed, yet simple, account of the challenging word-order phenomena in the verbal complex of German verb-final sentences. This extension retains the computational advantages of model elimination through powerful constraint propagation.