Section: Scientific Foundations
Sentence structure and formal grammars: syntax
Participants: Lionel Clément, Alain Lecomte, Renaud Marlet, Richard Moot [correspondent], Christian Retoré, Sylvain Salvati.
Sentence (or phrasal) structure is usually modelled via a tree structure. Different families of syntactic models are studied in Signes: rewriting systems of the Chomsky hierarchy, including tree grammars, deductive systems, i.e. categorial grammars, and constraint-based approaches.
Rewriting systems have excellent computational properties and quite good descriptive adequacy. The classes of grammars most relevant for natural language syntax, the so-called mildly context-sensitive languages, are just a bit beyond context-free languages, and they are parsable in polynomial time as well. Among these classes let us mention Tree Adjoining Grammars and Minimalist Grammars. Dependency Grammars and Lexical Functional Grammars share some properties with them, but their general paradigm is quite different.
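For instance, the pattern a^n b^n c^n lies beyond context-free power yet inside the mildly context-sensitive classes (Tree Adjoining Grammars generate it); deciding membership directly is easy, which gives a feel for why polynomial parsability is plausible. A minimal recognizer, as an illustration only (not a TAG parser):

```python
# a^n b^n c^n: a classic language that is not context-free, but is
# generated by mildly context-sensitive formalisms such as TAGs.
# Direct membership checking is linear time.

def is_anbncn(s):
    """Return True iff s is a^n b^n c^n for some n >= 0."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(is_anbncn("aabbcc"))  # True
print(is_anbncn("aabbc"))   # False
```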
Edward Stabler introduced Minimalist Grammars (MGs) as a formalization of the most recent model of the Chomskian or generative tradition, and they are quite appealing to us: they offer a uniform model for the syntax of all human languages.
There are two universal, language-independent rules, called merge and move: they respectively handle the combination of phrases and the movement of phrases (or of smaller units, like heads).
Next, a language is defined by a (language-dependent) lexicon which provides words with features describing their syntactic behaviour: some features trigger merge and others trigger move. Features come in positive and negative variants which must cancel each other during the derivation (this is rather close to resource logics and categorial grammars).
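The feature-cancellation mechanism behind merge can be sketched as follows; this toy encoding (an item is a string plus a feature list, with a selector `=f` cancelling a category `f`) is a simplification of Stabler's formalism, and the mini-lexicon is invented:

```python
# Toy illustration (not Stabler's full formalism) of feature-driven merge:
# a selector feature '=f' on one item cancels a category feature 'f'
# on another, and the remaining features project from the head.

def merge(head, comp):
    """Combine two items if the head's first feature selects comp's category.

    Each item is (string, [features]); '=f' is a positive (selector)
    feature, 'f' the matching negative (category) feature.
    Returns the merged item, or None if the features do not cancel.
    """
    h_str, h_feats = head
    c_str, c_feats = comp
    if h_feats and c_feats and h_feats[0] == "=" + c_feats[0]:
        return (h_str + " " + c_str, h_feats[1:])
    return None

# Hypothetical mini-lexicon: 'sees' selects two d(eterminer phrase)s,
# then has category v.
verb = ("sees", ["=d", "=d", "v"])
obj = ("Mary", ["d"])
print(merge(verb, obj))  # ('sees Mary', ['=d', 'v'])
```

Move works analogously on licensor/licensee feature pairs, displacing already-merged material; the same cancellation discipline applies.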
Consequently, MGs are able to describe numerous syntactic constructs, providing the analysed sentences with a fine-grained and complete syntactic structure. The richer the syntactic structure, the easier it is to compute a semantic representation of the sentence.
MGs also cover phenomena which go beyond syntax, namely morphology via inflectional categories, and they incorporate some semantic phenomena as well, like the relations between pronouns and their possible antecedents, quantifiers, etc.
A drawback of rewriting systems, including MGs, is that they do not allow for learning algorithms that automatically construct or enlarge grammars from structured corpora. But their main drawback is the absence of structure on terminals, which gives no hint about the predicative structure of the sentence.
This is a strong reason for Signes to use categorial grammars and their extensions. Despite their inefficiency and restricted linguistic coverage, the initial categorial grammars (AB, Lambek) provide a correspondence between syntactic analyses and semantic representations, which we are trying to extend to richer formalisms. This will be explained in the next section on the syntax/semantics interface.
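On the syntax side, AB-grammar parsing amounts to cancelling adjacent categories: `a/b` combines with a `b` on its right to give `a`, and `b` followed by `b\a` also gives `a`. The toy lexicon and the naive reduction strategy below are illustrative only; a real parser would use chart parsing and handle ambiguity:

```python
# Minimal sketch of AB (Ajdukiewicz/Bar-Hillel) categorial grammar
# application over flat (unbracketed) category strings.

def reduce_pair(left, right):
    """Forward application (a/b, b -> a) or backward (b, b\\a -> a)."""
    if left.endswith("/" + right):
        return left[: -len("/" + right)]
    if right.startswith(left + "\\"):
        return right[len(left + "\\"):]
    return None

def parse(cats):
    """Naive strategy: repeatedly cancel any reducible adjacent pair."""
    cats = list(cats)
    changed = True
    while changed and len(cats) > 1:
        changed = False
        for i in range(len(cats) - 1):
            r = reduce_pair(cats[i], cats[i + 1])
            if r is not None:
                cats[i : i + 2] = [r]
                changed = True
                break
    return cats

lexicon = {"John": "np", "sleeps": "np\\s"}   # toy lexicon (assumed)
print(parse([lexicon[w] for w in ["John", "sleeps"]]))  # ['s']
```

The semantic side of the correspondence pairs each cancellation step with a function application on lambda terms attached to the lexical entries, which is what makes categorial grammars attractive for the syntax/semantics interface.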
In order to improve the computational properties of categorial grammars, and to extend their scope, we have been working on connecting them to more efficient and wider formalisms, like MGs.
A relatively new approach to syntax is known as model-theoretic syntax. Its advantages have been underlined by Geoffrey Pullum. Instead of viewing trees or strings as the closure of some base set of expressions, they are viewed as trees or strings satisfying a set of formulae. This approach may be considered another way of describing generative grammars. The advantage of such a description lies not in the parsing algorithms (MSO model checking or constraint satisfaction are usually of high complexity) but rather in characterising the language class and possibly describing it in a linguistically natural way (as opposed to the lexical items of lexicalized grammars). This connection to logic relates to constraint logic programming and to monadic second-order logic.
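In this perspective, checking grammaticality is model checking: given a candidate tree, verify that it satisfies the constraints. A toy sketch, with both the tree encoding and the constraint invented for illustration:

```python
# Model-theoretic flavour: a tree is a model, a grammar is a set of
# constraints, and grammaticality is satisfaction. A tree here is
# (label, [children]).

tree = ("S", [("NP", [("N", [])]),
              ("VP", [("V", []), ("NP", [("N", [])])])])

def nodes(t):
    """Yield every subtree (node) of t, root first."""
    yield t
    for child in t[1]:
        yield from nodes(child)

# Constraint: every VP node immediately dominates a V node.
def vp_has_verb(t):
    return all(any(c[0] == "V" for c in n[1])
               for n in nodes(t) if n[0] == "VP")

print(vp_has_verb(tree))  # True
```

An MSO treatment would state such constraints as logical formulae over node labels and dominance relations rather than as ad-hoc Python predicates; the point is only that the tree is checked, not generated.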
In the MSO style, the pioneering work of James Rogers on Government and Binding and on Tree Adjoining Grammars must be mentioned. Uwe Mönnich, Jens Michaelis and Frank Morawietz have obtained a two-step description of minimalist grammars that we are studying.
In the constraint style stemming from Prolog Definite Clause Grammars, Head-driven Phrase Structure Grammar, Construction Grammars and Property Grammars are defined as sets of constraints. The latter, introduced by Philippe Blache, offer a rather natural way to describe grammar rules and have been studied by Marie-Laure Guénot in our group.
High-Level Syntactic Formalisms
Lionel Clément worked on a formal representation of grammatical generalisations implemented for several linguistic formalisms.
This work deals with the problem of the same linguistic phenomena being expressed in several formalisms, with alternative realisations and with linguistic generalisations. The project aims at finding a common representation platform for all the formalisms considered, and at factoring out the elements shared across different linguistic constructions (e.g. the different realisations of a nominal subject). The alternatives describe sets of related grammatical constructions (e.g. diathesis alternations). Finally, the shared part of these descriptions is expressed in a high-level linguistic formalism closely related to metagrammar representations. For instance, diathesis alternations can be considered an intersection of the syntactic realisations of passive, active or causative sentences.
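The factoring idea can be sketched as follows; the construction names, feature names and values below are invented, but the point is that the part shared by several construction descriptions is stated (or computed) only once:

```python
# Toy illustration of metagrammar-style factoring: related construction
# descriptions share a common core that should not be duplicated.

ACTIVE = {"subject": "np-preverbal", "object": "np-postverbal"}
PASSIVE = {"subject": "np-preverbal", "agent": "pp-par"}

def shared(*descriptions):
    """Factor out the constraints common to all given descriptions."""
    common = set(descriptions[0].items())
    for d in descriptions[1:]:
        common &= set(d.items())
    return dict(common)

print(shared(ACTIVE, PASSIVE))  # {'subject': 'np-preverbal'}
```

In an actual metagrammar, the shared core would be an abstract class of descriptions that each concrete construction extends, rather than an intersection computed after the fact.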
As explained on the ARC Mosaïque web site http://mosaique.labri.fr/ , the new idea introduced by the metagrammar paradigm is that metagrammars handle two kinds of factorised information: structural information, which is formalism-dependent (tree structures, graphs, dependencies), and linguistic information. The latter presupposes introducing a way to represent non-generative data and linguistic knowledge without redundancy.
In addition to studying the formal properties of the models mentioned above, Signes uses them to describe linguistic phenomena in various languages. Dependency Grammars have been applied to a detailed analysis of word order in German, whereas various French phenomena have been formalised and implemented as computational grammars in the Property Grammar framework. For Lexical Functional Grammar, Lionel Clément implemented the XLFG parser. Finally, a morphosyntactic analysis of Polish past-tense and conditional verb forms has been modelled in HPSG. This formalism has also been used by members of the group to account for French inflection.