Inria / Raweb 2004

Project-Team : calligramme

Section: New Results

Categorial Grammars and Dependency Grammars

Lexical disambiguation

G. Bonfante, B. Guillaume and G. Perrier have developed original parsing methods for interaction grammars. Recent developments in this area have led them to generalize these methods to other well-known formalisms. The methods exploit the fact that interaction grammars are a polarized formalism. Parsing proceeds in two steps: the first achieves lexical disambiguation by a global counting of polarities; the second is the parsing process itself. It is the first step, lexical disambiguation, that has been adapted to other formalisms.

Polarization in interaction grammars reflects the fact that many syntactic constituents can be viewed as consumable resources. In some other formalisms, such as tree adjoining grammars (TAG) or Lambek grammars (LG), the resource sensitivity is hidden in the syntactic composition rules. It can be made explicit by adding polarities; specific methods based on these polarities can then be applied to the polarized versions.

In [16], this idea is formalized. A general notion of grammatical formalism is given, followed by a notion of morphism between grammatical formalisms. This notion of morphism is a very general framework for linking two grammatical formalisms. It can be used both for formalizing the polarization of an arbitrary grammatical formalism and for lexical disambiguation.

For TAGs the polarization morphism is defined as follows: every root of an elementary tree carries a positive polarity and every substitution node carries a negative polarity. Then, a substitution operation can be viewed as the neutralization of two polarities. For dealing with adjunctions, two dual polarities are put on every node where an adjunction could occur and on the root and foot node of an adjunction tree. For LG, the polarization morphism is easier to define: it is enough to polarize positively (resp. negatively) every output (resp. input) formula.
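As an illustration, the substitution part of the TAG polarization can be sketched in Python. This is only a minimal sketch under assumptions that are not in the report: the `Node` class, the `kind` values and the `polarize` function are hypothetical names, polarities are recorded as signed counts per node label, and the dual polarities used for adjunction are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of an elementary (initial) tree."""
    label: str
    kind: str = "internal"  # assumed values: "internal", "subst", "anchor"
    children: list = field(default_factory=list)


def polarize(root):
    """Return the net polarities of an initial tree as label -> signed count.

    The root contributes +1 on its label; every substitution node
    contributes -1 on its label. Adjunction is not handled here.
    """
    charges = {root.label: 1}
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.kind == "subst":
            charges[node.label] = charges.get(node.label, 0) - 1
        stack.extend(node.children)
    # Labels whose positive and negative polarities cancel are dropped.
    return {label: charge for label, charge in charges.items() if charge != 0}


# Initial tree for a transitive verb: S(NP!, VP(V(sees), NP!)), "!" = substitution.
transitive = Node("S", children=[
    Node("NP", kind="subst"),
    Node("VP", children=[
        Node("V", children=[Node("sees", kind="anchor")]),
        Node("NP", kind="subst"),
    ]),
])
print(polarize(transitive))  # {'S': 1, 'NP': -2}
```

In this reading, substituting a tree rooted in NP at one of the two substitution nodes neutralizes one negative NP polarity against the positive polarity of that root.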

For lexical disambiguation, the key idea of our method is the definition of an abstraction morphism from the considered formalism to a simpler one. This morphism ensures that parsing in the simpler formalism is equivalent to lexical disambiguation in the former one.

In the paper, several methods for lexical disambiguation of polarized formalisms are given. The most general one consists in forgetting everything but the polarities in every syntactic structure. Hence, an elementary syntactic structure is a multiset of polarities. In a polarized formalism, a successful parsing is globally neutral. In the case of multisets of polarities, we use automata-based techniques to select globally neutral taggings for a sentence. Then this method is applied to TAG and LG.
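The selection of globally neutral taggings can be illustrated by a small Python sketch. It is not the implementation of [16]: it assumes that each lexical entry has already been abstracted to signed counts per feature, that the expected result of the parse contributes its own polarities (the hypothetical `goal` argument), and it simply enumerates the accepted taggings. The names `add_charges` and `neutral_taggings` are illustrative.

```python
def add_charges(balance, polarities):
    """Return the balance obtained after adding one entry's polarities."""
    total = dict(balance)
    for feature, charge in polarities.items():
        total[feature] = total.get(feature, 0) + charge
    # Canonical, hashable form: sorted pairs, zero charges dropped.
    return tuple(sorted((f, c) for f, c in total.items() if c != 0))


def neutral_taggings(lexicon_choices, goal):
    """List the taggings whose polarities, together with `goal`, sum to zero.

    lexicon_choices: one dict per word, mapping a tag name to its
                     polarities (feature -> signed count).
    goal:            polarities contributed by the expected result.
    """
    # States of an acyclic automaton: the balance reached after a prefix.
    states = {add_charges((), goal): [[]]}
    for choices in lexicon_choices:
        next_states = {}
        for balance, taggings in states.items():
            for tag, polarities in choices.items():
                target = add_charges(balance, polarities)
                next_states.setdefault(target, []).extend(
                    tagging + [tag] for tagging in taggings
                )
        states = next_states
    # Only the empty (globally neutral) balance is accepting.
    return states.get((), [])


# "John sees Mary", with two candidate entries for the verb.
sentence = [
    {"N": {"np": +1}},
    {"V_trans": {"s": +1, "np": -2}, "V_intrans": {"s": +1, "np": -1}},
    {"N": {"np": +1}},
]
print(neutral_taggings(sentence, goal={"s": -1}))  # [['N', 'V_trans', 'N']]
```

The intransitive entry is discarded because it leaves one positive np polarity unconsumed. Prefixes that reach the same balance share a state, which keeps the automaton small even though the final enumeration may still list many taggings.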

In order to improve the performance of lexical disambiguation, we define some other methods specific to a formalism. For instance, some methods use either projectivity of the formalism (for LG) or different polarities for substitution and adjunction (for TAG). In these cases, a bottom-up parsing algorithm is used.

Interaction Grammars

In their original version [4], as well as in their implementation via LEOPAR 5.1, Interaction Grammars (IGs) focus on the syntax of natural languages. But syntax is only a way of accessing semantics, and a linguistic formalism cannot really process natural languages at the syntactic level without relating it to the semantic one. In the classical presentation of grammatical formalisms such as TAGs and Categorial Grammars, the semantic representation of a sentence is a simple projection of its syntactic derivation tree. Such a representation is too rigid to express some noncompositional phenomena, relations between quantification scopes for instance. G. Perrier has extended IGs to semantics with a more flexible approach to the syntax-semantics interface [22]. He has added a new level to the syntactic one, at which Directed Acyclic Graph descriptions are used to represent underspecified logical forms, and the same mechanism as at the syntactic level, feature neutralization, is used for composing the semantic representation of utterances. The linking between the syntactic and the semantic levels is performed by a partial function from the nodes of syntactic descriptions to the nodes of the associated semantic descriptions.

Abstract Categorial Grammars

Ph. de Groote and S. Pogodalla describe how to express various context-free formalisms with ACGs: context-free string grammars, linear context-free tree grammars and linear context-free rewriting systems. This shows that ACGs cover classes of languages that are important for natural language modeling, such as multi-component tree adjoining grammars [54], multiple context-free grammars [50] or minimalist grammars [47].

Besides the study of the expressive power of ACGs, S. Pogodalla has extended them with some non-linear languages in such a way that the decidability properties of the related problems (parsing and generation) are preserved. Non-linearity is needed for semantic representation languages and is obtained in [24] through restrictions on the lexicon between the abstract language (which remains linear) and the object one (which can be non-linear).

With that extension, S. Pogodalla [25][23] shows how to address some problems in building semantic representations for TAGs from the derivation tree (represented as abstract terms), when dealing with quantification, opaque adverbs and intersective or subsective adjectives, verbs with phrasal arguments or wh-questions.

Sylvain Salvati has developed a formal system, the calculus of syntactic descriptions. It formalizes the notion of index used in Earley algorithms for Context Free Grammars and Tree Adjoining Grammars and extends it to the linear $\lambda$-calculus. This formalization led him to propose a formal system for solving linear matching equations in the linear $\lambda$-calculus and another one for parsing Abstract Categorial Grammars whose abstract language is built on a second order signature. For both problems, these systems have made it possible to design an algorithm that uses tabulation techniques.

With a view to extending Abstract Categorial Grammars with the other connectives of linear logic, Philippe de Groote and Sylvain Salvati studied the complexity of higher-order matching in the calculus associated, through the Curry-Howard correspondence, with the Multiplicative and Additive fragment of Intuitionistic Linear Logic. They proved that the problem is NP-complete provided that the left member of the equation is given in normal form [28].

In order to study the expressive power of Abstract Categorial Grammars, Ph. de Groote, B. Guillaume and S. Salvati have introduced the notion of Vector Addition Tree Automaton. They proved that the reachability problem for these automata, which corresponds to the decidability of emptiness for Abstract Categorial Grammars, is equivalent to the decidability of multiplicative exponential linear logic [27].

Grammatical inference

The study of exact inference algorithms for learning languages gives improved insight into the problem of language acquisition. Indeed, one can expect to obtain mathematical models of grammar acquisition and thus a finer organization of data than with standard stochastic approaches. Such studies have been carried out by Jérôme Besombes and Jean-Yves Marion. Their main contribution is certainly, starting with structural examples, the switch from learning ordinary grammars to learning tree languages. Learning from structural examples was suggested by Sakakibara and taken up again recently by Kanazawa.

Jérôme Besombes and Jean-Yves Marion [10][13] have defined a class of categorial grammars that they call discrete. They show that the class of discrete classical categorial grammars is identifiable from positive structured examples. For this, they provide an original algorithm, which runs in time quadratic in the size of the examples. This work extends the previous results of Kanazawa. Indeed, in the discrete setting, several types can be associated with a word and the class is still identifiable in polynomial time. The relevance of the class of discrete classical categorial grammars is demonstrated by linguistic examples, such as verbs with transitive and intransitive forms, or homonyms.

In order to study the learning of regular tree languages as a model of natural language acquisition, Jérôme Besombes and Jean-Yves Marion [14] studied a paradigm of exact learning from positive data and membership queries. A polynomial algorithm based on this paradigm has been constructed and its correctness proved. This learning paradigm is very natural in several situations, not only in linguistics but also whenever a query is submitted to a server and the answer is the acceptance or rejection of the request. We have generalized this work to dependency grammars [11].