Section: Application Domains
Participants: Lionel Clément [correspondent], Renaud Marlet, Richard Moot, Sylvain Salvati.
In the implementation of a robust parser, one of the major issues arises from homonymous words and phrases. Natural language is highly ambiguous, and each sentence, taken without any pragmatic or semantic context, has a huge number of possible meanings. For written language, this combinatorial problem requires subtle techniques; for spoken language, where normative rules have less influence, those techniques do not seem able to cope with the ambiguity. Recent developments in natural language processing that address ambiguity are based on stochastic, low-level methods. These methods represent only surface dependencies and ignore both the various structures of phrases and their meanings. They are quite efficient for applications such as information retrieval, but lack accuracy in others, such as automatic translation.
We would like to develop new techniques that allow robust parsing of spoken language and that can also compute meaning regardless of the ambiguity of sentences. The various possible analyses of a sentence are usually represented in a structure called a "shared forest". Such a structure can be seen as a tree automaton. This observation suggests several directions of research. The first is to adapt techniques from automata theory, especially those concerning automaton transformations and transductions. The second is to use the connection between tree automata theory and the weak MSO theory of trees in order to select certain sets of analyses.
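To make the view of a shared forest as a tree automaton concrete, the following is a minimal sketch and not the team's implementation. The forest, its rule labels, and the names `FOREST`, `count_trees`, and `select` are all hypothetical, chosen for illustration. States are spans `(category, start, end)`, the forest is assumed to be acyclic, and the selection step is a plain product with a deterministic bottom-up tree automaton, which is the kind of automaton an MSO-definable property can be compiled to.

```python
from functools import lru_cache
from itertools import product

# Hypothetical shared forest for "saw the man with the telescope".
# A state is (category, start, end); a rule is (label, child states).
FOREST = {
    ("VP", 0, 6): [("VP->V NP", (("V", 0, 1), ("NP", 1, 6))),
                   ("VP->VP PP", (("VP", 0, 3), ("PP", 3, 6)))],
    ("VP", 0, 3): [("VP->V NP", (("V", 0, 1), ("NP", 1, 3)))],
    ("NP", 1, 6): [("NP->NP PP", (("NP", 1, 3), ("PP", 3, 6)))],
    ("NP", 1, 3): [("the man", ())],
    ("NP", 4, 6): [("the telescope", ())],
    ("PP", 3, 6): [("PP->P NP", (("P", 3, 4), ("NP", 4, 6)))],
    ("V", 0, 1): [("saw", ())],
    ("P", 3, 4): [("with", ())],
}
ROOT = ("VP", 0, 6)


def count_trees(forest, root):
    """Count the analyses recognized from `root` without enumerating them."""
    @lru_cache(maxsize=None)
    def count(state):
        total = 0
        for _label, kids in forest[state]:
            n = 1
            for kid in kids:
                n *= count(kid)
            total += n
        return total
    return count(root)


def select(forest, root, step, accept):
    """Product of an acyclic forest with a deterministic bottom-up automaton.

    `step(label, child_values)` computes the filter state of a node;
    `accept(value)` says whether that state is final at the root.
    Returns the product forest and its accepting root states.
    """
    rules = {}

    @lru_cache(maxsize=None)
    def reach(state):
        # Filter values that some analysis rooted at `state` can produce.
        values = set()
        for label, kids in forest[state]:
            for kid_vals in product(*(reach(kid) for kid in kids)):
                value = step(label, kid_vals)
                values.add(value)
                rules.setdefault((state, value), []).append(
                    (label, tuple(zip(kids, kid_vals))))
        return frozenset(values)

    roots = [(root, value) for value in reach(root) if accept(value)]
    return rules, roots


# Filter: "the analysis contains a PP attached to a VP".
def has_vp_attachment(label, child_values):
    return label == "VP->VP PP" or any(child_values)


selected, selected_roots = select(FOREST, ROOT, has_vp_attachment, bool)
print(count_trees(FOREST, ROOT))                               # 2
print(sum(count_trees(selected, r) for r in selected_roots))   # 1
```

The result of `select` is again a forest of the same shape, so counting, further selections, or transductions can be chained on it without ever listing the individual analyses.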