Section: New Results
Finite-State Multi-Tape Transducers
Participant: François Barthélemy.
François Barthélemy has been working on the definition of finite-state multi-tape transducers using a typed Cartesian product. Each tape is identified by a unique name, and the Cartesian product is an operator that combines several components, each of which is either a language on a given tape or an embedded Cartesian product over several tapes. The components of a Cartesian product must be independent, that is, they must not share any tape. The types are encoded on the tapes with auxiliary symbols, which are used to make the transducers closed under intersection (and also under difference and complementation).
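The independence condition on the components can be illustrated with a small sketch. It is a toy model over finite sets of strings, not Karamel's syntax or implementation: the helper names `language`, `tapes_of` and `cartesian_product`, as well as the tape names, are hypothetical, and the auxiliary type symbols are not modelled.

```python
from itertools import product

def language(tape, words):
    """A finite language on one named tape, as a set of one-entry records."""
    return {frozenset({(tape, word)}) for word in words}

def tapes_of(component):
    """Names of all tapes used by a component."""
    return {tape for record in component for tape, _ in record}

def cartesian_product(*components):
    """Combine components that use pairwise disjoint sets of tapes."""
    seen = set()
    for component in components:
        tapes = tapes_of(component)
        if seen & tapes:
            # Components must be independent: no tape may be shared.
            raise ValueError(f"shared tapes: {sorted(seen & tapes)}")
        seen |= tapes
    # Each result record merges one record from every component.
    return {frozenset().union(*combo) for combo in product(*components)}

# Hypothetical tapes "lexical", "surface" and "gloss".
lexical = language("lexical", ["cat"])
surface = language("surface", ["cat", "cats"])
pair = cartesian_product(lexical, surface)

# An embedded product is itself a valid component of a larger product.
triple = cartesian_product(pair, language("gloss", ["N"]))
```

Calling `cartesian_product(lexical, language("lexical", ["dog"]))` raises `ValueError`, because both components use the tape `lexical`.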
François Barthélemy developed a system called Karamel, devoted to the development and execution of finite-state multi-tape transducers. The system comprises a language and an Integrated Development Environment (IDE). The language offers three ways of defining finite-state machines:
- regular expressions extended with the typed Cartesian product;
- operators applied to previously defined machines. These include the usual rational operators and their extensions, as well as intersection, complementation and difference. Rational transducers are not closed under these three operations in general, but the subclass of transducers used in Karamel is. There are also two special operations, which respectively recognize and extract an untyped language on a given tape of a typed description;
- contextual rules, called Generalized Restriction rules by Yli-Jyrä and Koskenniemi [121]. They are a powerful and abstract means of expressing constraints.
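To give an intuition of the contextual rules, the sketch below checks a generalized restriction by brute force on single strings. It assumes the usual reading of the operation: a string is accepted when every copy of it carrying two markers that belongs to a language W also belongs to a language W'. The marker character `#`, the alphabet {a, b, c}, the example rule and all function names are assumptions made for the illustration. Karamel compiles such rules into finite-state machines, which this sketch does not attempt.

```python
import re

MARKER = "#"  # assumed not to belong to the alphabet {a, b, c}

def markings(s):
    """Yield every copy of s with exactly two markers inserted."""
    for i in range(len(s) + 1):
        for j in range(i, len(s) + 1):
            yield s[:i] + MARKER + s[i:j] + MARKER + s[j:]

def generalized_restriction(in_w, in_w_prime):
    """Build a membership test for the restriction of W by W'."""
    def accepts(s):
        # Every marked copy of s found in W must also be found in W'.
        return all(in_w_prime(m) for m in markings(s) if in_w(m))
    return accepts

def in_w(marked):
    """W: the markers surround one occurrence of 'a'."""
    return re.fullmatch(r"[abc]*#a#[abc]*", marked) is not None

def in_w_prime(marked):
    """W': the marked 'a' is immediately followed by 'b'."""
    return re.fullmatch(r"[abc]*#a#b[abc]*", marked) is not None

# Example rule: every 'a' must be immediately followed by 'b'.
a_only_before_b = generalized_restriction(in_w, in_w_prime)
```

With these definitions, `a_only_before_b("cab")` holds, whereas `a_only_before_b("ac")` does not, since the marked copy `#a#c` is in W but not in W'.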
The IDE is written in HTML/CSS/JavaScript. It provides basic editing functions, some test facilities and an interface for executing the descriptions. Karamel uses FSM, a C++ library from AT&T that efficiently implements finite-state algorithms. Karamel implements an original unit test framework inspired by the JUnit framework for Java [17]. Tests of finite-state transducers are performed using assertions, that is, evaluable Boolean predicates. Tests may involve auxiliary finite-state machines called fixtures (for example, a given input to a transducer and the corresponding expected output). At the moment, Karamel is still a prototype. We plan to complete its development and begin distributing it in the near future.
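The idea of testing with assertions and fixtures can be sketched with Python's standard `unittest` module, which follows the same JUnit style. This is not Karamel's test language: the transducer is modelled as a finite set of (input, output) pairs, and the helper `apply`, the class name and the toy data are all hypothetical.

```python
import unittest

def apply(relation, inputs):
    """Outputs related to any of the given inputs by a finite relation."""
    return {out for inp, out in relation if inp in inputs}

class PluralRuleTest(unittest.TestCase):
    def setUp(self):
        # Machine under test: a toy relation standing in for a transducer.
        self.plural = {("cat", "cats"), ("fox", "foxes")}
        # Fixtures: a given input and the corresponding expected output.
        self.given_input = {"fox"}
        self.expected_output = {"foxes"}

    def test_expected_output(self):
        # Assertion: an evaluable Boolean predicate on the machines.
        self.assertEqual(apply(self.plural, self.given_input),
                         self.expected_output)

    def test_unknown_input_yields_nothing(self):
        self.assertEqual(apply(self.plural, {"dog"}), set())
```

Saved in a file, these tests can be run with `python -m unittest`.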
The relevance of multi-tape transducers for Natural Language Processing has been demonstrated by a case study in Semitic morphology: a comprehensive verbal grammar of the Akkadian language has been written using Karamel [18].