Keywords : natural language processing, segmentation, computational morphology, finite state technology, functional programming.
The Zen toolkit
Participant : Gérard Huet [ contact person ] .
This software has been developed by Gérard Huet over many years, initially in the Cristal project-team, and it is clearly the most significant software presented in Signes.
It is a generic toolkit that Gérard Huet extracted from his Sanskrit modeling platform. It supports the construction of lexicons, the computation of morphological derivatives and inflected forms, and the segmentation analysis of phonetic streams modulo euphony. This small library of finite-state automata and transducers, called Zen for its simplicity, is implemented in an applicative kernel of Objective Caml called Pidgin ML. Documentation in the literate programming style is provided, using Ocamlweb, the program annotation tool for Ocaml by Jean-Christophe Filliâtre. The Zen toolkit is distributed as free software (under the GPL licence) on the Objective Caml Hump site. This development forms a significant symbolic manipulation software package within pure functional programming; it demonstrates the feasibility of developing, in the Ocaml system, symbolic applications with good time and space performance using a purely applicative methodology.
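To make the notion of an applicative lexicon concrete, here is a minimal sketch in standard OCaml syntax of a lexicon stored as a trie with purely functional insertion and lookup. It is an illustration under stated assumptions, not the Zen library's own code: the type `trie` and the functions `explode`, `insert`, `mem` and `lexicon_of_words` are hypothetical names chosen for this example, and letters are represented as plain characters.

```ocaml
(* Hypothetical sketch, not the Zen API: a lexicon as a trie.
   The boolean marks whether the path from the root spells a word;
   the association list maps each letter to a subtrie. *)
type trie = Trie of bool * (char * trie) list

let empty = Trie (false, [])

(* Convert a string into its list of characters. *)
let explode s = List.init (String.length s) (String.get s)

(* Purely applicative insertion: a new trie is returned and the
   subtries that are not on the path of the word are shared. *)
let rec insert word (Trie (final, arcs)) =
  match word with
  | [] -> Trie (true, arcs)
  | c :: rest ->
    let sub = match List.assoc_opt c arcs with
      | Some t -> t
      | None -> empty in
    Trie (final, (c, insert rest sub) :: List.remove_assoc c arcs)

(* Lookup follows the arcs letter by letter. *)
let rec mem word (Trie (final, arcs)) =
  match word with
  | [] -> final
  | c :: rest ->
    (match List.assoc_opt c arcs with
     | Some sub -> mem rest sub
     | None -> false)

(* Build a lexicon from a list of words. *)
let lexicon_of_words words =
  List.fold_left (fun t w -> insert (explode w) t) empty words
```

For example, `mem (explode "deva") (lexicon_of_words ["deva"; "devi"])` evaluates to `true`, whereas the prefix `"dev"` is not recognized as a word. Since no structure is mutated, earlier versions of the lexicon remain valid after each insertion.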
Several uses of this platform outside the Cristal team are under way. For instance, a lexicon of French inflected forms has been implemented by Nicolas Barth and Sylvain Pogodalla in the Calligramme project-team at Loria. The toolkit is also used by Talana (University of Paris 7).
The algorithmic principles of the Zen library, based on the linear contexts data structure ("zippers") and on the sharing functor (associative memory server), were presented as an invited lecture at the symposium Practical Aspects of Declarative Languages (PADL), New Orleans, Jan. 2003. An extended version was written as a chapter of the book "Thirty Five Years of Automating Mathematics", edited in honor of N. de Bruijn.
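The zipper idea mentioned above can be sketched as follows, in standard OCaml syntax. This is a minimal illustration of linear contexts on a simple tree type, not the code of the Zen library: the types `tree`, `path` and `location` and the navigation functions are assumptions made for this example. A location pairs the subtree in focus with its context, so that a local change rebuilds only the path back to the root.

```ocaml
(* Hypothetical sketch of a zipper over n-ary trees. *)
type tree = Item of string | Section of tree list

(* A path records, for each ancestor, the left siblings (nearest
   first), the path above, and the right siblings. *)
type path = Top | Node of tree list * path * tree list

(* A location is the subtree in focus together with its context. *)
type location = Loc of tree * path

(* Move the focus to the first child, if any. *)
let go_down (Loc (t, p)) =
  match t with
  | Section (first :: rest) -> Some (Loc (first, Node ([], p, rest)))
  | _ -> None

(* Move the focus to the next sibling on the right, if any. *)
let go_right (Loc (t, p)) =
  match p with
  | Node (left, up, r :: right) -> Some (Loc (r, Node (t :: left, up, right)))
  | _ -> None

(* Move the focus to the parent, rebuilding it from the context. *)
let go_up (Loc (t, p)) =
  match p with
  | Top -> None
  | Node (left, up, right) ->
    Some (Loc (Section (List.rev_append left (t :: right)), up))

(* Replace the subtree in focus; the context is left unchanged. *)
let change (Loc (_, p)) t = Loc (t, p)

(* Climb to the top and return the whole tree. *)
let rec root loc =
  match go_up loc with
  | Some parent -> root parent
  | None -> let Loc (t, _) = loc in t
```

Starting from `Loc (Section [Item "a"; Item "b"], Top)`, going down, then right, then applying `change` with `Item "B"` and calling `root` yields `Section [Item "a"; Item "B"]`, while the original tree is left untouched.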