Section: New Results
Keywords: Statistical learning theory, grammatical inference, probabilistic automata, rational languages.
Probabilistic automata inference
Participant: François Denis.
In probabilistic grammatical inference, the learning data are assumed to consist of a sequence of words over a finite alphabet, drawn according to a fixed but unknown probability distribution P called a stochastic language. The goal is then to find a model consistent with the data, for instance a probabilistic automaton (PA) or a Hidden Markov Model (HMM). Hidden Markov Models and probabilistic automata have the same expressivity, and their relationship has been precisely studied in  . With Yann Esposito, from the "Laboratoire d'informatique fondamentale de Marseille" (LIF), we proved in   that the stochastic language p generated by a probabilistic automaton A depends continuously on the parameters of A, for the norm. As a corollary, we prove that probabilistic automata can be identified in the limit, and that the identification is exact when the parameters of the target are rational numbers. However, this result is theoretical and does not lead to a practical learning algorithm.

The main difficulty is to infer an appropriate structure from the data: this is possible when natural components of the model correspond to intrinsic components of the target language. We defined the notions of residual languages of a stochastic language and of Probabilistic Residual Automata (PRA). A PRA is a PA whose states directly correspond to the residual languages of the language it generates. When the target stochastic language can be generated by a PRA, an efficient learning algorithm can be defined (see  ).

Stochastic languages defined by probabilistic automata are rational languages, and we feel it is necessary to study rational stochastic languages from a language-theoretical point of view. The main results are described in  . A main publication is in preparation.
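As a minimal sketch of the objects discussed above (not taken from the cited work), the following toy example shows how a PA defines a stochastic language P over Sigma*, and how a residual language u^-1 P, with (u^-1 P)(w) = P(uw) / P(u Sigma*), can be computed from the automaton. The two-state automaton and all its numeric parameters are made-up illustrative values.

```python
# Sketch: a probabilistic automaton (PA) over Sigma = {a, b} and the
# stochastic language P it generates. All parameters are invented examples.
import numpy as np

# One transition matrix per letter: M[a][i, j] is the probability of reading
# letter a while moving from state i to state j.
M = {
    "a": np.array([[0.2, 0.1], [0.0, 0.3]]),
    "b": np.array([[0.1, 0.2], [0.3, 0.0]]),
}
iota = np.array([1.0, 0.0])   # initial distribution over states
tau = np.array([0.4, 0.4])    # stopping probability in each state
T = M["a"] + M["b"]           # letter-summed transition matrix

# Well-formedness: in each state, stopping + outgoing probabilities sum to 1.
assert np.allclose(T.sum(axis=1) + tau, 1.0)

def prob(word):
    """P(word) = iota . M[w1] . ... . M[wn] . tau."""
    v = iota
    for c in word:
        v = v @ M[c]
    return float(v @ tau)

def prefix_mass(u):
    """P(u Sigma*): probability that a word drawn from P starts with u."""
    v = iota
    for c in u:
        v = v @ M[c]
    # (I - T)^{-1} tau accumulates tau over continuations of every length.
    return float(v @ np.linalg.solve(np.eye(2) - T, tau))

def residual(u, w):
    """(u^-1 P)(w) = P(uw) / P(u Sigma*), itself a stochastic language."""
    return prob(u + w) / prefix_mass(u)

print(round(prob("ab"), 6))          # 0.036
print(round(prefix_mass(""), 6))     # 1.0 -- P is a distribution on Sigma*
print(round(residual("a", "b"), 6))  # 0.12
```

The continuity result mentioned above concerns how P varies as the entries of M, iota, and tau vary; the residual languages are the intrinsic components a PRA-based learning algorithm would try to recover from data.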