## Section: New Results

Keywords: Statistical learning theory, grammatical inference, probabilistic automata, rational languages.

### Probabilistic automata inference

Participant: François Denis.

In Probabilistic Grammatical Inference, the learning data are assumed to consist of a sequence of words over a finite alphabet, drawn according to a fixed but unknown probability distribution P called a *stochastic language*. The goal is then to find a model consistent with the data, for instance a probabilistic automaton (PA) or a Hidden Markov Model (HMM). HMMs and PAs have the same expressivity, and their relationship has been studied precisely in [17]. With Yann Esposito, from the "Laboratoire d'informatique fondamentale de Marseille" (LIF), we have proved in
[25] that the stochastic languages p generated by probabilistic automata A depend continuously on the parameters of A with respect to the norm. As a corollary, we proved that probabilistic automata can be identified in the limit, and that the identification is exact when the parameters of the target are rational numbers. However, this result is theoretical and does not lead to a practical learning algorithm. The main
difficulty is to infer an appropriate structure from the data; this is possible when the natural components of the model correspond to intrinsic components of the target language. We defined the notion of the *residual languages* of a stochastic language, and of *Probabilistic Residual Automata* (PRA). A PRA is a PA whose states directly correspond to the residual languages of the language it generates. When the target stochastic language can be generated by a PRA, an efficient learning algorithm can be defined (see [25]). Stochastic languages defined by probabilistic automata are rational languages, and we feel it is necessary to study rational stochastic languages from a language-theoretic point of view. The main results are described in [26]; a full publication is in preparation.
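To make the notions of stochastic language, PA, and residual concrete, here is a minimal sketch; the class, its API, and the example automaton are ours, not taken from [25] or [26]. It shows how a PA assigns a probability to each word, and how the residual u^{-1}p of the stochastic language p it generates, defined by (u^{-1}p)(w) = p(uw)/p(uΣ*), is again generated by the same automaton with a renormalized initial distribution (assuming the automaton halts with probability one from every state). This is the intuition behind PRA states corresponding to residuals.

```python
class PA:
    """A probabilistic automaton (hypothetical sketch, not the notation of [25]).

    init[q]            -- initial probability of state q
    trans[q][(a, q2)]  -- probability of reading letter a while moving q -> q2
    final[q]           -- probability of stopping in state q
    Well-formedness: init sums to 1 and, for each state q,
    final[q] + sum(trans[q].values()) == 1.
    """

    def __init__(self, init, trans, final):
        self.init, self.trans, self.final = init, trans, final

    def _forward(self, u):
        """Forward vector: current[q] = probability of being in q after reading u."""
        current = dict(self.init)
        for a in u:
            nxt = {}
            for q, p in current.items():
                for (b, q2), t in self.trans.get(q, {}).items():
                    if b == a:
                        nxt[q2] = nxt.get(q2, 0.0) + p * t
            current = nxt
        return current

    def prob(self, word):
        """Probability the automaton assigns to `word`: sum over all accepting paths."""
        return sum(p * self.final.get(q, 0.0)
                   for q, p in self._forward(word).items())


def residual(pa, u):
    """The residual u^{-1}p, with (u^{-1}p)(w) = p(uw) / p(u Sigma*).

    For a PA that halts almost surely from every state, p(u Sigma*) equals the
    total mass of the forward vector after u, so the residual is generated by
    the same automaton with that vector, renormalized, as initial distribution.
    """
    alpha = pa._forward(u)
    mass = sum(alpha.values())  # = p(u Sigma*) under the halting assumption
    if mass == 0.0:
        raise ValueError("prefix has probability zero")
    return PA({q: p / mass for q, p in alpha.items()}, pa.trans, pa.final)


# Example: a one-state automaton generating the stochastic language
# p(a^n) = (1/2)^(n+1), whose probabilities sum to 1 over all words.
A = PA(init={0: 1.0}, trans={0: {('a', 0): 0.5}}, final={0: 0.5})
print(A.prob(''))    # 0.5
print(A.prob('aa'))  # 0.125

# Here every residual of p equals p itself, so A is (trivially) a PRA.
B = residual(A, 'a')
print(B.prob('aa'))  # 0.125
```

The one-state example is deliberately degenerate; for a multi-state PA, `residual` produces genuinely different initial distributions, and a PRA is precisely an automaton whose states can be identified with these residual languages.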
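The exact-identification claim for rational parameters also admits a small illustration. Distinct rationals with denominator at most D differ by at least 1/D², so a sufficiently precise empirical estimate determines a bounded-denominator parameter uniquely. The following sketch is our illustration of that arithmetic fact, not the identification algorithm of [25]; it uses Python's standard `fractions` module.

```python
from fractions import Fraction

def exact_parameter(estimate, max_denominator):
    """Recover an exact rational parameter from a noisy estimate.

    If the true parameter is p/q with q <= max_denominator and the estimate
    lies within 1 / (2 * max_denominator**2) of it, the nearest rational with
    denominator at most max_denominator is the true parameter itself.
    """
    return Fraction(estimate).limit_denominator(max_denominator)

# A transition probability of 1/3, estimated from a finite sample:
print(exact_parameter(0.33332, 10))  # 1/3
```

This is only the final rounding step: the hard part of identification in the limit is accumulating enough data that the estimate actually falls within the required precision.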