## Project-Team: modbio

## Section: New Results

**Keywords:** *Statistical learning theory*, *stochastic languages*, *probabilistic automata*, *grammatical inference*.

## Probabilistic automata inference

**Participant:** François Denis.

Multiplicity automata (or rational series) are formal objects which
can model *stochastic languages*, i.e., probability distributions
over words. They can be represented by a *structure* which is a
finite automaton and by *continuous parameters* associated with
states and transitions. Given a structure A and a sample S drawn
independently according to a probability distribution P,
computing parameters for A which maximize the likelihood of the
sample is NP-hard, but efficient algorithms can be used in
practical cases. On the other hand, inferring both structure and
parameters from a sample remains a wide-open field of research, which
we study with Yann Esposito, who is currently completing a PhD on this
subject. We have proved that the set of stochastic languages generated
from -rational series is not recursively enumerable and
hence seems unsuitable for grammatical inference purposes. However,
we showed that the set of stochastic languages generated from
-rational series (PA) can be uniformly identified in
the limit with probability one, provided that a
structure which fits the sample according to
norm can be found [18]. This problem is likely to be computationally difficult for the
general class. We introduce a natural subclass of -rational series (PRA), which define a class of stochastic languages
having an intrinsic characterization by means of their residual
languages and for which efficient inference algorithms can be
designed [33][34]; see also [13].
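To make the split between *structure* and *continuous parameters* concrete, the following minimal sketch evaluates the stochastic language defined by a small probabilistic automaton, together with one of its residual languages. The two-state automaton, its numerical parameters and all function names are hypothetical and chosen only for illustration; they are not taken from the cited work. The residual of a stochastic language P by a prefix u is taken here to be the distribution w ↦ P(uw) / P(uΣ*), which is assumed to be the notion of residual language referred to above.

```python
from itertools import product

# Hypothetical two-state probabilistic automaton over the alphabet {a, b}.
# Structure: which transitions exist (non-zero entries).
# Parameters: the probabilities attached to states and transitions.
# In each state, the stopping probability plus the probabilities of all
# outgoing transitions sum to 1.
INITIAL = [1.0, 0.0]            # INITIAL[q]: probability of starting in state q
FINAL = [0.2, 0.5]              # FINAL[q]: probability of stopping in state q
TRANSITIONS = {                 # TRANSITIONS[x][q][r]: emit x and move from q to r
    "a": [[0.5, 0.3], [0.0, 0.25]],
    "b": [[0.0, 0.0], [0.25, 0.0]],
}


def forward(word):
    """Return, for each state, the probability of emitting `word` and reaching it."""
    weights = list(INITIAL)
    n = len(weights)
    for symbol in word:
        matrix = TRANSITIONS[symbol]
        weights = [
            sum(weights[q] * matrix[q][r] for q in range(n)) for r in range(n)
        ]
    return weights


def word_probability(word):
    """P(word): emit exactly `word`, then stop."""
    return sum(w * f for w, f in zip(forward(word), FINAL))


def prefix_probability(prefix):
    """P(prefix Sigma*): probability that a generated word starts with `prefix`.

    This relies on every state stopping with probability one eventually,
    which holds for the parameters chosen above.
    """
    return sum(forward(prefix))


def residual_probability(prefix, word):
    """Value at `word` of the residual language of P by `prefix`."""
    return word_probability(prefix + word) / prefix_probability(prefix)


def total_mass(max_length):
    """Sum of P(w) over all words of length at most `max_length`."""
    return sum(
        word_probability("".join(letters))
        for length in range(max_length + 1)
        for letters in product("ab", repeat=length)
    )
```

Calling `total_mass(n)` for increasing `n` gives values that approach 1, which is a quick sanity check that the chosen parameters define a probability distribution over words. In this toy automaton, the residual by the prefix `a` is a mixture of the distributions generated from the two states, which illustrates why residual languages are a natural handle on the states of an automaton.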