Inria / Raweb 2004
Project-Team: MODBIO

Section: New Results

Keywords: Statistical learning theory, stochastic languages, probabilistic automata, grammatical inference.

Probabilistic automata inference

Participant: François Denis.

Multiplicity automata (equivalently, rational series) are formal objects that can model stochastic languages, i.e., probability distributions over words. A multiplicity automaton consists of a structure, which is a finite automaton, together with continuous parameters attached to its states and transitions. Given a structure A and a sample S drawn independently according to a probability distribution P, computing parameters for A that maximize the likelihood of the sample is NP-hard, although efficient algorithms can be used in practical cases. Inferring both the structure and the parameters from a sample, on the other hand, remains a wide open field of research. We study it with Yann Esposito, who is currently preparing a PhD on this subject. We have proved that the set of stochastic languages generated by $\mathbb{Q}$-rational series is not recursively enumerable, and hence seems unsuitable for grammatical inference. However, we showed that the set of stochastic languages generated by $\mathbb{R}_+$-rational series (PA) can be uniformly identified in the limit with probability one, provided that a structure that fits the sample according to the $\|\cdot\|_\infty$ norm can be found [18]. This problem is likely to be computationally difficult for the general class. We therefore introduce a natural subclass of $\mathbb{R}_+$-rational series (PRA), which defines a class of stochastic languages that has an intrinsic characterization by means of their residual languages and for which efficient inference algorithms can be designed [33][34]; see also [13].
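
As an illustration of the objects discussed above, the following minimal sketch shows how a multiplicity automaton with non-negative parameters assigns a weight to a word, and how far the resulting distribution is from the empirical distribution of a sample. It is not the inference algorithm of [18], [33] or [34]. The matrix representation, the names `word_weight` and `sup_distance_on_sample`, and the two-state example automaton are assumptions made for illustration. The norm used in the text ranges over all words, whereas this sketch only takes the maximum over the words that occur in the sample.

```python
from collections import Counter

def word_weight(initial, transitions, final, word):
    """Weight r(w) = initial * M[w1] * ... * M[wn] * final of a word.

    initial, final: lists of per-state weights (hypothetical representation).
    transitions: dict mapping each symbol to a square matrix (list of rows).
    """
    n = len(initial)
    vec = list(initial)  # forward weights after reading the prefix so far
    for symbol in word:
        m = transitions[symbol]
        vec = [sum(vec[i] * m[i][j] for i in range(n)) for j in range(n)]
    return sum(vec[j] * final[j] for j in range(n))

def sup_distance_on_sample(initial, transitions, final, sample):
    """Largest gap between empirical frequency and automaton weight,
    restricted to the words that occur in the sample (an assumption)."""
    counts = Counter(sample)
    total = len(sample)
    return max(
        abs(counts[w] / total - word_weight(initial, transitions, final, w))
        for w in counts
    )

# Hypothetical two-state automaton over {a, b}. For each state, the outgoing
# transition weights plus the termination weight sum to 1.
INITIAL = [1.0, 0.0]
TRANSITIONS = {
    "a": [[0.5, 0.0], [0.0, 0.5]],
    "b": [[0.0, 0.25], [0.0, 0.0]],
}
FINAL = [0.25, 0.5]

if __name__ == "__main__":
    for w in ["", "a", "b", "ab"]:
        print(repr(w), word_weight(INITIAL, TRANSITIONS, FINAL, w))
    print(sup_distance_on_sample(INITIAL, TRANSITIONS, FINAL, ["", "a", "b", "b"]))
```

In this representation the structure corresponds to which matrix entries are allowed to be non-zero, and the parameters are the values of those entries, which matches the separation between structure and parameters described above.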