Inria / Raweb 2004
Team: Mathfi

Section: New Results

Keywords: policy iteration algorithm, Howard algorithm, dynamic programming, nonexpansive maps.

Policy iteration algorithms for fixed point problems with nonexpansive operators

Participants: J.Ph. Chancelier, M. Messaoud, A. Sulem.

The case of Bellman equations associated with the optimal control of Markov chains on an infinite horizon with discount factor $\lambda > 0$ has been studied for a long time by many authors (see e.g. the monographs by Bertsekas and Puterman). Typically these equations are of the form $v = \sup_{w \in \mathcal{W}} \left\{ \frac{1}{1+\lambda} M^w v + c^w \right\}$, where $M^w$ is the transition matrix of the Markov chain, $c^w$ is the running utility and $w$ is the control variable with values in some control set $\mathcal{W}$. The policy iteration algorithm is known to converge to the solution of the Bellman equation, since the operator $\frac{1}{1+\lambda} M^w + c^w$ is contractive and satisfies a discrete maximum principle.
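As an illustration of this classical discounted case only, here is a minimal sketch of policy iteration (the Howard algorithm) in Python with NumPy. It is not the algorithm of [31]. It assumes a finite control set, and the names `policy_iteration`, `M`, `c` and `lam`, as well as the numerical data, are hypothetical choices made for the example.

```python
import numpy as np


def policy_iteration(M, c, lam, max_iter=1000):
    """Howard policy iteration for v = max_w { M^w v / (1 + lam) + c^w }.

    Assumptions of this sketch: finite state space, finite control set.
    M has shape (n_controls, n_states, n_states), each M[w] row-stochastic.
    c has shape (n_controls, n_states), c[w] being the running utility.
    """
    n_controls, n_states, _ = M.shape
    beta = 1.0 / (1.0 + lam)              # discount multiplier, < 1 for lam > 0
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy

    for _ in range(max_iter):
        # Policy evaluation: solve the linear system (I - beta * M^pi) v = c^pi.
        M_pi = M[policy, states, :]       # row x is taken from M[policy[x]]
        c_pi = c[policy, states]
        v = np.linalg.solve(np.eye(n_states) - beta * M_pi, c_pi)

        # Policy improvement: pick a maximizing control in every state.
        q = beta * (M @ v) + c            # shape (n_controls, n_states)
        candidate = np.argmax(q, axis=0)
        # Keep the current control when it is already optimal (avoids cycling on ties).
        keep = q[policy, states] >= q[candidate, states] - 1e-12
        new_policy = np.where(keep, policy, candidate)

        if np.array_equal(new_policy, policy):
            return v, policy
        policy = new_policy

    raise RuntimeError("policy iteration did not converge")


# Hypothetical data: 2 states, 2 controls.
M = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 0.0],
              [0.5, 0.3]])
v, policy = policy_iteration(M, c, lam=0.1)
print(v, policy)
```

Each evaluation step solves a linear system that is invertible because $\frac{1}{1+\lambda} < 1$ and $M^w$ is stochastic. This is precisely the contraction property that is lost in the nonexpansive setting.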

The problem addressed here concerns more general fixed point problems on a finite state space. Typically the operator we consider is the maximum of a contractive operator and a nonexpansive one, both satisfying some appropriate properties. Shortest path problems also lead to fixed point problems with nonexpansive operators, but in a rather different context, whereas reflecting boundaries lead to nonexpansive operators on the boundary. This last problem appears to be a special case of ours. We prove the convergence of a policy iteration algorithm and give illustrative examples in optimal control of Markov chains in [31].

These results can be applied to the numerical analysis of quasi-variational inequalities (QVIs) associated with combined impulse/stochastic optimal control problems.