Team Commands

Section: Scientific Foundations

Stochastic optimal control

Stochastic optimal control problems arise when the dynamical system is uncertain. A decision typically has to be taken at each time step, while realizations of future events are unknown (although some information on their probability distribution is available). In particular, problems of an economic nature involve large uncertainties (on prices, production and demand). Specific examples are portfolio selection in a market with risky and non-risky assets, super-replication under uncertain volatility, and the management of power resources (dams, gas). Air traffic control is another example of such problems.

Stochastic programming

By stochastic programming we mean stochastic optimal control in a discrete-time (or even static) setting; see the overview by Ruszczyński and Shapiro [37]. The static and single-recourse cases are essentially well understood; by contrast, the truly dynamic case (multiple recourse) presents an essential difficulty, see Shapiro [164] and Shapiro and Nemirovski [163]. So we will speak only of the latter, assuming decisions to be measurable w.r.t. a certain filtration (in other words, all information from the past can be used).

Dynamic programming.

In the standard case of minimization of an expectation (possibly of a utility function) a dynamic programming principle holds. Essentially, this says that the decision is a function of the present state (we can ignore the past) and that a certain reverse-time induction over the associated values holds. Unfortunately a straightforward resolution of the dynamic programming principle based on a discretization of the state space is out of reach (again this is the curse of dimensionality). For convex problems one can build lower convex approximations of the value function: this is the stochastic dual dynamic programming (SDDP) approach of Pereira and Pinto [150]. Another possibility is a parametric approximation of the value function; however, determining the basis functions is not easy, and identifying (or, we could say in this context, learning) the best parameters is a nonconvex problem; see however Bertsekas and Tsitsiklis [57] and Munos [146].
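The reverse-time induction can be sketched on a toy problem with a small, discretized state space (where the curse of dimensionality does not yet bite). All dynamics, costs and parameters below are made up for illustration: a scalar state x_{t+1} = x_t + u_t + w_t with equiprobable noise, quadratic stage cost, and a short horizon.

```python
import numpy as np

# Toy backward dynamic programming sketch (illustrative numbers throughout):
# state x_{t+1} = x_t + u_t + w_t, noise w uniform on {-1, 0, 1},
# stage cost x^2 + u^2, zero terminal cost, horizon T.
T = 4
states = np.arange(-5, 6)            # discretized state grid
controls = np.array([-1, 0, 1])      # admissible controls
noises = np.array([-1, 0, 1])        # equiprobable noise values

V = np.zeros((T + 1, len(states)))   # V[T] = 0 (no terminal cost)
policy = np.zeros((T, len(states)), dtype=int)

for t in range(T - 1, -1, -1):       # reverse-time induction over the values
    for i, x in enumerate(states):
        best = np.inf
        for k, u in enumerate(controls):
            # expected cost-to-go, clipping the next state to the grid
            nxt = np.clip(x + u + noises, states[0], states[-1])
            idx = nxt - states[0]
            q = x**2 + u**2 + V[t + 1, idx].mean()
            if q < best:
                best, policy[t, i] = q, k
        V[t, i] = best
```

The nested loops over states, controls and noises make the cost of one stage proportional to the product of the three grid sizes; with a d-dimensional state the state grid grows exponentially in d, which is exactly the curse of dimensionality mentioned above.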

Tree based algorithms.

A popular approach is to sample the uncertainties in the structured form of a tree (branching typically occurs at each time step). Computational limits allow only a small number of branchings, far fewer than the amount needed for an accurate solution, see Shapiro and Nemirovski [163]. Such poor accuracy may nevertheless (in the absence of a more powerful approach) be a good way of obtaining a reasonable policy. Very often the resulting programs are linear, possibly with integer variables (on-off switches of plants, investment decisions), which allows the use of (possibly dedicated) mixed integer linear programming codes. The tree structure (coupling variables) can be exploited by the numerical algorithms, see Dantzig and Wolfe [99], Kall and Wallace [128].
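A minimal instance of such a tree-structured program is a two-stage recourse problem: one first-stage decision shared by all scenarios (nonanticipativity), and one recourse decision per leaf. The sketch below uses invented costs and demands, and solves the (convex, piecewise-linear) first-stage problem by brute-force grid search rather than an LP code, purely to expose the structure.

```python
# Toy two-stage recourse problem on a three-leaf scenario tree:
#   min_x  c*x + sum_s p_s * q_s * y_s
#   s.t.   x + y_s >= d_s,   x, y_s >= 0
# The first-stage decision x is scenario-independent; the recourse y_s
# fills the remaining demand in scenario s. All numbers are illustrative.
c = 1.0
scenarios = [  # (probability p_s, recourse cost q_s, demand d_s)
    (0.5, 2.0, 4.0),
    (0.3, 2.0, 6.0),
    (0.2, 2.0, 10.0),
]

def total_cost(x):
    # for fixed x the optimal recourse is simply y_s = max(d_s - x, 0)
    return c * x + sum(p * q * max(d - x, 0.0) for p, q, d in scenarios)

# brute-force the convex piecewise-linear first-stage cost on a grid
xs = [i / 100 for i in range(1201)]
x_star = min(xs, key=total_cost)
```

In a real model the grid search would be replaced by a (mixed integer) linear programming solver, and the tree would have a branching at every stage; the point here is only that the leaves are coupled solely through the first-stage variable.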

Monte Carlo approaches

By Monte Carlo we mean here sampling a given number of independent trajectories (of the uncertainties). In the special case of optimal stopping (e.g., American options) it happens that the state space and the uncertainty space coincide. One can then compute the transition probabilities of a Markov chain whose law approximates the original one, and the problem reduces to that of a Markov chain, see [89]. Let us also mention the quantization approach, see [43].
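A standard Monte Carlo treatment of optimal stopping is regression-based: simulate independent trajectories, then move backward in time, estimating the continuation value at each step by a regression on the simulated states (in the spirit of Longstaff-Schwartz; not one of the methods cited above). The sketch below prices a Bermudan put on a lognormal asset with invented parameters, using a quadratic polynomial regression restricted to in-the-money paths.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Bermudan put parameters (all made up)
S0, K, r, sigma, T, steps, paths = 1.0, 1.0, 0.05, 0.2, 1.0, 10, 20000
dt = T / steps

# simulate independent lognormal trajectories
z = rng.standard_normal((paths, steps))
increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
S = S0 * np.exp(np.cumsum(increments, axis=1))
S = np.hstack([np.full((paths, 1), S0), S])

payoff = lambda s: np.maximum(K - s, 0.0)
cash = payoff(S[:, -1])                     # value if held to maturity
for t in range(steps - 1, 0, -1):
    cash *= np.exp(-r * dt)                 # discount one step back
    itm = payoff(S[:, t]) > 0               # regress only on in-the-money paths
    if itm.sum() > 10:
        x = S[itm, t]
        coef = np.polyfit(x, cash[itm], 2)  # quadratic continuation estimate
        cont = np.polyval(coef, x)
        ex = payoff(x) > cont               # exercise where immediate > continuation
        cash[itm] = np.where(ex, payoff(x), cash[itm])
price = np.exp(-r * dt) * cash.mean()
```

The regression step is what makes the approach dimension-friendly: the continuation value is learned from the sampled trajectories instead of a state-space grid.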

In the general case a useful possibility is to compute a tree by aggregating the original sample, as done in [107].

Controlling risk

Maximizing the expectation of gains can lead to a solution with an excessively high probability of important losses (bankruptcy). In view of this it is wise to make a compromise between expected gains and the risk of high losses. A simple and efficient way to achieve this may be to maximize the expectation of a utility function; this, however, needs ad-hoc tuning. An alternative is the mean-variance compromise, presented in the case of portfolio optimization in Markowitz [143]. A useful generalization of the variance, including asymmetric functions such as semideviations, is the theory of deviation measures, Rockafellar et al. [158].
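The mean-variance compromise admits a closed form in the simplest setting (no short-sale or other inequality constraints). Maximizing mu'w - lam w'Sigma w subject to the budget 1'w = 1 gives, by Lagrangian stationarity, w = Sigma^{-1}(mu - gamma 1)/(2 lam) with gamma fixed by the budget. The asset data below are invented for illustration.

```python
import numpy as np

# Illustrative mean returns and covariance for three assets (made-up numbers)
mu = np.array([0.08, 0.10, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

ones = np.ones(3)
inv = np.linalg.inv(Sigma)

# global minimum-variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
w_mv = inv @ ones / (ones @ inv @ ones)

def frontier_weights(lam):
    # maximize mu'w - lam * w'Sigma w  s.t.  1'w = 1:
    # stationarity gives w = Sigma^{-1}(mu - gamma*1) / (2*lam),
    # with gamma chosen so that the budget constraint holds.
    a = inv @ mu
    b = inv @ ones
    gamma = (ones @ a - 2 * lam) / (ones @ b)
    return (a - gamma * b) / (2 * lam)
```

As the risk-aversion parameter lam grows, the frontier portfolio collapses onto the minimum-variance portfolio; small lam tilts the weights toward the high-return (and high-variance) assets.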

Risk measures

Another possibility is to put a constraint on the level of gain to be obtained with a high probability, say at least 99%. The corresponding concept of value-at-risk leads to difficult nonconvex optimization problems, although convex relaxations may be derived, see Shapiro and Nemirovski [147].
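One well-known convexification of value-at-risk is the conditional value-at-risk (CVaR) of Rockafellar and Uryasev (cited here as background, not as a reference of this report): CVaR_a = min_t { t + E[(L - t)_+]/(1 - a) }, whose minimizer is precisely VaR_a. The sketch below evaluates both on a simulated loss sample with illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.normal(0.0, 1.0, 100_000)   # illustrative loss sample
alpha = 0.99

# empirical value-at-risk: the alpha-quantile of the loss distribution
var = np.quantile(losses, alpha)

# conditional value-at-risk via the Rockafellar-Uryasev representation
#   CVaR_a = min_t  t + E[(L - t)_+] / (1 - alpha),
# evaluated at its minimizer t = VaR_a; CVaR is convex in the decision
# variables, which is what makes it usable inside optimization models.
cvar = var + np.maximum(losses - var, 0.0).mean() / (1 - alpha)
```

For a standard normal loss the exact values are about 2.33 (VaR) and 2.67 (CVaR) at the 99% level, so the sample estimates should land close to these; CVaR always dominates VaR, reflecting that it also measures the magnitude of tail losses, not just their frequency.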

Yet the most important contribution of recent years is the axiomatized theory of risk measures of Artzner et al. [41], satisfying the properties of monotonicity and possibly convexity.

In a dynamic setting, risk measures (over the total gains) are not coherent (they do not obey a dynamic programming principle). The theory of coherent risk measures is an answer, in which one-step risk measures over successive time steps are applied inductively; see Ruszczyński and Shapiro [162]. Their drawback is that they have no clear economic interpretation at the moment. Also, the associated numerical methods still have to be developed.
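The inductive construction can be made concrete on a tiny tree. Below, a one-step measure rho (here an illustrative mixture of expectation and worst case, which is coherent) is applied backward through the stages: first to each depth-1 node over its children, then to the root over the stage-1 values. This nesting is what restores a dynamic programming structure.

```python
# Nested risk evaluation on a depth-2 binary tree (all numbers illustrative).

def rho(values, probs, beta=0.5):
    # one-step coherent measure: (1 - beta)*expectation + beta*worst case
    expectation = sum(p * v for p, v in zip(probs, values))
    return (1 - beta) * expectation + beta * max(values)

# stage costs at the four leaves, indexed by the two branchings (i, j)
leaf_cost = {(i, j): i + 2 * j for i in (0, 1) for j in (0, 1)}
probs = [0.5, 0.5]

# backward pass: risk of each depth-1 node over its children, then the root
stage1 = [rho([leaf_cost[(i, 0)], leaf_cost[(i, 1)]], probs) for i in (0, 1)]
nested = rho(stage1, probs)
```

Replacing rho by a plain expectation would recover ordinary dynamic programming; replacing it by a measure applied once to the total cost would in general break the backward recursion, which is the inconsistency discussed above.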

Links with robust optimization

The study of relations between chance constraints (constraints on the probability of some event) and robust optimization is the subject of intense research. The idea is, roughly speaking, to replace the chance constraint by a robust optimization problem (some classes of which are tractable in the sense of algorithmic complexity). See the recent work by Ben-Tal and Teboulle [50].

Continuous-time stochastic optimal control

The continuous-time case can be handled with Bellman's dynamic programming principle, which leads to a characterization of the value function as the solution of a second-order Hamilton-Jacobi-Bellman (HJB) equation [98], [96].

Theoretical framework

Sometimes this value function is smooth (e.g. in the case of Merton's portfolio problem, Øksendal [168]) and the associated HJB equation can be solved explicitly. In general, however, the value function is not smooth enough to satisfy the HJB equation in the classical sense. As in the deterministic case, the notion of viscosity solution provides a convenient framework for dealing with this lack of smoothness, see Pham [153]; it also happens to be well adapted to the study of discretization errors of numerical schemes [133], [47].

Numerical approximation

The numerical discretization of second-order HJB equations has been the subject of several contributions. The book of Kushner and Dupuis [134] gives a complete synthesis of Markov chain schemes (i.e. finite differences, semi-Lagrangian, finite elements, ...). A main difficulty with these equations comes from the fact that the second-order operator (the diffusion term) is not uniformly elliptic and can degenerate. Moreover, the diffusion term (covariance matrix) may change direction at any point in space and time (this matrix is associated with the volatility of the dynamics).
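A minimal example of such a scheme is an explicit, monotone finite-difference discretization of a one-dimensional HJB equation: upwind differences for the drift (chosen per control), a centered difference for the diffusion, and a CFL-limited time step. Equation, controls and costs below are all invented for illustration; boundary handling is deliberately crude.

```python
import numpy as np

# Explicit monotone scheme for the toy HJB equation
#   V_t + min_u { u V_x + 0.5 sigma^2 V_xx + x^2 + u^2 } = 0,  V(T, .) = x^2,
# with controls u in {-1, 0, 1} and constant sigma (illustrative problem).
sigma, T = 0.5, 1.0
x = np.linspace(-2.0, 2.0, 81)
dx = x[1] - x[0]
dt = 0.4 * dx**2 / sigma**2           # CFL condition for the explicit scheme
n = int(np.ceil(T / dt))
dt = T / n

V = x**2                              # terminal cost
for _ in range(n):                    # march backward in time
    Vxx = np.zeros_like(V)
    Vxx[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2
    fwd = np.zeros_like(V)
    bwd = np.zeros_like(V)
    fwd[:-1] = (V[1:] - V[:-1]) / dx  # forward difference
    bwd[1:] = (V[1:] - V[:-1]) / dx   # backward difference
    best = None
    for u in (-1.0, 0.0, 1.0):
        # upwinding: forward difference when the drift u > 0, else backward,
        # which keeps the scheme monotone
        Vx = fwd if u > 0 else bwd
        H = u * Vx + 0.5 * sigma**2 * Vxx + x**2 + u**2
        best = H if best is None else np.minimum(best, H)
    V = V + dt * best                 # one backward Euler step
    V[0], V[-1] = V[1], V[-2]         # crude Neumann boundary treatment
```

With a constant, nondegenerate sigma the standard stencil suffices; the degenerate or direction-changing diffusions mentioned above are exactly the cases where such simple stencils fail and generalized stencils (as in the GFD work below) become necessary.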

Our past contributions on stochastic optimal control

In the framework of the thesis of R. Apparigliato (to be completed at the end of 2007) we have studied the robust optimization approach to stochastic programming problems, in the case of hydroelectric production for one valley. The main difficulty lies in both the dynamic character of the system and the large number of constraints (capacity of each dam). We have also studied simplified electricity production models for respecting the “margin” constraint. In the framework of the thesis of G. Emiel, and in collaboration with CEPEL, we have studied large-scale bundle algorithms for solving (through a dual “price decomposition” method) stochastic problems for the Brazilian case.

For solving stochastic control problems, we studied the so-called Generalized Finite Differences (GFD), which allow one to choose, at any node, a stencil approximating the diffusion matrix up to a certain threshold [85]. Determining the stencil and the associated coefficients boils down to a quadratic program to be solved at each point of the grid, and for each control. This is definitely expensive, with the exception of special structures where the coefficients can be computed at low cost. For two-dimensional systems, we designed a (very) fast algorithm for computing the coefficients of the GFD scheme, based on the Stern-Brocot tree [82]. The GFD scheme was used as a basis for the approximation of an HJB equation coming from a super-replication problem. The problem was motivated by a study conducted in collaboration with Société Générale, see [63].

Within the framework of the thesis of Stefania Maroso, we also contributed to the study of error estimates for the approximation of the Isaacs equation associated with a differential game with one player [80], and for the approximation of the HJB equation associated with an impulse control problem [81].

