Section: Scientific Foundations
Deterministic optimal control
For deterministic optimal control we will distinguish two approaches, trajectory optimization , in which the object under consideration is a single trajectory, and the Hamilton-Jacobi-Bellman approach, based on dynamic principle, in which a family of optimal control problems is solved.
The roots of deterministic optimal control are the “classical” theory of the calculus of variations, illustrated by the work of Newton, Bernoulli, Euler, and Lagrange (whose famous multipliers were introduced in  ), with improvements due to the “Chicago school”, Bliss  during the first part of the 20th century, and by the notion of relaxed problem and generalized solution (Young  ).
Trajectory optimization really started with the spectacular achievement done by Pontryagin's group  during the fifties, by stating, for general optimal control problems, nonlocal optimality conditions generalizing those of Weierstrass. This motivated the application to many industrial problems (see the classical books by Bryson and Ho  , Leitmann  , Lee and Markus  , Ioffe and Tihomirov  ). Since then, various theoretical achievements have been obtained by extending the results to nonsmooth problems, see Aubin  , Clarke  , Ekeland  . Substantial improvements were also obtained by using tools of differential geometry, which concern a precise understanding of optimal syntheses in low dimension for large classes of nonlinear control systems, see Bonnard, Faubourg and Trélat  .
Overviews of numerical methods for trajectory optimization are provided in Pesch  , Betts  . We follow here the classical presentation that distinguishes between direct and indirect methods.
Dynamic programming was introduced and systematically studied by R. Bellman during the fifties. The HJB equation, whose solution is the value function of the (parameterized) optimal control problem, is a variant of the classical Hamilton-Jacobi equation of mechanics for the case of dynamics parameterized by a control variable. It may be viewed as a differential form of the dynamic programming principle. This nonlinear first-order PDE appears to be well-posed in the framework of viscosity solutions introduced by Crandall and Lions  ,  ,  . These tools also allow to perform the numerical analysis of discretization schemes. The theoretical contributions in this direction did not cease growing, see the books by Barles  and Bardi and Capuzzo-Dolcetta  .
A interesting by-product of the HJB approach is an expression of the optimal control in feedback form. Also it reaches the global optimum, whereas trajectory optimization algorithms are of local nature. A major difficulty when solving the HJB equation is the high cost for a large dimension n of the state (complexity is exponential with respect to n ).
Direct methods for trajectory optimization
The so-called direct methods consist in an optimization of the trajectory, after having discretized time, by a nonlinear programming solver that possibly takes into account the dynamic structure. So the two main problems are the choice of the discretization and the nonlinear programming algorithm. A third problem is the possibility of refinement of the discretization once after solving on a coarser grid.
Rough control discretization.
Many authors prefer to have a coarse discretization for the control variables (typically constant or piecewise-linear on each time step) and a higher order discretization for the state equation. The idea is both to have an accurate discretization of dynamics (since otherwise the numerical solution may be meaningless) and to obtain a small-scale resulting nonlinear programming problem. See e.g. Kraft  . A typical situation is when a few dozen of time-steps are enough and there are no more than five controls, so that the resulting NLP has at most a few hundreds of unknowns and can be solved using full matrices software. On the other hand, the error order (assuming the problem to be unconstrained) is governed by the (poor) control discretization. Note that the integration scheme does not need to be specified (provided it allows to compute functions and gradients with enough precision) and hence general Ordinary Differential Equations integrators may be used.
On the other hand, a full discretization (i.e. in a context of Runge-Kutta methods, with different values of control for each inner substep of the scheme) allows to obtain higher orders that can be effectively computed, see Hager  , Bonnans  , being related to the theory of partitioned Runge-Kutta schemes, Hairer et al.  . In an interior-point algorithm context, controls can be eliminated and the resulting system of equation is easily solved due to its band structure. Discretization errors due to constraints are discussed in Dontchev et al.  . See also Malanowski et al.  .
For large horizon problems integrating from the initial time to the final time may be impossible (finding a feasible point can be very hard !). Analogously to the indirect method of multiple shooting algorithm, a possibility is to add (to the control variables), as optimization parameters, the state variables for a given set of times, subject of course to “sticking” constraint. Note that once more the integration scheme does not need to be specified. Integration of the ODE can be performed in parallel. See Bock  .
Parametrization by output variables.
Recent proposals were made of methods based on a reformulation of the problem based on (possibly flat) output variables . By the definition, control and state variables are combinations of derivatives of these output variables. When the latter are represented on a basis of smooth functions such as polynomials, their derivatives are given linear combinations of the coefficients, and so the need for integration is avoided. One must of course take care of the possibly complicated expression of constraints that can make numerical difficulties. The numerical analysis of these methods seems largely open. See on this subject Petit, Milam and Murray  .
Collocation and pseudospectral methods.
The collocation approach for solving an ODE consists in a polynomial interpolation of the dynamic variable, the dynamic equation being enforced only at limited number of points (equal to the degree of the polynomial). Collocation can also be performed on each time step of a one-step method; it can be checked than collocation methods are a particular case of Runge-Kutta methods.
It is known that the polynomial interpolation with equidistant points is unstable for more than about 20 points, and that the Tchebycheff points should be preferred, see e.g.  . Nevertheless, several papers suggested the use of pseudospectral methods Ross and Fahroo  in which a single (over time) high-order polynomial approximation is used for the control and state. Therefore pseudospectral methods should not be used in the case of nonsmooth (e.g. discontinuous) control.
In view of model and data uncertainties there is a need for robust solutions. Robust optimization has been a subject of increasing importance in recent years see Ben-Tal and Nemirovski  . For dynamic problems taking the worst-case of the perturbation at each time-step may be too conservative. Specific remedies have been proposed in specific contexts, see Ben-Tal et al.  , Diehl and Björnberg  .
A relatively simple method taking into account robustness, applicable to optimal control problems, was proposed in Diehl, Bock and Kostina  .
Nonlinear programming algorithms.
The dominant ones (for optimal control problems as well as for other fields) have been successively the augmented Lagrangian approach (1969, due to Hestenes  and Powell  , see also Bertsekas  ) successive quadratic programming (SQP: late seventies, due to  ,  , and interior-point algorithms since 1984, Karmarkar  . See the general textbooks on nonlinear programming  ,  ,  .
When ordered by time the optimality system has a “band structure”. One can take easily advantage of this with interior-point algorithms whereas it is not so easy for SQP methods; see Berend et al.  . There exist some very reliable SQP softwares SNOPT, some of them dedicated to optimal control problems, Betts  , as well as robust interior-point software, see Morales et al.  , Wächter and Biegler  , and for application to optimal control Jockenhövel et al.  .
Our past contributions in direct methods
We have developed a general SQP algorithm  ,  for sparse nonlinear programming problems, and the associated software for optimal control problems; it has been applied to atmospheric reentry problems, in collaboration with CNES  .
More recently, in collaboration with CNES and ONERA, we have developed a sparse interior-point algorithm with an embedded refinement procedure. The resulting TOPAZE code has been applied to various space trajectory problems  ,  ,  . The method takes advantage of the analysis of discretization errors, is well-understood for unconstrained problems  .
Indirect approach for trajectory optimization
The indirect approach eliminates control variables using Pontryagin's maximum principle, and solves the two-points boundary value problem (with differential variables state and costate) by a single or multiple shooting method. The questions are here the choice of a discretization scheme for the integration of the boundary value problem, of a (possibly globalized) Newton type algorithm for solving the resulting finite dimensional problem in IRn (n is the number of state variables), and a methodology for finding an initial point.
The choice of the discretization scheme for the numerical integration of the boundary value problem can have a great impact on the convergence of the method. First, the integration itself can be tricky. If the state equation is stiff (the linearized system has fast modes) then the state-costate has both fast and unstable modes. Also, discontinuities of the control or its derivative, due to commutations or changes in the set of active constraints, lead to the use of sophisticated variable step integrators and/or switching detection mechanisms, see Hairer et al.  ,  . Another point is the computation of gradients for the Newton method, for which basic finite differences can give inaccurate results with variable step integrators (see Bock  ). This difficulty can be treated in several ways, such as the so-called “internal differentiation” or the use of variational equations, see Gergaud and Martinon  .
Specific formulations for constrained problems and singular arcs
Most optimal control problems include control and state constraints. In that case the formulation of the TPBVP must take into account entry and exit times of boundary arcs for these constraints, and (for state constraints of order at least two) times of touch points (isolated contact points). In addition for state constrained problems, the so-called “alternative formulation” (that allows to eliminate the “algebraic variables, i.e. control and state, from the algebraic constraints) has to be used, see Hartl, Sethi and Vickson  .
Another interesting point is the presence of singular arcs, appearing for instance when the control enters in the system dynamics and cost function in a linear way, which is common in practical applications. As for state contraints, the formulation of the boundary value problem must take into account these singular arcs, over which the expression of the optimal control typically involves higher derivatives of the Hamiltonian, see Goh  and Robbins  .
As mentioned before, finding a suitable initial point can be extremely difficult for indirect methods, due to the small convergence radius of the Newton type method used to solve the boundary value problem. Homotopy methods are an effective way to address this issue, starting from the solution of an easier problem to obtain a solution of the target problem (see Allgower and Georg  ). It is sometimes possible to combine the homotopy approach with the Newton method used for the shooting, see Deuflhard  .
Other related approaches
With a given value of the initial costate are associated (through in integration of the reduced state-costate system) a control and a state, and therefore a cost function. The latter can therefore be minimized by ad-hoc minimization algorithms, see Dixon and Bartholomew-Biggs  . The advantage of this point of view is the possibility to use the various descent methods in order to avoid convergence to a local maximum or saddle-point. The extension of this approach to constrained problems (especially in the case of state constraints) is an open and difficult question.
Our past contributions in indirect methods
We have recently clarified under which properties shooting algorithms are well-posed in the presence of state constraints. The (difficult) analysis was carried out in  ,  . A related homotopy algorithm, restricted to the case of a single first-order state constraint, has been proposed in  .
We also conducted a study of optimal trajectories with singular arcs for space launcher problems. The results obtained for the generalized three-dimensional Goddard problem (see  ) have been successfully adapted for the numerical solution of a realistic launcher model (Ariane 5 class).
Furthermore, we continue to investigate the effects of the numerical integration of the boundary value problem and the accurate computation of Jacobians on the convergence of the shooting method. As initiated in  , we focus more specifically on the handling of discontinuities, with ongoing work on the geometric integration aspects (Hamiltonian conservation).
Geometric approaches succeeded in giving a precise description of the structure of optimal trajectories, as well as clarifying related questions. For instance, there have been many works aiming to describe geometrically the set of attainable points, by many authors (Krener, Schättler, Bressan, Sussmann, Bonnard, Kupka, Ekeland, Agrachev, Sigalotti, etc). It has been proved, in particular, by Krener and Schättler  that, for generic single-input control-affine systems in IR3 , , where the control satisfies the constraint |u|1 , the boundary of the accessible set in small time consists of the surfaces generated by the trajectories x + x- and x-x + , where x + (resp. x- ) is an arc corresponding to the control u = 1 (resp. u = -1 ); moreover, every point inside the accessible set can be reached with a trajectory of the form x-x + x- or x + x-x + . It follows that minimal time trajectories of generic single-input control-affine systems in IR3 are locally of the form x-x + x- or x + x-x + , i.e., are bang-bang with at most two switchings.
This kind of result has been slightly improved recently by Agrachev-Sigalotti, although they do not take into account possible state constraints.
Our past contributions.
In  , we have extended that kind of result to the case of state constraints: we described a complete classification, in terms of the geometry (Lie configuration) of the system, of local minimal time syntheses, in dimension two and three. This theoretical study was motivated by the problem of atmospheric re-entry posed by the CNES, and in  , we showed how to apply this theory to this concrete problem, thus obtaining the precise structure of the optimal trajectory.
This approach consists in calculating the value function associated with the optimal control problem, and then synthesizing the feedback control and the optimal trajectory using Pontryagin's principle. The method has the great particular advantage of reaching directly the global optimum, which can be very interesting, when the problem is not convex.
Characterization of the value function
From the dynamic programming principle, we derive a characterization of the value function as being a solution (in viscosity sense) of an Hamilton-Jacobi-Bellman equation, wich is a nonlinear PDE of dimension equal to the number n of state variables. Since the pioneer works of Crandall and Lions  ,  ,  , many theoretical contributions were carried out, allowing an understanding of the properties of the value function as well as of the set of admissible trajectories. However, there remains an important effort to provide for the development of effective and adapted numerical tools, mainly because of numerical complexity (complexity is exponential with respect to n).
Numerical approximation for continuous value function
Several numerical schemes have been already studied to treat the case when the solution of the HJB equation (the value function) is continuous. Let us quote for example the Semi-Lagrangian methods  ,  studied by the team of M. Falcone (La Sapienza, Rome), the high order schemes WENO, ENO, Discrete galerkin introduced by S. Osher, C.-W. Shu, E. Harten  ,  ,  ,  , and also the schemes on nonregular grids by R. Abgrall  ,  . All these schemes rely on finite differences or/and interpolation techniques which lead to numerical diffusions. Hence, the numerical solution is unsatisfying for long time approximations even in the continuous case.
In a realistic optimal control problem, there are often constraints on the state (reaching a target, restricting the state of the system in an acceptable domain, ...). When some controlability assumptions are not satisfied, the value function associated to such a problem is discontinuous and the region of discontinuity is of great importance since it separates the zone of admissible trajectories and the nonadmissible zone.
In this case, it is not reasonable to use the usual numerical schemes (based on finite differences) for solving the HJB equation. Indeed, these schemes provide poor approximation quality because of the numerical diffusion.
Our past contributions on the HJB approach
Discrete approximations of the Hamilton-Jacobi equation for an optimal control problem of a differential-algebraic system were studied in  .
Numerical methods for the HJB equation in a bilevel optimization scheme where the upper-level variables are design parameters were used in  . The algorithm has been applied to the parametric optimization of hybrid car engines.
Within the framework of the thesis of N. Megdich, we have studied new antidiffusive schemes for HJB equations with discontinuous data  ,  . One of these schemes is based on the Ultrabee algorithm proposed, in the case of advection equation with constant velocity, by Roe  and recently revisited by Després-Lagoutière  ,  . The numerical results on several academic problems show the relevance of the antidiffusive schemes. However, the theoretical study of the convergence is a difficult question and is only partially done.