Hamilton–Jacobi–Bellman equation

The Hamilton–Jacobi–Bellman (HJB) equation is a nonlinear partial differential equation that provides necessary and sufficient conditions for optimality of a control with respect to a loss function.[1] Its solution is the value function of the optimal control problem which, once known, can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation.[2][3]

The equation is a result of the theory of dynamic programming, which was pioneered in the 1950s by Richard Bellman and coworkers.[4][5][6] The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán.[9]

A major drawback, however, is that the HJB equation admits classical solutions only for a sufficiently smooth value function, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives.[10]

Consider the following problem in deterministic optimal control over the time period $[0,T]$:
$$V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t),u(t)]\,dt + D[x(T)] \right\},$$
where $C[\cdot]$ is the scalar cost rate function, $D[\cdot]$ is a function that gives the bequest value at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0 \le t \le T$ is the control vector that we are trying to find. The system must also be subject to
$$\dot{x}(t) = F[x(t),u(t)],$$
where $F[\cdot]$ gives the vector determining the evolution of the state vector over time.

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is
$$\frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x,u) + C(x,u) \right\} = 0,$$
subject to the terminal condition
$$V(x,T) = D(x).$$
As before, the unknown scalar function $V(x,t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
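
The minimization appearing inside the PDE is the Hamiltonian referred to in the introduction. Writing $p$ as a shorthand for the costate $\partial V/\partial x$ (notation introduced here only for this remark), the equation and the way an optimal control is read off from it can be stated as
$$H(x,p,u) = p \cdot F(x,u) + C(x,u), \qquad \frac{\partial V(x,t)}{\partial t} + \min_u H\!\left(x, \frac{\partial V(x,t)}{\partial x}, u\right) = 0,$$
so that once $V$ is known, an optimal control is obtained pointwise as
$$u^*(x,t) \in \arg\min_u H\!\left(x, \frac{\partial V(x,t)}{\partial x}, u\right).$$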

Intuitively, the HJB equation can be derived as follows. If $V(x(t),t)$ is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time $t$ to $t + dt$, we have
$$V(x(t),t) = \min_u \left\{ V(x(t+dt), t+dt) + \int_t^{t+dt} C(x(s),u(s))\,ds \right\}.$$
Note that the Taylor expansion of the first term on the right-hand side is
$$V(x(t+dt), t+dt) = V(x(t),t) + \frac{\partial V(x,t)}{\partial t}\,dt + \frac{\partial V(x,t)}{\partial x} \cdot \dot{x}(t)\,dt + o(dt),$$
where $o(dt)$ denotes the terms in the Taylor expansion of higher order than one in little-o notation.
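
Carrying the derivation one step further (a short worked step in the notation above): substituting the expansion into the principle of optimality and cancelling the $V(x(t),t)$ appearing on both sides leaves
$$0 = \min_u \left\{ C(x(t),u(t))\,dt + \frac{\partial V(x,t)}{\partial t}\,dt + \frac{\partial V(x,t)}{\partial x} \cdot F(x(t),u(t))\,dt + o(dt) \right\},$$
and dividing by $dt$ and letting $dt \to 0$ recovers the HJB equation stated above.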

The HJB equation is usually solved backwards in time, starting from $t = T$ and ending at $t = 0$. When solved over the whole of the state space and $V(x,t)$ is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained.
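
Because the equation runs backwards from the terminal condition, a simple way to approximate its solution numerically is a backward sweep over a state grid. The sketch below (Python/NumPy) does this for a hypothetical one-dimensional problem with dynamics $\dot{x} = u$, running cost $x^2 + u^2$, and terminal cost $x^2$; the problem data, grid sizes, and the semi-Lagrangian discretization are illustrative assumptions, not something prescribed by the discussion above.

```python
import numpy as np

# Hypothetical 1-D example (illustrative assumptions): dynamics x' = u,
# running cost x^2 + u^2, terminal cost x^2, horizon [0, T].
T, n_steps = 1.0, 100
dt = T / n_steps
xs = np.linspace(-2.0, 2.0, 201)   # state grid
us = np.linspace(-3.0, 3.0, 61)    # control grid

V = xs**2                          # terminal condition V(x, T) = D(x) = x^2

# Backward sweep: at each time step, the Bellman backup takes the minimum over
# controls of (running cost) * dt plus V evaluated at the propagated state.
for _ in range(n_steps):
    x_next = xs[:, None] + dt * us[None, :]                  # x + F(x, u) dt for all (x, u)
    V_next = np.interp(x_next.ravel(), xs, V).reshape(x_next.shape)
    Q = (xs[:, None]**2 + us[None, :]**2) * dt + V_next      # cost-to-go candidates
    V = Q.min(axis=1)                                         # keep the best control per state

print("approximate V(x=0, t=0):", V[np.argmin(np.abs(xs))])
```

The same backup extends to higher-dimensional state spaces, but the grid grows exponentially with the dimension, which is what motivates the approximation methods discussed below.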

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall),[13] the minimax solution (Andrei Izmailovich Subbotin), and others.

Approximate dynamic programming has been introduced by D. P. Bertsekas and J. N. Tsitsiklis with the use of artificial neural networks (multilayer perceptrons) for approximating the Bellman function in general.[14] This is an effective mitigation strategy for reducing the impact of dimensionality, since it replaces the memorization of the complete function mapping over the whole state space with the memorization of the neural network parameters alone.

In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iterations with neural networks was introduced.[15] In discrete time, an approach to solving the HJB equation combining value iterations and neural networks was introduced.[16]
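
For the discrete-time case, the following NumPy sketch conveys the value-iteration-plus-network idea: Bellman backup targets are computed on sampled states, and a small multilayer perceptron is refit to them at each iteration. The one-dimensional dynamics, cost, network size, and training details are illustrative assumptions and are not the specific algorithms of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete-time, discounted problem (illustrative assumptions):
# x_{k+1} = x_k + dt * u_k, stage cost dt * (x^2 + u^2), discount factor gamma.
dt, gamma = 0.05, 0.98
xs = rng.uniform(-2.0, 2.0, size=(256, 1))       # sampled training states
us = np.linspace(-3.0, 3.0, 41)                  # control grid

# Tiny multilayer perceptron V(x): 1 -> 32 -> 1 with a tanh hidden layer.
W1 = rng.normal(0.0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.5, (32, 1)); b2 = np.zeros(1)

def value(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2

for _ in range(200):                              # outer value-iteration loop
    # Bellman targets: min over controls of stage cost + gamma * V(next state).
    x_next = (xs[:, :, None] + dt * us[None, None, :]).transpose(0, 2, 1)   # (N, K, 1)
    v_next = value(x_next.reshape(-1, 1)).reshape(len(xs), len(us))         # (N, K)
    q = dt * (xs**2 + us[None, :]**2) + gamma * v_next                      # (N, K)
    target = q.min(axis=1, keepdims=True)                                   # (N, 1)

    # Refit the network to the targets with a few plain gradient steps (manual backprop).
    for _ in range(50):
        h = np.tanh(xs @ W1 + b1)
        err = (h @ W2 + b2) - target
        gW2 = h.T @ err / len(xs); gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1.0 - h**2)
        gW1 = xs.T @ dh / len(xs); gb1 = dh.mean(axis=0)
        for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
            p -= 0.1 * g                          # in-place gradient descent update

print("approximate value at x = 0:", value(np.array([[0.0]]))[0, 0])
```

Only the network parameters are stored; the value at any state is recovered by a forward pass rather than by tabulating the whole state space.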

Alternatively, it has been shown that sum-of-squares optimization can yield an approximate polynomial solution to the Hamilton–Jacobi–Bellman equation arbitrarily well with respect to the $L^1$ norm.[17]

The idea of solving a control problem by applying Bellman's principle of optimality and then working out backwards in time an optimizing strategy can be generalized to stochastic control problems. Consider a problem similar to the one above, but now with $(X_t)_{t\in[0,T]}$ the stochastic process to optimize and $(u_t)_{t\in[0,T]}$ the steering:
$$\min_u \; \mathbb{E}\left\{ \int_0^T C(X_t,u_t)\,dt + D(X_T) \right\}.$$
By first applying Bellman's principle of optimality and then expanding $V(X_t,t)$ with Itô's rule, one finds the stochastic HJB equation
$$\min_u \left\{ \mathcal{A} V(x,t) + C(x,u) \right\} = 0,$$
where $\mathcal{A}$ represents the stochastic differentiation operator, and subject to the terminal condition
$$V(x,T) = D(x).$$
Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verifying argument is required.
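
The operator $\mathcal{A}$ collects the drift of $V(X_t,t)$ produced by Itô's rule and therefore depends on the control. As a concrete illustration (the diffusion form and the symbols $\mu$, $\sigma$ are assumptions introduced here, not notation from the text above), if the state follows the controlled Itô diffusion
$$dX_t = \mu(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t,$$
then for a smooth $V$,
$$\mathcal{A} V(x,t) = \frac{\partial V}{\partial t}(x,t) + \mu(x,u)\,\frac{\partial V}{\partial x}(x,t) + \frac{1}{2}\,\sigma^2(x,u)\,\frac{\partial^2 V}{\partial x^2}(x,t),$$
with the multidimensional analogue obtained by replacing the last two terms with $\mu \cdot \nabla_x V + \tfrac{1}{2}\operatorname{tr}\!\left(\sigma\sigma^{\top}\,\nabla_x^2 V\right)$.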

This technique is widely used in Financial Mathematics to determine optimal investment strategies in the market (see for example Merton's portfolio problem).

As an example, we can look at a system with linear stochastic dynamics and quadratic cost.

If the system dynamics is given by
$$dx_t = (a\,x_t + b\,u_t)\,dt + \sigma\,dw_t,$$
and the cost accumulates at rate $C(x_t,u_t) = \tfrac{1}{2} r(t)\,u_t^2 + \tfrac{1}{2} q(t)\,x_t^2$, the HJB equation is given by
$$-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t)\,x^2 + \frac{\partial V(x,t)}{\partial x}\,a\,x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2},$$
with optimal action given by
$$u_t = -\frac{b}{r(t)}\,\frac{\partial V(x,t)}{\partial x}.$$
Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for Linear-quadratic-Gaussian control.
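
As a brief numerical illustration of this last step: substituting the quadratic ansatz $V(x,t) = \tfrac{1}{2} s(t)\,x^2 + c(t)$ into the HJB equation above gives the scalar Riccati equation $-\dot{s} = q(t) + 2 a\, s - \tfrac{b^2}{r(t)}\, s^2$, which is integrated backwards from the terminal time. The constant coefficients, the horizon, and the assumption of zero terminal cost (so $s(T) = 0$) in the sketch below are illustrative choices, not part of the text above.

```python
# Illustrative constants (assumptions): dx = (a x + b u) dt + sigma dw,
# cost rate (q x^2 + r u^2) / 2 on [0, T], no terminal cost.
a, b = 1.0, 1.0
q, r = 1.0, 1.0
sigma = 0.3        # noise level; it only shifts V through c(t), not the feedback gain
T, n = 1.0, 1000
dt = T / n

s = 0.0                                   # terminal condition s(T) = 0 (no terminal cost)
for _ in range(n):                        # integrate the Riccati ODE backwards in time
    neg_s_dot = q + 2.0 * a * s - (b**2 / r) * s**2   # equals -ds/dt
    s += dt * neg_s_dot                   # stepping from t down to t - dt

print("s(0) =", s)
print("optimal feedback at t = 0: u = -(b/r) s(0) x =", -(b / r) * s, "* x")
```

Note that the noise intensity enters the value function only through the control-independent term $c(t)$, so the optimal feedback gain coincides with that of the corresponding deterministic linear-quadratic problem (certainty equivalence).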