The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and, more recently, in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, involving a new unknown variable.
The adjoint method formulates the gradient of a function with respect to its parameters in a constrained-optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly.[5] The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^* = \bar{A}^T$ is used.

When the initial problem consists of calculating the product $s^T x$, where $x$ must satisfy the state equation $Ax = b$, the dual problem can be realized as calculating the product $r^T b$, where $r$ must satisfy the adjoint state equation $A^* r = s$; the vector $r$ is called the adjoint state vector. Indeed, for real $A$, $r^T b = r^T A x = (A^T r)^T x = s^T x$, so either formulation yields the same value.
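As a minimal numerical sketch of this duality (not from the source; the matrix $A$ and vectors $b$, $s$ below are arbitrary illustrative data), the identity $s^T x = r^T b$ can be checked directly with NumPy:

```python
import numpy as np

# Arbitrary example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # well-conditioned real matrix
b = rng.standard_normal(4)
s = rng.standard_normal(4)

# Direct problem: solve A x = b, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Dual (adjoint) problem: solve A* r = s, then evaluate r^T b.
# For a real matrix, the adjoint A* is simply the transpose A^T.
r = np.linalg.solve(A.T, s)
dual = r @ b

assert np.isclose(direct, dual)  # s^T x == r^T b
```

The practical payoff appears when $s^T x$ must be evaluated for many right-hand sides $b$: a single adjoint solve for $r$ replaces one direct solve per $b$.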
The original adjoint calculation method goes back to Jean Céa,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.
For a state variable $u \in U$, an optimization variable $v \in V$, an objective functional $J : U \times V \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), thus the considered objective is

$j(v) = J(u_v, v),$

where $u_v$ is the solution of the state equation given the optimization variables $v$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$\nabla j(v) = \nabla_v J(u_v, v) + \nabla_v u_v \cdot \nabla_u J(u_v, v).$

Unfortunately, the term $\nabla_v u_v$ is often very hard to differentiate analytically since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

minimize $J(u, v)$ subject to $D_v(u) = 0$

has an associate Lagrangian functional $\mathcal{L} : U \times V \times U \to \mathbb{R}$ defined by

$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$

where $\lambda \in U$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot, \cdot \rangle$ is an inner product on $U$.
The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$d_u \mathcal{L}(u, v, \lambda; \delta u) = d_u J(u, v; \delta u) + \langle \partial_u D_v(u)\, \delta u, \lambda \rangle = 0 \quad \forall \delta u,$
$d_\lambda \mathcal{L}(u, v, \lambda; \delta \lambda) = \langle D_v(u), \delta \lambda \rangle = 0 \quad \forall \delta \lambda,$
$d_v \mathcal{L}(u, v, \lambda; \delta v) = d_v J(u, v; \delta v) + \langle d_v D_v(u; \delta v), \lambda \rangle = 0 \quad \forall \delta v,$

where $d_x$ denotes the derivative with respect to $x$ in the direction given after the semicolon. The first equation is the so-called adjoint state equation, because the operator involved is the adjoint operator of $\partial_u D_v(u)$, namely $\partial_u D_v(u)^*$. Resolving this equation yields the adjoint state $\lambda_v$. The second equation is the direct state equation, and the third coincides with the gradient of the quantity of interest, $\nabla j(v)$, thus it can be easily identified by subsequently resolving the direct and adjoint state equations. The process is even simpler when the operator $\partial_u D_v(u)$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
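These stationarity conditions can be made concrete on a toy problem (an assumed example, not from the source): take the scalar state equation $D_v(u) = u^3 + u - v = 0$ and objective $J(u, v) = u^2$. Stationarity in $u$ gives the adjoint state $\lambda = -2u/(3u^2 + 1)$, and stationarity in $v$ gives the gradient $\nabla j(v) = -\lambda$, which a finite difference confirms:

```python
# Toy example (hypothetical): state equation D_v(u) = u^3 + u - v = 0,
# objective J(u, v) = u^2, so j(v) = u_v^2 with u_v defined implicitly.

def solve_state(v, iters=50):
    """Solve the direct state equation u^3 + u - v = 0 by Newton's method."""
    u = 0.0
    for _ in range(iters):
        u -= (u**3 + u - v) / (3 * u**2 + 1)
    return u

def gradient_via_adjoint(v):
    u = solve_state(v)                # resolve the direct state equation
    lam = -2 * u / (3 * u**2 + 1)     # adjoint equation: dJ/du + lam * dD/du = 0
    return -lam                       # gradient: dJ/dv + lam * dD/dv, with dD/dv = -1

# Finite-difference check at an arbitrary point.
v0, h = 2.0, 1e-6
fd = (solve_state(v0 + h)**2 - solve_state(v0 - h)**2) / (2 * h)
assert abs(gradient_via_adjoint(v0) - fd) < 1e-6
```

Note that no derivative of the implicit map $v \mapsto u_v$ is ever formed; one direct solve and one (here trivial) adjoint solve suffice.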
In a real finite-dimensional linear programming context, the objective function could be $J(u, v) = \langle c, u \rangle$ and the state equation could take the form $B_v u = b$, with $B_v$ an invertible matrix depending on the optimization variables $v$. The stationarity conditions then reduce to the direct state equation $B_v u_v = b$ and the adjoint state equation $B_v^T \lambda_v = -c$, and the gradient reads

$\nabla j(v) = \lambda_v \otimes u_v : \nabla_v B,$

where $\lambda_v \otimes u_v$ is the dyadic product between the direct and adjoint states and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily with respect to $v$.
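A minimal NumPy sketch of this linear case (all data hypothetical: an assumed affine parameterization of the state matrix, B(v) = B0 + sum_i v_i * M[i], verified against finite differences):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3                                   # state and parameter dimensions

# Hypothetical data: objective <c, u>, state equation B(v) u = b.
B0 = rng.standard_normal((n, n)) + 5 * np.eye(n)  # keeps B(v) well conditioned
M = rng.standard_normal((m, n, n))                # sensitivities dB/dv_i
b = rng.standard_normal(n)
c = rng.standard_normal(n)
v = 0.1 * rng.standard_normal(m)

def B(v):
    return B0 + np.einsum('i,ijk->jk', v, M)

def j(v):
    return c @ np.linalg.solve(B(v), b)       # j(v) = <c, u_v> with B(v) u_v = b

# Adjoint gradient: one direct solve plus one adjoint solve, for all m components.
u = np.linalg.solve(B(v), b)                  # direct state equation
lam = np.linalg.solve(B(v).T, -c)             # adjoint state equation B^T lam = -c
# grad_i = (lam ⊗ u) : dB/dv_i, a double contraction with each sensitivity M[i]
grad = np.einsum('j,k,ijk->i', lam, u, M)

# Finite-difference check, one component at a time.
h = 1e-6
for i in range(m):
    e = np.zeros(m); e[i] = h
    fd = (j(v + e) - j(v - e)) / (2 * h)
    assert abs(grad[i] - fd) < 1e-5
```

The cost of the adjoint gradient is two linear solves regardless of the number of parameters $m$, whereas the finite-difference loop requires $2m$ solves.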
To avoid ever inverting a matrix, which is numerically very slow, an LU decomposition of the state matrix can be used instead to solve the state equation, in $O(n^3)$ operations for the decomposition and $O(n^2)$ operations for the resolution. That same decomposition can then be used to solve the adjoint state equation in only $O(n^2)$ additional operations, since the adjoint matrix is the transpose of the direct one and shares its LU factors.
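This factor reuse can be sketched with SciPy's LU routines (an illustrative example with arbitrary data; `lu_solve` with `trans=1` solves the transposed system from the same factors):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n)) + 5 * np.eye(n)   # assumed state matrix
b = rng.standard_normal(n)                         # state equation right-hand side
c = rng.standard_normal(n)                         # defines the objective <c, u>

lu, piv = lu_factor(B)                  # O(n^3): factorize once

u = lu_solve((lu, piv), b)              # direct state equation  B u = b,       O(n^2)
lam = lu_solve((lu, piv), -c, trans=1)  # adjoint state equation B^T lam = -c,  O(n^2)
                                        # trans=1 reuses the same factors for B^T

assert np.allclose(B @ u, b)
assert np.allclose(B.T @ lam, -c)
```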