In applied mathematics, Hessian automatic differentiation refers to techniques based on automatic differentiation (AD) that calculate the second derivative of an $n$-dimensional function, known as the Hessian matrix. For a function $f : \mathbb{R}^n \to \mathbb{R}$, the Hessian $H(x)$, with entries $H(x)_{ij} = \partial^2 f(x) / \partial x_i \partial x_j$, is the second-order derivative and is a symmetric matrix.
For a given vector $u \in \mathbb{R}^n$, this method efficiently calculates the Hessian-vector product $H(x)u$.[1] The method works by first using forward AD to transform $f(x)$ into the directional derivative $u^{\mathsf{T}} \nabla f(x)$; reverse AD is then used to calculate the gradient of this quantity, $\nabla\left(u^{\mathsf{T}} \nabla f(x)\right) = H(x)u$, where the last equality holds because the Hessian is symmetric.
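As a concrete illustration, here is a minimal sketch in Python using the JAX library; the function `f` and the helper name `hvp` are our own choices for the example, not part of the cited method:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Arbitrary stand-in scalar function f : R^n -> R
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp(f, x, u):
    # Forward AD turns f into the directional derivative g(x) = u . grad f(x) ...
    g = lambda y: jax.jvp(f, (y,), (u,))[1]
    # ... and reverse AD applied to g yields grad g(x) = H(x) u.
    return jax.grad(g)(x)

x = jnp.array([0.3, -1.2, 2.0])
u = jnp.array([1.0, 0.0, 0.0])
print(hvp(f, x, u))            # first column of H(x), since u = e_1
print(jax.hessian(f)(x) @ u)   # check against the full Hessian
```

Both sweeps cost a small constant multiple of one evaluation of $f$, so taking $u = e_1, \ldots, e_n$ recovers the full Hessian column by column at a cost proportional to $n$ function evaluations.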
An algorithm that calculates the entire Hessian with one forward and one reverse sweep of the computational graph is Edge_Pushing. Edge_Pushing is the result of applying the reverse gradient method to the computational graph of the gradient itself. Naturally, this gradient graph has $n$ output nodes, so in a sense the reverse gradient method has to be applied to each outgoing node.
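This naive interpretation, one reverse sweep per gradient component without the sharing that Edge_Pushing performs, can be written directly in an AD framework; a sketch with JAX, where `f` is an arbitrary stand-in function:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Arbitrary stand-in scalar function
    return jnp.sum(jnp.sin(x) * x ** 2)

# The gradient's computational graph has n output nodes; applying
# reverse-mode differentiation to that graph yields the full Hessian.
naive_hessian = jax.jacrev(jax.grad(f))
print(naive_hessian(jnp.array([0.3, -1.2, 2.0])))
```

Edge_Pushing aims to avoid the redundancy among these $n$ conceptual reverse sweeps by accumulating second-order information within a single sweep.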
[2] The algorithm's input is the computational graph of the function.
After a preceding forward sweep in which all intermediate values in the computational graph are calculated, the algorithm initiates a reverse sweep of the graph. Whenever this sweep encounters a node whose elemental function is nonlinear, a new nonlinear edge is created between the node's predecessors. Appended to this nonlinear edge is an edge weight that is the second-order partial derivative of the nonlinear node with respect to its predecessors.
This nonlinear edge is subsequently pushed down to further predecessors in such a way that when it reaches the independent nodes, its edge weight is the second-order partial derivative of the two independent nodes it connects.
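The following toy sketch in Python is our own illustration of these creating and pushing steps, not the authors' reference implementation; the graph is hard-coded for $f(x_1, x_2) = \sin(x_1)\, x_2$, and the node numbering and names such as `edge_pushing_example` and `bump` are assumptions made for the example:

```python
import math

def edge_pushing_example(x1, x2):
    # Hard-coded graph for f(x1, x2) = sin(x1) * x2:
    #   node 1: v1 = x1,  node 2: v2 = x2,  node 3: v3 = sin(v1),  node 4: v4 = v3 * v2
    v = {1: x1, 2: x2}
    v[3] = math.sin(v[1])                   # forward sweep: intermediate values
    v[4] = v[3] * v[2]

    # First-order partials d[i][p] = d(phi_i)/d(v_p) and the nonzero
    # second-order partials dd[i][(p, q)] = d^2(phi_i)/d(v_p)d(v_q), p <= q.
    d = {3: {1: math.cos(v[1])}, 4: {2: v[3], 3: v[2]}}
    dd = {3: {(1, 1): -math.sin(v[1])}, 4: {(2, 3): 1.0}}

    adj = {1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}  # adjoints, seeded at the output
    W = {}                                  # unordered nonlinear edges {p, q} -> weight

    def bump(p, q, w):
        k = (min(p, q), max(p, q))
        W[k] = W.get(k, 0.0) + w

    for i in (4, 3):                        # reverse sweep over non-input nodes
        preds = list(d[i])
        # Pushing: move every edge incident to node i down to i's predecessors.
        for (a, b), w in list(W.items()):
            if i not in (a, b):
                continue
            del W[(a, b)]
            if a == b:                      # diagonal edge {i, i}
                for p in preds:
                    for q in preds:
                        if p <= q:
                            bump(p, q, d[i][p] * d[i][q] * w)
            else:                           # edge {i, j} with j != i
                j = a if b == i else b
                for p in preds:
                    bump(p, j, (2.0 if p == j else 1.0) * d[i][p] * w)
        # Creating: a nonlinear node appends its weighted second-order partials.
        for (p, q), w2 in dd[i].items():
            bump(p, q, adj[i] * w2)
        # Ordinary first-order adjoint update, as in plain reverse-mode AD.
        for p in preds:
            adj[p] += d[i][p] * adj[i]

    return W                                # edges between nodes 1, 2 = Hessian entries

# H = [[-x2*sin(x1), cos(x1)], [cos(x1), 0]]; the returned edges match:
print(edge_pushing_example(0.5, 2.0))       # {(1, 2): 0.8776, (1, 1): -0.9589}
```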
[2] Graph colouring techniques exploit the sparsity pattern of the Hessian matrix together with cheap Hessian-vector products to obtain the entire matrix. These techniques are therefore well suited to large, sparse matrices.
The general strategy of any such colouring technique is as follows:

1. Obtain the sparsity pattern of the Hessian $H$.
2. Apply a graph colouring algorithm that compacts this sparsity structure, grouping structurally orthogonal columns under one colour.
3. For each desired point $x \in \mathbb{R}^n$, calculate the numeric entries of the compact matrix using one Hessian-vector product per colour.
4. Recover the Hessian matrix from the compact matrix.

Steps 1 and 2 need only be carried out once. When one wants to calculate the Hessian at numerous points (such as in an optimization routine), only steps 3 and 4 are repeated.
As an example, the accompanying figure shows the sparsity pattern of a Hessian matrix whose columns have been coloured in such a way that columns of the same colour can be merged without incurring a collision between elements.
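A minimal sketch of steps 3 and 4 above for a toy case, again using JAX: the chained function `f`, the modulo-3 colouring, and all variable names are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Chained example function: its Hessian is tridiagonal (sparse)
    return jnp.sum(x[:-1] ** 2 * x[1:])

def hvp(x, u):
    # Cheap Hessian-vector product via the reverse-over-forward method above
    return jax.grad(lambda y: jax.jvp(f, (y,), (u,))[1])(x)

n = 6
x = jnp.arange(1.0, n + 1)

# Step 2 (done by hand here): columns i and j of a tridiagonal Hessian
# collide iff |i - j| <= 2, so colouring by i mod 3 is collision-free.
colors = jnp.arange(n) % 3

# Step 3: one Hessian-vector product per colour (3 products instead of n).
seeds = jnp.stack([(colors == c).astype(x.dtype) for c in range(3)])
compact = jnp.stack([hvp(x, s) for s in seeds])

# Step 4: recovery; entry H[i, j] was accumulated into compact[colors[j], i].
H = jnp.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):   # tridiagonal pattern
        H = H.at[i, j].set(compact[int(colors[j]), i])

print(jnp.allclose(H, jax.hessian(f)(x)))           # True
```

Here three Hessian-vector products suffice regardless of $n$, which is where the savings over the column-by-column approach come from.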