Augmented Lagrangian method

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems.

They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier.

The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).

The method was originally introduced as the "method of multipliers" in 1969 by Magnus Hestenes and, independently, by Michael Powell. It was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving non-quadratic regularization functions (e.g., entropic regularization).

This combination gives rise to the "exponential method of multipliers," which handles inequality constraints with a twice-differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have been given more attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs possess proven complexity results via the theory of self-concordant functions.

The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN[4][5] and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially-separable" problems.[6]

Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total variation denoising and compressed sensing.

Consider solving the following constrained optimization problem:

$$\min_x f(x)$$

subject to

$$c_i(x) = 0 \quad \forall i \in \mathcal{I},$$

where $\mathcal{I}$ denotes the set of indices for the equality constraints. This problem can be solved as a series of unconstrained minimization problems.

For reference, we first list the $k$th step of the penalty method approach:

$$\min_x \Phi_k(x) = f(x) + \mu_k \sum_{i \in \mathcal{I}} c_i(x)^2.$$

The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of $\mu_k$, warm-started at the previous solution.
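To make this concrete, the following is a minimal sketch of the penalty loop on a toy equality-constrained quadratic problem; the specific $f$, $c$, starting point, and the tenfold increase of $\mu_k$ are illustrative assumptions, and scipy's general-purpose BFGS routine stands in for whatever inner unconstrained solver an implementation would use.

```python
# Penalty-method sketch for min f(x) s.t. c(x) = 0.
# The toy problem and the schedule for mu are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + 2 * x[1] ** 2           # objective
c = lambda x: np.array([x[0] + x[1] - 1.0])       # equality constraint c(x) = 0

x, mu = np.zeros(2), 1.0
for _ in range(8):
    # k-th subproblem: minimize f(x) + mu_k * sum c_i(x)^2, warm-started at the old solution
    phi = lambda z: f(z) + mu * np.sum(c(z) ** 2)
    x = minimize(phi, x, method="BFGS").x
    mu *= 10.0                                    # re-solve with a larger penalty parameter
print(x)   # approaches feasibility only as mu grows large
```

Note that the iterates only approach feasibility as $\mu_k$ grows large, which is the ill-conditioning issue the augmented Lagrangian term is designed to avoid.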

The augmented Lagrangian method uses the following unconstrained objective:

$$\min_x \Phi_k(x) = f(x) + \frac{\mu_k}{2} \sum_{i \in \mathcal{I}} c_i(x)^2 - \sum_{i \in \mathcal{I}} \lambda_i c_i(x),$$

and after each iteration, in addition to updating $\mu_k$, the multiplier estimate $\lambda$ is also updated according to the rule

$$\lambda_i \leftarrow \lambda_i - \mu_k c_i(x_k),$$

where $x_k$ is the solution of the unconstrained subproblem at the $k$th step, i.e. $x_k = \operatorname{argmin}_x \Phi_k(x)$. The variable $\lambda$ is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves at every step. The major advantage of the method is that, unlike the penalty method, it is not necessary to take $\mu_k \to \infty$ in order to solve the original constrained problem; because of the presence of the Lagrange multiplier term, $\mu_k$ can stay much smaller, thus avoiding ill-conditioning.[6]

Nevertheless, it is common in practical implementations to project the multiplier estimates onto a large bounded set (safeguards), which avoids numerical instabilities and leads to strong theoretical convergence results.[6][5]
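As a concrete illustration, here is a minimal sketch of the method-of-multipliers loop on the same kind of toy problem, with the safeguarding step included; the specific $f$, $c$, parameter values, and safeguard bound are illustrative assumptions, and scipy's BFGS routine again stands in for the inner unconstrained solver.

```python
# Augmented Lagrangian (method of multipliers) sketch for min f(x) s.t. c(x) = 0.
# The toy problem, parameters and safeguard bound are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + 2 * x[1] ** 2           # objective
c = lambda x: np.array([x[0] + x[1] - 1.0])       # equality constraint c(x) = 0
LAMBDA_MAX = 1e8                                  # safeguard bound on the multipliers

def augmented_lagrangian(x0, mu=10.0, iters=20):
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(len(c(x)))
    for _ in range(iters):
        # Unconstrained subproblem: f(x) + (mu/2) * sum c_i(x)^2 - sum lam_i * c_i(x)
        phi = lambda z: f(z) + 0.5 * mu * np.sum(c(z) ** 2) - lam @ c(z)
        x = minimize(phi, x, method="BFGS").x
        # Multiplier update lam_i <- lam_i - mu * c_i(x_k), then safeguard projection
        lam = np.clip(lam - mu * c(x), -LAMBDA_MAX, LAMBDA_MAX)
    return x, lam

x_opt, lam_opt = augmented_lagrangian(np.zeros(2))
print(x_opt, lam_opt)   # x approaches the constrained minimizer without mu -> infinity
```

Unlike the penalty sketch above, $\mu$ is held fixed here; the multiplier term does the work of enforcing feasibility.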

The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the dual variables.

This method is often applied to solve problems such as

$$\min_x f(x) + g(x).$$

This is equivalent to the constrained problem

$$\min_{x, y} f(x) + g(y) \quad \text{subject to} \quad x = y.$$

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in $x$ and $y$. The ADMM technique solves the resulting subproblem approximately by first minimizing over $x$ with $y$ fixed, and then minimizing over $y$ with $x$ fixed.

Rather than iterate this process until convergence (like the Jacobi method), the ADMM algorithm proceeds directly to updating the dual variable and then repeats the process.

This is not equivalent to the exact minimization, but the method still converges to the correct solution under some assumptions.[7]
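The following is a hedged sketch of these ADMM iterations, instantiated on a lasso-type problem $\min_x \tfrac{1}{2}\|Ax - b\|^2 + \tau \|x\|_1$ with the splitting $f(x) = \tfrac{1}{2}\|Ax - b\|^2$, $g(y) = \tau \|y\|_1$ and constraint $x = y$; the problem choice, the scaled dual form, and all parameter values are assumptions made for illustration.

```python
# ADMM sketch for min_x f(x) + g(x), rewritten as min_{x,y} f(x) + g(y) s.t. x = y,
# with f(x) = 0.5*||Ax - b||^2 and g(y) = tau*||y||_1.  Data and rho/tau are assumptions.
import numpy as np

def admm_lasso(A, b, tau=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    x = np.zeros(n)
    y = np.zeros(n)
    u = np.zeros(n)                               # scaled dual variable
    AtA_rhoI = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: minimize f(x) + (rho/2)*||x - y + u||^2 with y, u fixed
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (y - u))
        # y-update: minimize g(y) + (rho/2)*||x - y + u||^2  (soft-thresholding)
        v = x + u
        y = np.sign(v) * np.maximum(np.abs(v) - tau / rho, 0.0)
        # partial dual update, then repeat (no inner iteration to convergence)
        u = u + x - y
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
print(admm_lasso(A, b, tau=0.5)[:5])
```

Each sweep performs the $x$- and $y$-minimizations once and then immediately updates the (scaled) dual variable, matching the partial-update structure described above.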

There are several modern software packages, including YALL1[8] (2009), SpaRSA[9] (2009) and SALSA[10] (2009), which solve basis pursuit and its variants using ADMM.

There are also packages which use the ADMM to solve more general problems, some of which can exploit multiple computing cores (e.g., SNAPVX[11] (2015), parADMM[12] (2016)).

Stochastic optimization considers the problem of minimizing a loss function with access only to noisy samples of the function or of its gradient. The goal is to have an estimate of the optimal parameter (minimizer) per new sample.

ADMM is originally a batch method; with some modifications, however, it can also be used for stochastic optimization. In a stochastic setting, only noisy samples of the gradient are accessible, so an inexact approximation of the augmented Lagrangian is used. For the splitting above with constraint $x = y$, the inexact augmented Lagrangian at step $k$ takes the form

$$\hat{\mathcal{L}}_{\rho, k}(x, y, \lambda) = f(x_k) + \langle \nabla f(x_k, \zeta_{k+1}), x \rangle + g(y) - \lambda^\top (x - y) + \frac{\rho}{2} \| x - y \|^2 + \frac{\| x - x_k \|^2}{2 \eta_{k+1}},$$

where $\nabla f(x_k, \zeta_{k+1})$ is a stochastic gradient of $f$ computed from the sample $\zeta_{k+1}$ and $\eta_{k+1}$ is a time-varying step size.[14][15][16][17]
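A hedged sketch of the resulting linearized $x$-update follows: minimizing the inexact augmented Lagrangian above over $x$ (with $y$ and $\lambda$ fixed) is a strongly convex quadratic problem, so the step is available in closed form. The quadratic loss, the single-sample gradient, and the parameter values below are illustrative assumptions.

```python
# Linearized (stochastic) x-update for the inexact augmented Lagrangian above, with x = y.
# The loss, the sampling scheme and the parameter values are illustrative assumptions.
import numpy as np

def stochastic_grad(x, a, b):
    """Noisy gradient sample of f(x) = E[0.5*(a^T x - b)^2] from one data point (a, b)."""
    return a * (a @ x - b)

def linearized_x_update(x_k, y, lam, grad_sample, rho, eta):
    """argmin_x <grad_sample, x> - lam^T (x - y) + (rho/2)*||x - y||^2 + ||x - x_k||^2/(2*eta)."""
    return (x_k / eta + rho * y + lam - grad_sample) / (rho + 1.0 / eta)

rng = np.random.default_rng(0)
x_k, y, lam = np.zeros(5), np.zeros(5), np.zeros(5)
a_sample, b_sample = rng.standard_normal(5), 1.0
g = stochastic_grad(x_k, a_sample, b_sample)
x_next = linearized_x_update(x_k, y, lam, g, rho=1.0, eta=0.1)
print(x_next)
```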

Regularized optimization problems are especially relevant in the high-dimensional regime, as regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank).[citation needed]

Open source and non-free/commercial implementations of the augmented Lagrangian method:
