The Gauss–Newton algorithm is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations.
The method is named after the mathematicians Carl Friedrich Gauss and Isaac Newton, and first appeared in Gauss's 1809 work Theoria motus corporum coelestium in sectionibus conicis solem ambientium.
The Gauss–Newton algorithm can be derived by linearly approximating the vector of functions $r_i$. Using Taylor's theorem, we can write at every iteration

$$\mathbf{r}(\boldsymbol\beta) \approx \mathbf{r}(\boldsymbol\beta^{(s)}) + \mathbf{J_r}(\boldsymbol\beta^{(s)})\,\Delta,$$

with $\Delta = \boldsymbol\beta - \boldsymbol\beta^{(s)}$. The task of finding the $\Delta$ that minimizes the sum of squares of the right-hand side, i.e.

$$\min_{\Delta} \left\|\mathbf{r}(\boldsymbol\beta^{(s)}) + \mathbf{J_r}(\boldsymbol\beta^{(s)})\,\Delta\right\|_2^2,$$

is a linear least-squares problem, which can be solved explicitly, yielding the normal equations in the algorithm.
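As a concrete illustration, the iteration can be sketched in a few lines of NumPy. This is a minimal sketch, not a robust implementation: the names gauss_newton, residual, and jacobian, the tolerance, and the iteration cap are illustrative choices, and the increment is obtained by solving the linearized least-squares subproblem directly rather than by forming the normal equations explicitly.

```python
import numpy as np

def gauss_newton(residual, jacobian, beta0, tol=1e-10, max_iter=50):
    """Minimal Gauss-Newton sketch: at each step, solve the linearized
    least-squares problem  min_d || r(beta) + J(beta) d ||^2  for the
    increment d and update beta <- beta + d."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)   # residual vector r(beta), shape (m,)
        J = jacobian(beta)   # Jacobian J_r(beta), shape (m, n)
        # Equivalent to the normal equations (J^T J) d = -J^T r,
        # but solved as a least-squares problem for numerical stability.
        d, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta = beta + d
        if np.linalg.norm(d) < tol:
            break
    return beta
```

Solving the subproblem with a least-squares routine (or a QR factorization of the Jacobian) rather than explicitly forming $\mathbf{J_r}^\mathsf{T}\mathbf{J_r}$ avoids squaring the condition number of the Jacobian.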
A plot of the curve determined by the model for the optimal parameters, together with the observed data, illustrates the fit.
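By way of illustration, a fit of this kind could be reproduced with the gauss_newton sketch above. The saturation-type model form, the data points, and the starting guess below are made-up placeholders chosen only to show the mechanics; they are not the observations or parameter values referred to in the text.

```python
import numpy as np

# Hypothetical saturation-type model: y = vmax * x / (km + x).
# The data values below are illustrative placeholders.
x = np.array([0.04, 0.2, 0.4, 0.6, 1.3, 2.5, 3.7])
y = np.array([0.05, 0.13, 0.09, 0.21, 0.27, 0.27, 0.33])

def residual(beta):
    vmax, km = beta
    return vmax * x / (km + x) - y

def jacobian(beta):
    vmax, km = beta
    return np.column_stack([x / (km + x),              # d r_i / d vmax
                            -vmax * x / (km + x)**2])  # d r_i / d km

# Reuses the gauss_newton function defined in the sketch above.
beta_hat = gauss_newton(residual, jacobian, beta0=[0.9, 0.2])
print(beta_hat)   # fitted (vmax, km); plot the model curve against (x, y)
```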
The Gauss–Newton iteration is guaranteed to converge toward a local minimum point $\hat{\boldsymbol\beta}$ provided that the functions $r_1, \ldots, r_m$ are twice continuously differentiable in an open convex set containing $\hat{\boldsymbol\beta}$, the Jacobian $\mathbf{J_r}(\hat{\boldsymbol\beta})$ has full column rank, the initial iterate is sufficiently close to $\hat{\boldsymbol\beta}$, and the local minimum value $|S(\hat{\boldsymbol\beta})|$ is small; the convergence is quadratic if $|S(\hat{\boldsymbol\beta})| = 0$.
It can be shown[5] that the increment $\Delta$ is a descent direction for $S$, and, if the algorithm converges, then the limit is a stationary point of $S$. For a large minimum value $|S(\hat{\boldsymbol\beta})|$, however, convergence is not guaranteed, not even local convergence as in Newton's method, or convergence under the usual Wolfe conditions.[7]

The algorithm may converge slowly or not at all if the initial guess is far from the minimum or if the matrix $\mathbf{J_r}^\mathsf{T}\mathbf{J_r}$ is ill-conditioned. For example, consider the problem with $m = 2$ equations and $n = 1$ parameter, given by

$$r_1(\beta) = \beta + 1, \qquad r_2(\beta) = \lambda \beta^2 + \beta - 1,$$

which has its optimum at $\beta = 0$. If $\lambda = 0$, then the problem is in fact linear and the method finds the optimum in one iteration.
If $|\lambda| < 1$, then the method converges linearly and the error decreases asymptotically with a factor $|\lambda|$ at every iteration. However, if $|\lambda| > 1$, then the method does not even converge locally.
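The linear rate can be checked numerically. The following sketch runs the iteration on the one-parameter test problem above and prints the ratio of successive errors, which approaches $|\lambda|$; the values of $\lambda$ and of the starting point are arbitrary illustrative choices.

```python
import numpy as np

# Test problem from the text: r1 = beta + 1, r2 = lam*beta**2 + beta - 1,
# with optimum at beta = 0.  The successive error ratio should approach |lam|.
lam = 0.5
beta = 0.3          # illustrative starting point
prev_err = abs(beta)
for k in range(15):
    r = np.array([beta + 1.0, lam * beta**2 + beta - 1.0])
    J = np.array([[1.0], [2.0 * lam * beta + 1.0]])   # Jacobian column d r / d beta
    d, *_ = np.linalg.lstsq(J, -r, rcond=None)        # Gauss-Newton increment
    beta += d[0]
    err = abs(beta)
    print(f"iter {k}: beta = {beta:+.3e}, error ratio = {err / prev_err:.3f}")
    prev_err = err
```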
Applied to an overdetermined system of equations $\mathbf{r}(\boldsymbol\beta) = \mathbf{0}$, the Gauss–Newton iteration can be considered an extension of Newton's method and enjoys the same local quadratic convergence[4] toward isolated regular solutions. If no solution exists, but the initial iterate is near a point $\hat{\boldsymbol\beta}$ at which the sum of squares $S$ reaches a small local minimum, the Gauss–Newton iteration converges linearly to $\hat{\boldsymbol\beta}$; such a point is often called a least-squares solution of the overdetermined system.
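For the equation-solving viewpoint, the increment can equivalently be written with the Moore–Penrose pseudoinverse of the Jacobian. The sketch below is one way to code it (the function names are illustrative); it produces the same steps as the least-squares formulation above.

```python
import numpy as np

def gauss_newton_pinv(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration written with the Moore-Penrose pseudoinverse:
    x <- x - pinv(J(x)) @ r(x).  Applying the pseudoinverse to r gives exactly
    the least-squares increment for the linearized system."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.pinv(jacobian(x)) @ residual(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x
```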
In what follows, the Gauss–Newton algorithm will be derived from Newton's method for function optimization via an approximation.
As a consequence, the rate of convergence of the Gauss–Newton algorithm can be quadratic under certain regularity conditions.
In general (under weaker conditions), the convergence rate is linear.[9]

The recurrence relation for Newton's method for minimizing a function $S$ of parameters $\boldsymbol\beta$ is

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} - \mathbf{H}^{-1}\mathbf{g},$$

where $\mathbf{g}$ denotes the gradient vector of $S$ and $\mathbf{H}$ denotes the Hessian matrix of $S$. Since $S = \sum_{i=1}^m r_i^2$, the gradient is given by

$$g_j = 2 \sum_{i=1}^m r_i \frac{\partial r_i}{\partial \beta_j},$$

and the elements of the Hessian are

$$H_{jk} = 2 \sum_{i=1}^m \left( \frac{\partial r_i}{\partial \beta_j}\frac{\partial r_i}{\partial \beta_k} + r_i \frac{\partial^2 r_i}{\partial \beta_j\,\partial \beta_k} \right).$$

The Gauss–Newton method is obtained by ignoring the second-order derivative terms (the second term in this expression), so that the Hessian is approximated by

$$H_{jk} \approx 2 \sum_{i=1}^m J_{ij} J_{ik},$$

where $J_{ij} = \partial r_i / \partial \beta_j$ are entries of the Jacobian $\mathbf{J_r}$. In matrix notation, the gradient and the approximate Hessian are $\mathbf{g} = 2 \mathbf{J_r}^\mathsf{T}\mathbf{r}$ and $\mathbf{H} \approx 2 \mathbf{J_r}^\mathsf{T}\mathbf{J_r}$. These expressions are substituted into the recurrence relation above to obtain the operational equations

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} + \Delta, \qquad \Delta = -\left(\mathbf{J_r}^\mathsf{T}\mathbf{J_r}\right)^{-1}\mathbf{J_r}^\mathsf{T}\mathbf{r}.$$

Convergence of the Gauss–Newton method is not guaranteed in all instances. The approximation

$$\left| r_i \frac{\partial^2 r_i}{\partial \beta_j\,\partial \beta_k} \right| \ll \left| \frac{\partial r_i}{\partial \beta_j}\frac{\partial r_i}{\partial \beta_k} \right|$$

that needs to hold to be able to ignore the second-order derivative terms may be valid in two cases, for which convergence is to be expected:[10] (1) the function values $r_i$ are small in magnitude, at least around the minimum; (2) the functions are only mildly nonlinear, so that $\partial^2 r_i / \partial \beta_j\,\partial \beta_k$ is relatively small.

With the Gauss–Newton method the sum of squares of the residuals $S$ may not decrease at every iteration. However, since $\Delta$ is a descent direction, unless $\boldsymbol\beta^{(s)}$ is a stationary point it holds that $S(\boldsymbol\beta^{(s)} + \alpha\Delta) < S(\boldsymbol\beta^{(s)})$ for all sufficiently small $\alpha > 0$. Thus, if divergence occurs, one solution is to employ only a fraction $\alpha$ of the increment vector $\Delta$ in the updating formula

$$\boldsymbol\beta^{(s+1)} = \boldsymbol\beta^{(s)} + \alpha\,\Delta.$$
In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function $S$. An optimal value for $\alpha$ can be determined by finding the value that minimizes $S$, usually using a direct search method in the interval $0 < \alpha < 1$ or a backtracking line search such as the Armijo line search.
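A simple damped step of this kind might be coded as follows; the halving rule used here is an illustrative stand-in for the direct search over $0 < \alpha < 1$, and the function names are placeholders.

```python
import numpy as np

def damped_gauss_newton_step(residual, jacobian, beta):
    """One Gauss-Newton step with a crude backtracking rule on the fraction
    alpha: keep halving alpha until the sum of squares S actually decreases."""
    beta = np.asarray(beta, dtype=float)
    r = residual(beta)
    J = jacobian(beta)
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)   # full Gauss-Newton increment
    S_old = r @ r
    alpha = 1.0
    while alpha > 1e-8:
        beta_new = beta + alpha * delta
        if residual(beta_new) @ residual(beta_new) < S_old:
            return beta_new               # a fraction of the step decreased S
        alpha *= 0.5                      # step too long: try a shorter fraction
    return beta                           # no improving fraction found (near-stationary)
```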
In cases where the optimal fraction $\alpha$ is close to zero, an alternative way of handling divergence is the use of the Levenberg–Marquardt algorithm, a trust-region method.[3] The normal equations are modified in such a way that the increment vector is rotated towards the direction of steepest descent,

$$\left(\mathbf{J_r}^\mathsf{T}\mathbf{J_r} + \lambda \mathbf{D}\right) \Delta = -\mathbf{J_r}^\mathsf{T}\mathbf{r},$$

where $\mathbf{D}$ is a positive diagonal matrix; for large $\lambda$ the increment $\Delta$ approaches a scaled steepest-descent step. The so-called Marquardt parameter $\lambda$ may also be optimized by a line search, but this is inefficient, as the shift vector must be recalculated every time $\lambda$ is changed.
A more efficient strategy is this: When divergence occurs, increase the Marquardt parameter until there is a decrease in S. Then retain the value from one iteration to the next, but decrease it if possible until a cut-off value is reached, when the Marquardt parameter can be set to zero; the minimization of S then becomes a standard Gauss–Newton minimization.
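That strategy might be realized as in the sketch below; the factor-of-10 updates, the choice $\mathbf{D} = \mathbf{I}$, and the cut-off value are illustrative choices rather than prescribed ones.

```python
import numpy as np

def lm_fit(residual, jacobian, beta0, lam=1e-3, cutoff=1e-12, max_iter=100):
    """Sketch of the damping strategy described above: if a step would
    increase S, raise the Marquardt parameter until S decreases; carry the
    parameter over to the next iteration, shrinking it when possible and
    setting it to zero once it falls below the cut-off."""
    beta = np.asarray(beta0, dtype=float)
    n = beta.size
    for _ in range(max_iter):
        r = residual(beta)
        J = jacobian(beta)
        S_old = r @ r
        while True:
            # Modified normal equations (J^T J + lam*D) delta = -J^T r, with D = I.
            delta = np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
            r_new = residual(beta + delta)
            if r_new @ r_new < S_old:
                break                                 # this damped step decreases S
            lam = 10.0 * lam if lam > 0 else 1e-4     # divergence: increase damping
            if lam > 1e12:
                return beta                           # numerically at a minimum already
        beta = beta + delta
        lam = 0.0 if lam < cutoff else lam / 10.0     # relax damping when possible
    return beta
```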
For large-scale optimization, the Gauss–Newton method is of special interest because it is often (though certainly not always) true that the matrix $\mathbf{J_r}$ is more sparse than the approximate Hessian $\mathbf{J_r}^\mathsf{T}\mathbf{J_r}$. In such cases, the step calculation itself will typically need to be done with an approximate iterative method appropriate for large and sparse problems, such as the conjugate gradient method.

In order to make this kind of approach work, one needs at least an efficient method for computing the product

$$\mathbf{J_r}^\mathsf{T}\mathbf{J_r}\,\mathbf{p}$$

for some vector $\mathbf{p}$. With sparse matrix storage, it is in general practical to store the rows of $\mathbf{J_r}$ in a compressed form (e.g., without zero entries), which makes a direct computation of the above product tricky because of the transposition. However, if $\mathbf{c}_i$ denotes row $i$ of $\mathbf{J_r}$, the simple relation

$$\mathbf{J_r}^\mathsf{T}\mathbf{J_r}\,\mathbf{p} = \sum_{i} \mathbf{c}_i \left(\mathbf{c}_i \cdot \mathbf{p}\right)$$

holds, so that every row contributes additively and independently to the product. In addition to respecting a practical sparse storage structure, this expression is well suited for parallel computations.
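A direct transcription of that row-wise formula is sketched below; the compressed row representation (index/value pairs) and the function name are illustrative.

```python
import numpy as np

def jtjp_rowwise(rows, p):
    """Compute (J^T J) p as sum_i c_i * (c_i . p), where each row c_i of J is
    stored in compressed form as (indices, values) with zero entries omitted.
    Each row contributes additively and independently, so the loop can be
    parallelized over rows."""
    out = np.zeros_like(p, dtype=float)
    for idx, vals in rows:              # one compressed row c_i of J
        ci_dot_p = vals @ p[idx]        # c_i . p, using only the stored entries
        out[idx] += ci_dot_p * vals     # accumulate c_i * (c_i . p)
    return out

# Illustrative check: J = [[1, 0, 2], [0, 3, 0]] stored row by row, p = (1, 1, 1).
rows = [(np.array([0, 2]), np.array([1.0, 2.0])),
        (np.array([1]), np.array([3.0]))]
print(jtjp_rowwise(rows, np.ones(3)))   # equals J.T @ (J @ p) = [3, 9, 6]
```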
Note that quasi-Newton methods can minimize general real-valued functions, whereas Gauss–Newton, Levenberg–Marquardt, etc. fit only nonlinear least-squares problems.

Another method for solving minimization problems using only first derivatives is gradient descent; however, that method does not take the second derivatives into account, even approximately. Consequently, it is highly inefficient for many functions, especially if the parameters have strong interactions.