The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations.
In a non-linear system, the derivatives $\partial r_i/\partial\beta_j$ are functions of both the independent variable and the parameters, so in general these gradient equations do not have a closed-form solution; instead, initial values must be chosen for the parameters and refined by successive approximation.
At each iteration the model is linearized by approximation to a first-order Taylor polynomial expansion about the current parameter estimates, $\boldsymbol\beta^k$: $f(x_i, \boldsymbol\beta) \approx f(x_i, \boldsymbol\beta^k) + \sum_j J_{ij}\,\Delta\beta_j$, where $J_{ij} = \partial f(x_i, \boldsymbol\beta^k)/\partial\beta_j$ and $\Delta\beta_j = \beta_j - \beta_j^k$ is the shift vector.
The Jacobian matrix, J, is a function of constants, the independent variable and the parameters, so it changes from one iteration to the next.
Substituting the linearized model into the gradient equations and rearranging yields the normal equations, $(\mathbf J^\mathsf T \mathbf J)\,\Delta\boldsymbol\beta = \mathbf J^\mathsf T\,\Delta\mathbf y$, where $\Delta y_i = y_i - f(x_i, \boldsymbol\beta^k)$ are the residuals at the current estimate. These equations form the basis for the Gauss–Newton algorithm for a non-linear least squares problem.
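As an illustration, the following is a minimal Python sketch of a Gauss–Newton iteration built on these normal equations; the model, data, and function names are illustrative rather than taken from the text, and no convergence test is included for brevity.

```python
import numpy as np

def gauss_newton(f, jac, x, y, beta0, n_iter=20):
    """Minimal Gauss-Newton sketch: f(x, beta) evaluates the model and
    jac(x, beta) returns the m-by-n Jacobian J_ij = d f(x_i, beta)/d beta_j."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, beta)                        # residuals Delta y at the current estimate
        J = jac(x, beta)                          # Jacobian changes at every iteration
        delta = np.linalg.solve(J.T @ J, J.T @ r) # normal equations: (J^T J) delta = J^T Delta y
        beta = beta + delta                       # refine the parameters
    return beta

# Hypothetical use with a two-parameter model a * exp(b * x):
f = lambda x, b: b[0] * np.exp(b[1] * x)
jac = lambda x, b: np.column_stack([np.exp(b[1] * x),
                                    b[0] * x * np.exp(b[1] * x)])
x = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * x) + 0.01 * np.random.default_rng(0).normal(size=x.size)
print(gauss_newton(f, jac, x, y, beta0=[1.0, -1.0]))
```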
Note the sign convention in the definition of the Jacobian matrix in terms of the derivatives: with the definition above, $\partial r_i/\partial\beta_j = -J_{ij}$, so formulas linear in $J$ may appear with a factor of $-1$ elsewhere in the literature.
When the observations are not equally reliable, a weighted sum of squares, $S = \sum_i W_{ii}\,r_i^2$, is minimized instead. Each element of the diagonal weight matrix $\mathbf W$ should, ideally, be equal to the reciprocal of the error variance of the corresponding measurement, and the normal equations become $(\mathbf J^\mathsf T \mathbf W \mathbf J)\,\Delta\boldsymbol\beta = \mathbf J^\mathsf T \mathbf W\,\Delta\mathbf y$.
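A sketch of the corresponding weighted step, assuming the per-point error variances are available in an array (the name `sigma2` is hypothetical):

```python
import numpy as np

def weighted_gn_step(J, r, sigma2):
    """One weighted Gauss-Newton step.  sigma2 holds the error variance of
    each measurement; the diagonal weights are their reciprocals."""
    w = 1.0 / np.asarray(sigma2, dtype=float)   # W_ii = 1 / variance
    JTW = J.T * w                               # same as J^T @ diag(w), without forming diag(w)
    return np.linalg.solve(JTW @ J, JTW @ r)    # solve (J^T W J) delta = J^T W r
```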
In NLLSQ the objective function is quadratic with respect to the parameters only in a region close to its minimum value, where the truncated Taylor series is a good approximation to the model.
This also explains how divergence can come about: the Gauss–Newton algorithm is convergent only when the objective function is approximately quadratic in the parameters.
Some problems of ill-conditioning and divergence can be corrected by finding initial parameter estimates that are near to the optimal values.
One way to do this is by simulation: the observed and calculated data are displayed together, and the parameters of the model are adjusted by hand until the agreement between them is reasonably good.
Although this will be a subjective judgment, it is sufficient to find a good starting point for the non-linear refinement.
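The same idea can be automated crudely. The sketch below, which is not described in the text, compares trial parameter values on a coarse grid by their sum of squared residuals and keeps the best as a starting point (grid layout and names are hypothetical):

```python
import numpy as np
from itertools import product

def best_initial_guess(f, x, y, grids):
    """Evaluate the sum of squares S on a coarse grid of trial parameter
    values and return the best trial as a starting point for refinement.
    `grids` is a list of 1-D arrays, one per parameter."""
    best_beta, best_s = None, np.inf
    for beta in product(*grids):
        s = np.sum((y - f(x, np.array(beta))) ** 2)   # objective S at this trial point
        if s < best_s:
            best_beta, best_s = np.array(beta), s
    return best_beta
```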
The common-sense criterion for convergence is that the sum of squares does not increase from one iteration to the next. In practice this is expressed numerically, for example by requiring that the relative change in the sum of squares, or in each parameter, fall below some small threshold. Again, the numerical value is somewhat arbitrary; a threshold of 0.001 on the relative parameter shifts is equivalent to specifying that each parameter should be refined to roughly 0.1% precision.
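A sketch of how both tests might be coded, assuming the sums of squares and parameter vectors from two successive iterations are at hand (the names and the exact form of the test are illustrative):

```python
import numpy as np

def converged(s_old, s_new, beta_old, beta_new, tol=1e-3):
    """Hedged convergence test: the sum of squares must not increase, and
    every relative parameter shift must be below `tol` (0.001 corresponds
    to roughly 0.1% precision per parameter)."""
    scale = np.maximum(np.abs(beta_old), np.finfo(float).tiny)  # guard against zero parameters
    shifts_small = np.all(np.abs(beta_new - beta_old) <= tol * scale)
    return (s_new <= s_old) and shifts_small
```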
There are models for which it is either very difficult or even impossible to derive analytical expressions for the elements of the Jacobian. In such cases the derivatives can be approximated numerically, for example by finite differences.
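A forward-difference sketch of such a numerical Jacobian; the step-size heuristic is a common choice, not a prescription from the text:

```python
import numpy as np

def numerical_jacobian(f, x, beta, h=1e-6):
    """Forward-difference approximation of J_ij = d f(x_i, beta)/d beta_j."""
    beta = np.asarray(beta, dtype=float)
    f0 = f(x, beta)
    J = np.empty((f0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h * max(abs(beta[j]), 1.0)        # scale the step to the parameter size
        J[:, j] = (f(x, beta + step) - f0) / step[j]
    return J
```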
The normal equations matrix is not positive definite at a maximum in the objective function, as the gradient is zero and no unique direction of descent exists.
For example, when fitting a Lorentzian, the normal equations matrix is not positive definite when the half-width of the band is zero.
Such a linear approximation is, for instance, often applicable in the vicinity of the best estimator, and it is one of the basic assumptions in most iterative minimization algorithms.
Another example of a linear approximation occurs when the model is a simple exponential function, $f(x_i, \boldsymbol\beta) = \alpha e^{\beta x_i}$, which can be transformed into a linear model by taking logarithms: $\ln f(x_i, \boldsymbol\beta) = \ln\alpha + \beta x_i$.
This procedure should be avoided unless the errors are multiplicative and log-normally distributed because it can give misleading results.
Whatever the experimental errors on $y$ might be, the errors on $\ln y$ are different; therefore, when the transformed sum of squares is minimized, different results will be obtained both for the parameter values and their calculated standard deviations.
However, with multiplicative errors that are log-normally distributed, this procedure gives unbiased and consistent parameter estimates.
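A short sketch of the log-linearization for the exponential model above, using synthetic data with multiplicative error (all data and names are illustrative):

```python
import numpy as np

# Hypothetical data drawn from y = alpha * exp(beta * x) with multiplicative noise.
rng = np.random.default_rng(1)
x = np.linspace(0.5, 5.0, 40)
y = 3.0 * np.exp(-0.8 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Linearization: ln y = ln(alpha) + beta * x, solved by ordinary linear least squares.
A = np.column_stack([np.ones_like(x), x])
(ln_alpha, beta), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
print("alpha ~", np.exp(ln_alpha), "beta ~", beta)
```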
While the unmodified Gauss–Newton method may be adequate for simple models, it will fail if divergence occurs.
If divergence occurs, a simple expedient is to reduce the length of the shift vector, $\Delta\boldsymbol\beta$, by a fraction $f$: $\boldsymbol\beta^{k+1} = \boldsymbol\beta^{k} + f\,\Delta\boldsymbol\beta$. For example, $f$ can be halved successively until a decrease in the sum of squares is obtained.
This limits the applicability of the method to situations where the direction of the shift vector is not very different from what it would be if the objective function were approximately quadratic in the parameters, $\boldsymbol\beta^{k}$.[7]
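A sketch of shift-cutting by successive halving; the halving factor and iteration limit are illustrative choices, not prescriptions from the text:

```python
import numpy as np

def shift_cut_step(f, x, y, beta, delta, max_halvings=10):
    """Apply the Gauss-Newton shift `delta`, halving its length until the
    sum of squares no longer exceeds the current value (a simple sketch
    of shift-cutting, not a full line search)."""
    s_old = np.sum((y - f(x, beta)) ** 2)
    frac = 1.0
    for _ in range(max_halvings):
        trial = beta + frac * delta
        if np.sum((y - f(x, trial)) ** 2) <= s_old:
            return trial                 # accept the reduced shift
        frac *= 0.5                      # cut the shift vector in half
    return beta                          # give up: keep the old estimate
```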
The minimum in the sum of squares can also be found by methods that do not involve forming the normal equations, for example by applying an orthogonal decomposition, such as the QR decomposition, directly to the Jacobian.
The application of singular value decomposition is discussed in detail in Lawson and Hanson.[6]
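A sketch of solving the linearized step through a QR decomposition of the Jacobian rather than the normal equations; this follows the general idea of an orthogonal-decomposition method, not the detailed treatment cited above:

```python
import numpy as np

def gn_step_qr(J, r):
    """Solve min || J @ delta - r || without forming J^T J, using a QR
    decomposition of the Jacobian (better conditioned than the normal
    equations when J is nearly rank-deficient)."""
    Q, R = np.linalg.qr(J)                # J = Q R, Q has orthonormal columns
    return np.linalg.solve(R, Q.T @ r)    # back-substitute R delta = Q^T r
```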
There are many examples in the scientific literature where different methods have been used for non-linear data-fitting problems.
Direct search methods depend on evaluations of the objective function at a variety of parameter values and do not use derivatives at all.
More detailed descriptions of these and other methods are available in Numerical Recipes, together with computer code in various languages.
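As an illustration of a direct search, the sketch below fits a hypothetical exponential model with the Nelder–Mead simplex method from SciPy; only objective-function evaluations are used, no derivatives (data, model, and starting values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data for the model y = a * exp(b * x).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 4.0, 60)
y = 2.5 * np.exp(-1.1 * x) + 0.02 * rng.normal(size=x.size)

def objective(beta):
    """Sum of squared residuals; the direct search needs only its value."""
    return np.sum((y - beta[0] * np.exp(beta[1] * x)) ** 2)

result = minimize(objective, x0=[1.0, -1.0], method="Nelder-Mead")
print(result.x)
```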