Least mean squares (LMS) algorithms are a class of adaptive filters that mimic a desired filter by finding the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal).
It is a stochastic gradient descent method in that the filter is only adapted based on the error at the current time.
It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff, based on their research in single-layer neural networks (ADALINE).
Specifically, they used gradient descent to train ADALINE to recognize patterns, and called the algorithm the "delta rule".
They then applied the rule to filters, resulting in the LMS algorithm.
The error signal e(n) = d(n) − y(n) is computed and fed back to the adaptive filter, which adjusts its parameters in order to minimize the mean square of the error signal.
The realization of the causal Wiener filter closely resembles the solution to the least squares estimate, except in the signal processing domain.
The FIR least mean squares filter is related to the Wiener filter, but minimizing the error criterion of the former does not rely on cross-correlations or auto-correlations.
Most linear adaptive filtering problems can be formulated using the block diagram above.
The algorithm starts by assuming small weights (zero in most cases); at each step, the gradient of the mean square error is found and the weights are updated. That is, if the MSE gradient is positive, the error would keep increasing if the same weight were used for further iterations, which means the weights must be reduced. The mean-square error, as a function of the filter weights, is a quadratic function: it has only one extremum, which minimizes the mean-square error and is attained at the optimal weights.
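This quadratic cost and its unique minimum can be written out explicitly. The symbols below follow the standard Wiener-filter treatment and are an assumption here, since the surrounding equations are not reproduced:

```latex
C(\mathbf{w}) = E\{e^2(n)\}
  = \sigma_d^2 - 2\,\mathbf{p}^{\mathsf{T}}\mathbf{w}
  + \mathbf{w}^{\mathsf{T}}\mathbf{R}\,\mathbf{w},
\qquad
\mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1}\mathbf{p},
```

where R = E{x(n)xᵀ(n)} is the input autocorrelation matrix, p = E{d(n)x(n)} is the cross-correlation between the desired signal and the input, and σ_d² is the variance of the desired signal. The minimizer w_opt is the Wiener solution.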
Applying steepest descent means taking the partial derivatives with respect to the individual entries of the filter coefficient (weight) vector, where the gradient ∇C(n) is a vector that points towards the steepest ascent of the cost function. To find the minimum of the cost function, we need to take a step in the opposite direction of ∇C(n).
That means we have found a sequential update algorithm which minimizes the cost function.
Instead, to run the LMS in an online environment (updating after each new sample is received), we use an instantaneous estimate of that expectation.
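The online update can be sketched as follows. This is a minimal NumPy sketch of the standard LMS recursion w(n+1) = w(n) + μ e(n) x(n); the signal names, filter length, and step size are illustrative assumptions, not values from the text:

```python
import numpy as np

def lms(x, d, num_taps, mu):
    """Online LMS: adapt FIR weights w so that the filter output tracks d(n).

    x        -- input signal (1-D array)
    d        -- desired signal, same length as x
    num_taps -- number of filter coefficients
    mu       -- step size (learning rate)
    Returns the final weights and the error signal e(n).
    """
    w = np.zeros(num_taps)                    # start from zero weights
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1:n + 1][::-1]  # most recent sample first
        y_n = w @ x_n                          # filter output
        e[n] = d[n] - y_n                      # error signal
        w = w + mu * e[n] * x_n                # instantaneous-gradient update
    return w, e

# Identify a hypothetical "unknown" 4-tap FIR system from input/output data.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2, 0.1])           # unknown system (assumed)
x = rng.standard_normal(5000)                 # white input
d = np.convolve(x, h)[:len(x)]                # desired = system output
w, e = lms(x, d, num_taps=4, mu=0.01)
```

With white input and no measurement noise, the weights converge to the unknown impulse response and the error signal decays towards zero.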
As the LMS algorithm does not use the exact values of the expectations, the weights never reach the optimal weights in the absolute sense, but convergence in the mean is possible. However, if the variance with which the weights change is large, convergence in the mean can be misleading.
If the step size μ is chosen to be large, the amount by which the weights change depends heavily on the gradient estimate, so the weights may change by a large value, and a gradient that was negative at one instant may become positive at the next. At the next instant, the weights may then change in the opposite direction by a large amount, and thus keep oscillating with a large variance about the optimal weights. If, on the other hand, μ is chosen to be too small, the time to converge to the optimal weights becomes too large.
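A commonly quoted sufficient condition for convergence in the mean is 0 < μ < 2/λmax, where λmax is the largest eigenvalue of the input autocorrelation matrix R; since λmax is rarely known, the conservative bound μ < 2/tr[R] = 2/(N·E[x²]) is often used in practice. A small numeric sketch of that bound, with an illustrative signal and filter length:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10000)   # unit-power white input (illustrative)
num_taps = 8

# Conservative step-size bound: mu < 2 / (num_taps * input power),
# since tr[R] = num_taps * E[x^2] upper-bounds the largest eigenvalue.
input_power = np.mean(x ** 2)
mu_max = 2.0 / (num_taps * input_power)
```

For this unit-power input the bound evaluates to roughly 2/8 = 0.25; a step size is then typically chosen well below it.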
That is, the maximum achievable convergence speed depends on the eigenvalue spread of the input autocorrelation matrix R = E{x(n)xᵀ(n)}: the larger the ratio λmax/λmin, the slower the convergence.
The common interpretation of this result is therefore that the LMS converges quickly for white input signals, and slowly for colored input signals, such as processes with low-pass or high-pass characteristics.
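This can be illustrated by comparing the eigenvalue spread of the estimated autocorrelation matrix for white noise and for the same noise after low-pass filtering. The moving-average filter below is an arbitrary choice of low-pass coloring, used only for illustration:

```python
import numpy as np

def eigen_spread(x, num_taps):
    """Ratio lambda_max / lambda_min of the input autocorrelation matrix."""
    # Biased autocorrelation estimates r(0), ..., r(num_taps - 1)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x)
                  for k in range(num_taps)])
    # Toeplitz autocorrelation matrix R[i, j] = r(|i - j|)
    idx = np.abs(np.subtract.outer(np.arange(num_taps), np.arange(num_taps)))
    R = r[idx]
    lam = np.linalg.eigvalsh(R)              # eigenvalues in ascending order
    return lam[-1] / lam[0]

rng = np.random.default_rng(2)
white = rng.standard_normal(100_000)
colored = np.convolve(white, np.ones(4) / 4, mode="same")  # crude low-pass

spread_white = eigen_spread(white, 8)
spread_colored = eigen_spread(colored, 8)
```

For white noise the spread is close to 1 (all eigenvalues roughly equal), while the low-pass-filtered input yields a much larger spread, and hence slower LMS convergence in its slowest modes.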
The main drawback of the "pure" LMS algorithm is that it is sensitive to the scaling of its input x(n). This makes it very hard (if not impossible) to choose a learning rate μ that guarantees stability of the algorithm for all inputs.
The Normalised least mean squares filter (NLMS) is a variant of the LMS algorithm that solves this problem by normalising with the power of the input.
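A sketch of the normalised update, which divides the step by the instantaneous input power ‖x(n)‖². The small regularisation constant eps, guarding against division by zero, is a standard practical addition; the test signals mirror the plain-LMS example but with the input scaled by 100, all illustrative assumptions:

```python
import numpy as np

def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Normalised LMS: the step is divided by the input power ||x(n)||^2,
    making the update invariant to the scaling of the input."""
    w = np.zeros(num_taps)
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1:n + 1][::-1]      # most recent sample first
        e[n] = d[n] - w @ x_n                      # error signal
        w = w + (mu / (eps + x_n @ x_n)) * e[n] * x_n  # normalised update
    return w, e

# Same system-identification setup as before, but the input is scaled
# by 100; NLMS still converges with the same step size.
rng = np.random.default_rng(3)
h = np.array([0.5, -0.3, 0.2, 0.1])   # hypothetical unknown system
x = 100.0 * rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)]
w, e = nlms(x, d, num_taps=4)
```

Because the effective step size is μ/‖x(n)‖², rescaling the input leaves the adaptation behaviour unchanged, which is exactly the problem the normalisation solves.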
If there is no interference (v(n) = 0), then the optimal learning rate for the NLMS algorithm is μopt = 1, and it is independent of the input x(n) and of the real (unknown) impulse response h(n).
In the general case with interference (v(n) ≠ 0), the optimal learning rate is μopt = E[|y(n) − ŷ(n)|²] / E[|e(n)|²]. The results above assume that the signals v(n) and x(n) are uncorrelated with each other, which is generally the case in practice. Assuming independence, the optimal learning rate is found where the derivative of the expected squared weight error with respect to μ vanishes, which leads to the expression above.