Informally, it is the similarity between observations of a random variable as a function of the time lag between them.
The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.
The autocorrelation of a real or complex random process $\{X_t\}$ between times $t_1$ and $t_2$ is defined as $R_{XX}(t_1, t_2) = \operatorname{E}\left[X_{t_1}\overline{X_{t_2}}\right]$, where $\operatorname{E}$ is the expected value operator and the bar represents complex conjugation.
Subtracting the mean before multiplication yields the auto-covariance function between times $t_1$ and $t_2$: $K_{XX}(t_1, t_2) = \operatorname{E}\left[(X_{t_1} - \mu_{t_1})\,\overline{(X_{t_2} - \mu_{t_2})}\right]$. For a wide-sense stationary process, the autocovariance depends only on the time-distance $\tau = t_2 - t_1$ between the pair of values but not on their position in time.
It is common practice in some disciplines (e.g. statistics and time series analysis) to normalize the autocovariance function to get a time-dependent Pearson correlation coefficient.
However, in other disciplines (e.g. engineering) the normalization is usually dropped and the terms "autocorrelation" and "autocovariance" are used interchangeably.
The normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.
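Concretely, for a wide-sense stationary process with mean $\mu$ and variance $\sigma^2$ (notation introduced here for illustration), the normalized form is

```latex
\rho_{XX}(\tau) = \frac{K_{XX}(\tau)}{\sigma^2}
               = \frac{\operatorname{E}\!\left[(X_{t+\tau}-\mu)\,\overline{(X_t-\mu)}\right]}{\sigma^2},
\qquad \rho_{XX}(0) = 1, \quad \left|\rho_{XX}(\tau)\right| \le 1,
```

which makes $\rho_{XX}$ a genuine correlation coefficient, scale-free and bounded.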
The autocorrelation of a continuous-time white noise signal will have a strong peak (represented by a Dirac delta function) at $\tau = 0$ and will be exactly $0$ for all other $\tau$.
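As a quick numerical check of the discrete-time analogue of this property (a sketch only; `sample_autocorr` is our own helper, not a library routine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # discrete-time white noise

def sample_autocorr(x, lag):
    """Biased sample autocorrelation coefficient at a single lag."""
    x = x - x.mean()
    return np.dot(x[:len(x) - lag], x[lag:]) / (len(x) * x.var())

for lag in (0, 1, 5, 50):
    print(lag, round(sample_autocorr(x, lag), 4))
# prints ~1.0 at lag 0 and values near 0.0 at all other lags
```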
For a random vector $\mathbf{X} = (X_1, \ldots, X_n)^{\rm T}$ containing random elements whose expected value and variance exist, the autocorrelation matrix is defined by[3]: p.190 [1]: p.334 $\mathbf{R}_{\mathbf{X}\mathbf{X}} \triangleq \operatorname{E}\left[\mathbf{X}\mathbf{X}^{\rm T}\right]$ (with the transpose replaced by the conjugate transpose for complex random vectors).
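A Monte Carlo sketch of this definition (our illustration, assuming real-valued components, so the conjugate transpose reduces to the transpose):

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 3, 200_000
X = rng.standard_normal((n, samples))   # each column is one draw of the random vector

# Estimate R_XX = E[X X^T] by averaging outer products over the draws
R = (X @ X.T) / samples
print(np.round(R, 2))   # approximately the identity for independent unit-variance components
```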
In signal processing, the above definition is often used without the normalization, that is, without subtracting the mean and dividing by the variance.
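For a deterministic, finite-energy signal $f(t)$, for instance, the unnormalized autocorrelation is conventionally the cross-correlation integral of $f(t)$ with itself at lag $\tau$:

```latex
R_{ff}(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\,\overline{f(t)}\,\mathrm{d}t .
```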
Signals that "last forever" are treated instead as random processes, in which case different definitions are needed, based on expected values.
For processes that are also ergodic, the expectation can be replaced by the limit of a time average.
Alternatively, signals that last forever can be treated by a short-time autocorrelation function analysis, using finite time integrals.
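Under the ergodicity assumption mentioned above, the time-average form can be written as (one common convention, assuming the limit exists):

```latex
R_{ff}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t+\tau)\,\overline{f(t)}\,\mathrm{d}t ,
```

and the short-time variant simply drops the limit, integrating over a finite window.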
A brute force method based on the signal processing definition $R_{xx}(j) = \sum_n x_n\,\overline{x}_{n-j}$ can be used when the signal size is small. The calculation can be laid out like long multiplication of the sequence with its own reverse, except that we do not perform the carry-over operation during addition as is usual in normal multiplication.
Note that we can halve the number of operations required by exploiting the inherent symmetry of the autocorrelation.
The procedure can be regarded as an application of the convolution property of the Z-transform of a discrete signal.
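A minimal sketch of the brute-force estimator $R_{xx}(j) = \sum_n x_n \overline{x}_{n-j}$ (our own illustration; the function name is hypothetical), computing only non-negative lags and relying on the symmetry $R_{xx}(-j) = \overline{R_{xx}(j)}$ for the rest:

```python
import numpy as np

def brute_force_autocorr(x):
    """Unnormalized autocorrelation R_xx(j), j = 0..len(x)-1 (signal processing convention)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    r = np.empty(n, dtype=complex)
    for j in range(n):
        # R(j) = sum_n x[n] * conj(x[n - j]); negative lags follow from R(-j) = conj(R(j))
        r[j] = np.dot(x[j:], np.conj(x[:n - j]))
    return r

print(brute_force_autocorr([2, 3, -1]).real)   # [14.  3. -2.]
```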
For example, the Wiener–Khinchin theorem allows computing the autocorrelation from the raw data $X(t)$ with two fast Fourier transforms (FFT):[6][page needed]

$$F_R(f) = \operatorname{FFT}[X(t)], \qquad S(f) = F_R(f)\,F_R^*(f), \qquad R(\tau) = \operatorname{IFFT}[S(f)].$$
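A sketch of that FFT route in NumPy (zero-padding to length $2n$ is our added detail, needed so the DFT's circular correlation matches the linear one):

```python
import numpy as np

def autocorr_fft(x):
    """Autocorrelation via Wiener-Khinchin: inverse FFT of the power spectrum."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = np.fft.fft(x, n=2 * n)      # zero-pad so the circular correlation is effectively linear
    s = f * np.conj(f)              # power spectrum S(f) = F_R(f) F_R*(f)
    return np.fft.ifft(s).real[:n]  # keep non-negative lags

print(np.round(autocorr_fft([2, 3, -1]), 6))   # matches the brute-force result: [14. 3. -2.]
```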
Alternatively, a multiple $\tau$ correlation can be performed by using brute force calculation for low $\tau$ values, and then progressively binning the $X(t)$ data with a logarithmic density to compute higher values, resulting in the same $n \log(n)$ efficiency, but with lower memory requirements.
Other estimates can suffer from the problem that, if they are used to calculate the variance of a linear combination of the $X$'s, the variance calculated may turn out to be negative.
With multiple interrelated data series, vector autoregression (VAR) or its extensions are used.
In ordinary least squares (OLS), the adequacy of a model specification can be checked in part by establishing whether there is autocorrelation of the regression residuals.
Autocorrelation of the errors violates the ordinary least squares assumption that the error terms are uncorrelated, meaning that the Gauss-Markov theorem does not apply, and that OLS estimators are no longer the Best Linear Unbiased Estimators (BLUE).
While it does not bias the OLS coefficient estimates, the standard errors tend to be underestimated (and the t-scores overestimated) when the autocorrelations of the errors at low lags are positive.
The Durbin-Watson statistic can, however, be linearly mapped to the Pearson correlation between values and their lags (approximately $d \approx 2(1 - r)$, where $r$ is the lag-1 sample autocorrelation of the residuals).
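A numerical illustration of that mapping (our own sketch, using simulated residuals with positive autocorrelation):

```python
import numpy as np

rng = np.random.default_rng(2)
e = np.empty(500)                      # AR(1)-style residuals with positive autocorrelation
e[0] = rng.standard_normal()
for t in range(1, 500):
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # Durbin-Watson statistic
r = np.corrcoef(e[:-1], e[1:])[0, 1]           # lag-1 sample correlation
print(round(d, 3), round(2 * (1 - r), 3))      # the two values are close; both near 0.8 here
```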
The Breusch-Godfrey test involves an auxiliary regression, wherein the residuals obtained from estimating the model of interest are regressed on (a) the original regressors and (b) $k$ lags of the residuals, where $k$ is the order of the test. The simplest version of the test statistic from this auxiliary regression is $TR^2$, where $T$ is the sample size and $R^2$ is the coefficient of determination.[13]
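A sketch of that auxiliary regression in NumPy (illustrative only; variable names are ours, pre-sample lagged residuals are set to zero, and $\chi^2(k)$ is the usual asymptotic reference distribution for this statistic):

```python
import numpy as np
from scipy import stats

def breusch_godfrey_tr2(X, resid, k):
    """TR^2 statistic: regress residuals on the original regressors plus k lagged residuals.

    X     : (T, p) design matrix of the original regression (including the intercept)
    resid : (T,) OLS residuals from the model of interest
    k     : order of the test (number of residual lags)
    """
    T = len(resid)
    lags = np.column_stack([np.r_[np.zeros(j), resid[:T - j]] for j in range(1, k + 1)])
    Z = np.column_stack([X, lags])                 # auxiliary design matrix
    beta, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    fitted = Z @ beta
    r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return T * r2, stats.chi2.sf(T * r2, df=k)    # statistic and asymptotic p-value
```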
In the estimation of a moving average (MA) model, the autocorrelation function is used to determine the appropriate number of lagged error terms to be included.
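For instance, the sample ACF of an MA($q$) process should cut off after lag $q$. A sketch using statsmodels' `acf` (the simulated MA(2) series and cutoff band are our illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
eps = rng.standard_normal(5_000)
x = eps[2:] + 0.7 * eps[1:-1] + 0.3 * eps[:-2]   # simulated MA(2) process

rho = acf(x, nlags=6)
band = 1.96 / np.sqrt(len(x))                    # approximate 95% band under no autocorrelation
print(np.round(rho[1:], 3), "band:", round(band, 3))
# lags 1 and 2 clearly exceed the band; lags 3+ do not, suggesting q = 2
```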
Autocorrelation's ability to find repeating patterns in data yields many applications. Serial dependence is closely linked to the notion of autocorrelation, but represents a distinct concept (see Correlation and dependence).