Cointegration

Cointegration is a statistical property of a collection (X1, X2, ..., Xk) of time series variables.

First, all of the series must be integrated of order d. Next, if a linear combination of this collection is integrated of order less than d, then the collection is said to be co-integrated.

Formally, if (X,Y,Z) are each integrated of order d, and there exist coefficients a,b,c such that aX + bY + cZ is integrated of order less than d, then X, Y, and Z are cointegrated.

Cointegration has become an important property in contemporary time series analysis.

In an influential paper,[1] Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (like GNP, wages, employment, etc.) have stochastic trends, that is, they contain unit roots.

A common example is where the individual series are first-order integrated (I(1)) but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them.
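To make the idea concrete, the following minimal simulation builds an I(1) series x and a series y that shares its stochastic trend. It then shows that the combination y − 2x, i.e. the cointegrating vector (1, −2), stays stationary while y itself wanders. This is a sketch assuming Python with numpy. The coefficient 2, the sample length and the seed are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# x is a random walk, i.e. I(1): its first difference is white noise
x = np.cumsum(rng.normal(size=T))
# y shares x's stochastic trend plus a stationary deviation (white noise here)
y = 2.0 * x + rng.normal(size=T)

# The illustrative cointegrating vector (1, -2) cancels the common trend
spread = y - 2.0 * x

print(f"var(y)      = {y.var():.1f}")       # large, and grows with the sample length
print(f"var(y - 2x) = {spread.var():.2f}")  # close to the noise variance of 1
```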

The first to introduce and analyse the concept of spurious, or nonsense, regression was Udny Yule in 1926.[2]

Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation,[3] since standard detrending techniques can result in data that are still non-stationary.[4]

Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.

For integrated I(1) processes, Granger and Newbold showed that de-trending does not eliminate the problem of spurious correlation, and that the superior alternative is to check for co-integration.

Thus the standard current methodology for time series regressions is to check all time series involved for integration.
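One way to check a series for integration is to difference it repeatedly until a unit root test rejects. The number of differences needed is then taken as the order of integration. The sketch below assumes numpy and uses a plain Dickey–Fuller regression with a constant and no augmentation lags. It compares the statistic against the approximate asymptotic 5% critical value of -2.86. The helper names `df_tstat` and `order_of_integration` are hypothetical. A real analysis would choose lag lengths and trend terms carefully or use an established implementation.

```python
import numpy as np

def df_tstat(u):
    """Dickey-Fuller t-statistic (constant, no lag augmentation) for H0: unit root."""
    du = np.diff(u)
    X = np.column_stack([np.ones(len(du)), u[:-1]])
    beta, *_ = np.linalg.lstsq(X, du, rcond=None)
    resid = du - X @ beta
    sigma2 = resid @ resid / (len(du) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se  # coefficient on the lagged level

# Approximate asymptotic 5% critical value of the DF test with a constant
DF_CRIT_5PCT = -2.86

def order_of_integration(series, max_d=2):
    """Difference until the DF test rejects a unit root; return the number of differences.

    Simplified sketch: no augmentation lags, no trend term, fixed critical value.
    """
    s = np.asarray(series, dtype=float)
    for d in range(max_d + 1):
        if df_tstat(s) < DF_CRIT_5PCT:
            return d
        s = np.diff(s)
    return None  # still non-stationary after max_d differences

rng = np.random.default_rng(1)
e = rng.normal(size=500)
print(order_of_integration(e))             # white noise: expected 0
print(order_of_integration(np.cumsum(e)))  # random walk: usually 1 (test size is about 5%)
```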

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).[3]

The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data that had been differenced.

This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP of a randomly selected, unrelated country (e.g. Afghanistan) might yield a high R-squared (suggesting that Afghanistan's GNP has high explanatory power for Fiji's consumption).

This is an instance of spurious regression: two I(1) series that are not directly causally related may nonetheless show a significant correlation.
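The spurious regression problem is easy to reproduce by Monte Carlo. This sketch regresses one random walk on another, independent one and counts how often the conventional t-test declares the slope significant at the nominal 5% level. It assumes numpy. The name `ols_slope_tstat`, the sample length and the number of replications are illustrative choices.

```python
import numpy as np

def ols_slope_tstat(y, x):
    """OLS of y on a constant and x; return the conventional t-statistic of the slope."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(42)
T, reps = 200, 1000
rejections = 0
for _ in range(reps):
    # Two independent random walks: there is no true relationship between them
    x = np.cumsum(rng.normal(size=T))
    y = np.cumsum(rng.normal(size=T))
    if abs(ols_slope_tstat(y, x)) > 1.96:  # nominal 5% two-sided test
        rejections += 1

rate = rejections / reps
print(f"share of 'significant' slopes: {rate:.2f}")
```

Because the two walks are independent by construction, a valid test would reject about 5% of the time. The share printed here is far higher, which is the phenomenon Granger and Newbold documented.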

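In the Engle–Granger two-step method, if two series x_t and y_t both have order of integration d = 1 and are cointegrated, then a linear combination of them must be stationary for some value of β: y_t − βx_t = u_t, where u_t is stationary.

If β is unknown, it is usually estimated by an OLS regression of y_t on x_t and an intercept, and a unit root test such as the augmented Dickey–Fuller (ADF) test is then run on the residuals û_t. However, when β is estimated, the critical values of this ADF test are non-standard, and increase in absolute value as more regressors are included.[6]

If the variables are found to be cointegrated, a second-stage regression is conducted: Δy_t is regressed on Δx_t and the lagged first-stage residuals û_{t−1}, i.e. Δy_t = b Δx_t + α û_{t−1} + ε_t.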
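The two Engle–Granger steps can be sketched as follows. The sketch assumes numpy, a single regressor and no augmentation lags in the residual test; a real application would usually add lags. The names `ols`, `df_tstat_no_const` and `engle_granger` are hypothetical. The statistic returned as `adf_t` has to be compared with the non-standard Engle–Granger critical values, not the usual Dickey–Fuller ones.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients and residuals of y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def df_tstat_no_const(u):
    """Dickey-Fuller t-statistic (no constant, no lags) for H0: u has a unit root."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

def engle_granger(y, x):
    """Two-step Engle-Granger sketch for one regressor, without lag augmentation."""
    # Step 1: cointegrating regression y_t = c + beta * x_t + u_t
    (c, beta), u = ols(y, np.column_stack([np.ones(len(x)), x]))
    # Unit root test on the residuals (needs Engle-Granger critical values)
    adf_t = df_tstat_no_const(u)
    # Step 2: error-correction regression dy_t = b * dx_t + alpha * u_{t-1} + e_t
    (b, alpha), _ = ols(np.diff(y), np.column_stack([np.diff(x), u[:-1]]))
    return {"c": c, "beta": beta, "adf_t": adf_t, "b": b, "alpha": alpha}

rng = np.random.default_rng(7)
T = 500
x = np.cumsum(rng.normal(size=T))
y = 1.0 + 2.0 * x + rng.normal(size=T)  # cointegrated with beta = 2 by construction
res = engle_granger(y, x)
print(res)
```

A negative `alpha` indicates that deviations from the long-run relationship are partly corrected in the following period, which is what the error-correction form is meant to capture.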

If the sample size is too small, the results will not be reliable, and autoregressive distributed lag (ARDL) models should be used instead.[7][8]

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration.[9]

Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables whose co-integration is being tested.

In finite samples, a superior alternative to these asymptotic critical values is to generate critical values from simulations.
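A short Monte Carlo is enough to simulate such critical values. Generate independent random walks so that the null of no cointegration holds. Compute the residual-based Dickey–Fuller statistic from the cointegrating regression, and take the lower 5% quantile. This sketch assumes numpy, no augmentation lags, and a constant but no trend in the cointegrating regression. The function names, sample length and replication count are illustrative.

```python
import numpy as np

def eg_residual_tstat(y, X):
    """DF t-statistic (no constant, no lags) on residuals of y regressed on [1, X]."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - Z @ coef
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

def simulated_critical_value(T, n_regressors, reps=2000, level=0.05, seed=0):
    """Monte Carlo critical value under H0 of no cointegration (independent random walks)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        walks = np.cumsum(rng.normal(size=(T, n_regressors + 1)), axis=0)
        stats[r] = eg_residual_tstat(walks[:, 0], walks[:, 1:])
    # Left-tail test: reject no-cointegration if the statistic is below this value
    return np.quantile(stats, level)

for k in (1, 2):
    print(k, round(simulated_critical_value(T=200, n_regressors=k), 2))
```

For one regressor the simulated value should fall clearly below the ordinary Dickey–Fuller 5% value of about -2.86. Adding a second regressor should push it further into the left tail, in line with the dependence on the number of variables described above.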

In practice, cointegration is most often applied to I(1) series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects).

In reality, the long-run relationship between the underlying variables may change (shifts in the cointegrating vector can occur).

Possible reasons include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments.

To take this issue into account, tests have been introduced for cointegration with one unknown structural break,[10] and tests for cointegration with two unknown breaks are also available.[11]

Several Bayesian methods have been proposed to compute the posterior distribution of the number of cointegrating relationships and the cointegrating linear combinations.
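A residual-based cointegration test with one unknown structural break can be sketched in the spirit of Gregory and Hansen's regime-shift tests. For every candidate break date, add a step dummy to the cointegrating regression, compute the Dickey–Fuller statistic on the residuals, and keep the most negative value. This is an illustrative sketch assuming numpy, a level shift only, no augmentation lags and an arbitrary 15% trimming of the sample ends. It is not necessarily the exact test cited above. Its critical values are non-standard and would still have to be taken from published tables or simulated. The function names and the simulated data are hypothetical.

```python
import numpy as np

def df_t_on_residuals(u):
    """Dickey-Fuller t-statistic (no constant, no lags) computed on regression residuals."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    sigma2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(sigma2 / (lag @ lag))

def residual_t(y, Z):
    """DF statistic on the residuals of the cointegrating regression of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return df_t_on_residuals(y - Z @ coef)

def min_t_level_shift(y, x, trim=0.15):
    """Smallest residual DF statistic over candidate dates for one intercept shift."""
    T = len(y)
    best_t, best_tb = np.inf, None
    for tb in range(int(trim * T), int((1 - trim) * T)):
        step = (np.arange(T) >= tb).astype(float)  # 0 before tb, 1 from tb on
        t = residual_t(y, np.column_stack([np.ones(T), step, x]))
        if t < best_t:
            best_t, best_tb = t, tb
    return best_t, best_tb

rng = np.random.default_rng(3)
T = 400
x = np.cumsum(rng.normal(size=T))
# Hypothetical data: the intercept of the long-run relation jumps by 5 at t = 200
y = 2.0 * x + 5.0 * (np.arange(T) >= 200) + rng.normal(size=T)

no_break_t = residual_t(y, np.column_stack([np.ones(T), x]))
break_t, break_date = min_t_level_shift(y, x)
print(no_break_t, break_t, break_date)
```

Ignoring the shift leaves a step in the residuals, which weakens the evidence for stationarity. Allowing for it recovers a much more negative statistic near the true break date.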