Omitted-variable bias

Suppose the true relationship is $y = a + bx + cz + u$, but $z$ is omitted and is itself related to the included variable $x$ by $z = d + fx + e$. Substituting the second equation into the first gives $y = (a + cd) + (b + cf)x + (u + ce)$, so a regression of $y$ on $x$ alone has slope $b + cf$. The direction and extent of the bias are thus both contained in $cf$, since the effect sought is $b$ but the regression estimates $b + cf$.
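This intuition can be checked numerically. The following sketch (with illustrative, assumed parameter values) simulates the two equations above and compares the slope of a least-squares fit of $y$ on $x$ alone with $b + cf$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters for the true model y = a + b*x + c*z + u
a, b, c = 1.0, 2.0, 0.5
# ... and for the relation z = d + f*x + e linking z to x
d, f = 0.3, 1.5

n = 100_000
x = rng.normal(size=n)
z = d + f * x + rng.normal(size=n)        # z is correlated with x via f
y = a + b * x + c * z + rng.normal(size=n)

# Least-squares slope of y on x alone (z omitted): cov(x, y) / var(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(f"estimated slope: {slope:.3f}")      # approximately 2.75
print(f"b + c*f:         {b + c * f:.3f}")  # 2.0 + 0.5*1.5 = 2.75
```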

As an example, consider a linear model of the form

$$ y_i = x_i \beta + z_i \delta + u_i, \qquad i = 1, \dots, n, $$

where $x_i$ is a $1 \times p$ row vector of values of the $p$ included independent variables for observation $i$; $\beta$ is a $p \times 1$ column vector of unobservable parameters to be estimated; $z_i$ is a scalar, the value of another independent variable for observation $i$, with unobservable coefficient $\delta$; and $u_i$ is the unobservable error term for observation $i$.

We collect the observations of all variables subscripted $i = 1, \dots, n$, and stack them one below another, to obtain the matrix $X$ and the vectors $Y$, $Z$, and $U$:

$$ X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad Z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}, \qquad U = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}. $$

If the independent variable $z$ is omitted from the regression, then the estimated values of the response parameters of the other independent variables are given by the usual least-squares calculation,

$$ \hat{\beta} = (X'X)^{-1} X'Y $$

(where the "prime" notation denotes the transpose of a matrix and the $-1$ superscript denotes matrix inversion).

Substituting for $Y$ based on the assumed linear model,

$$ \hat{\beta} = (X'X)^{-1} X'(X\beta + Z\delta + U) = \beta + (X'X)^{-1}X'Z\,\delta + (X'X)^{-1}X'U. $$

On taking expectations, the contribution of the final term is zero; this follows from the assumption that $U$ is uncorrelated with the regressors $X$.

On simplifying the remaining terms:

$$ E\left[\hat{\beta} \mid X\right] = \beta + (X'X)^{-1} E\left[X'Z \mid X\right] \delta. $$

The second term after the equals sign is the omitted-variable bias in this case, which is non-zero if the omitted variable $z$ is correlated with any of the included variables in the matrix $X$ (that is, if $X'Z$ does not equal a vector of zeroes).
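The matrix form of the bias can also be verified numerically. Here is a minimal sketch (dimensions and parameter values are assumptions made for the example) that computes the short-regression estimate $\hat{\beta} = (X'X)^{-1}X'Y$ and compares it with the predicted value $\beta + (X'X)^{-1}X'Z\,\delta$:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 5_000, 2
beta = np.array([1.0, -2.0])  # assumed true coefficients on the included regressors
delta = 0.8                   # assumed true coefficient on the omitted variable z

X = rng.normal(size=(n, p))
Z = 0.6 * X[:, 0] + rng.normal(size=n)  # z correlated with the first column of X
U = rng.normal(size=n)
Y = X @ beta + Z * delta + U

XtX_inv = np.linalg.inv(X.T @ X)

# Short regression omitting z: beta_hat = (X'X)^{-1} X'Y
beta_hat = XtX_inv @ X.T @ Y

# Predicted value: beta + (X'X)^{-1} X'Z * delta
predicted = beta + XtX_inv @ (X.T @ Z) * delta

print("beta_hat:  ", np.round(beta_hat, 3))
print("predicted: ", np.round(predicted, 3))
# The two agree up to the noise term (X'X)^{-1} X'U, which vanishes in expectation.
```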

In ordinary least squares, the relevant assumption of the classical linear regression model is that the error term is uncorrelated with the regressors. Omitting $z$ folds $Z\delta$ into the error term of the misspecified model, so this assumption is violated whenever $z$ is correlated with the included regressors.