Control functions (also known as two-stage residual inclusion) are statistical methods to correct for endogeneity problems by modelling the endogeneity in the error term.
The approach thereby differs in important ways from other models that try to account for the same econometric problem.
Instrumental variables, for example, model the endogenous variable X as an often invertible function of a relevant and exogenous instrument Z.
Panel analysis uses special data properties to difference out unobserved heterogeneity that is assumed to be fixed over time.
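The contrast can be made concrete with a small simulation sketch (all data and coefficient values below are hypothetical, chosen for illustration). In a linear model with additive errors, the control function approach amounts to regressing the endogenous variable on the instrument, then including the first-stage residual as an extra regressor in the outcome equation; in this linear additive case the resulting estimator coincides with two-stage least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process: z is an exogenous instrument,
# x is endogenous because its first-stage error v is correlated with
# the outcome error u.
z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)   # endogeneity: corr(u, v) != 0
x = z + v                          # first stage
y = 2.0 * x + u                    # true effect of x on y is 2.0

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Naive OLS of y on x is biased upward because of corr(u, v).
b_naive = ols(np.column_stack([ones, x]), y)[1]

# Control function: regress x on z, then include the fitted residual
# vhat as an additional regressor in the outcome equation.
pi = ols(np.column_stack([ones, z]), x)
vhat = x - np.column_stack([ones, z]) @ pi
b_cf = ols(np.column_stack([ones, x, vhat]), y)[1]

print(round(b_naive, 2), round(b_cf, 2))
```

Including v̂ "controls for" the part of x that is correlated with the error term, which is exactly the sense in which the residual acts as a control function.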
Control functions were introduced by Heckman and Robb,[1] although the principle can be traced back to earlier papers.[3]
A well-known example of the control function approach is the Heckman correction.[4]
In a Rubin causal model potential outcomes framework, where Y1 is the outcome variable of people for whom the participation indicator D equals 1, the control function approach leads to the following model

E[Y1 | X, Z, D = 1] = X β1 + E[U1 | X, Z, D = 1]

as long as the potential outcomes Y0 and Y1 are independent of D conditional on X and Z.[5]
Since the second-stage regression includes generated regressors, its variance-covariance matrix needs to be adjusted.[6][7]
Wooldridge and Terza provide a methodology to both deal with and test for endogeneity within the exponential regression framework, which the following discussion follows closely.
Assume the following exponential regression model, where u is a multiplicative, unobserved error term

E(y | x, u) = exp(xβ) u

The variables z serve as instrumental variables for the potentially endogenous variables x. One can then project x onto the instruments to get the following reduced form equation

x = zπ + v    (1)

The usual rank condition is needed to ensure identification. The endogeneity is then modeled as

u = exp(vγ + ε)

where v is assumed to be independent of z, and ε is assumed to be independent of (z, v). Imposing these assumptions, assuming the models are correctly specified, and normalizing E[exp(ε) | v] = 1, the conditional mean can be rewritten as

E(y | x, z, v) = exp(xβ + vγ)    (2)
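The rewritten conditional mean can be derived in a few lines under one common set of assumptions (the multiplicative-error notation here is an assumed convention, not fixed by the text): write the model as E(y | x, u) = exp(xβ)u with u = exp(vγ + ε), where x is a function of (z, v) and ε is independent of (z, v). Then

E(y | x, z, v) = exp(xβ) · E(u | z, v)
             = exp(xβ) · exp(vγ) · E(exp(ε) | v)
             = exp(xβ + vγ)

where the last step uses the normalization E[exp(ε) | v] = 1. The unobserved v thus enters the conditional mean only through the known term exp(vγ), which is what makes a residual-inclusion second stage feasible.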
Following a two-step procedure, Wooldridge and Terza propose estimating equation (1) by ordinary least squares. The fitted residuals v̂ from this regression can then be plugged into the estimating equation (2), and quasi-maximum likelihood estimation (QMLE) methods will lead to consistent estimators of the parameters of interest. A significance test on the coefficient of v̂ then serves as a test for endogeneity.
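The two-step procedure can be sketched as follows. This is a minimal illustration, not Wooldridge and Terza's own code: the data-generating process, the coefficient values, and the `poisson_qmle` helper (a plain Newton solver for the Poisson quasi-likelihood) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical DGP: x is endogenous because the omitted term (0.5 * v)
# is correlated with it; z is the excluded instrument.
z = rng.normal(size=n)
v = rng.normal(size=n)
x = 0.5 + z + v                       # reduced form, equation (1)
mu = np.exp(0.1 + 0.3 * x + 0.5 * v)  # true beta = 0.3, gamma = 0.5
y = rng.poisson(mu)

def poisson_qmle(X, y, iters=50):
    """Poisson QMLE (log link) via Newton's method."""
    # Crude but adequate starting value: OLS of log(y + 1) on X.
    b = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)[0]
    for _ in range(iters):
        m = np.exp(X @ b)
        step = np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (y - m))
        step = np.clip(step, -1.0, 1.0)   # damp large steps for stability
        b += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

ones = np.ones(n)

# Step 1: estimate equation (1) by OLS and keep the residuals vhat.
Z = np.column_stack([ones, z])
pi = np.linalg.lstsq(Z, x, rcond=None)[0]
vhat = x - Z @ pi

# Step 2: Poisson QMLE of y on x and the generated regressor vhat.
b_cf = poisson_qmle(np.column_stack([ones, x, vhat]), y)
b_naive = poisson_qmle(np.column_stack([ones, x]), y)

print(b_naive[1], b_cf[1], b_cf[2])  # naive beta, CF beta, gamma-hat
```

The naive QMLE that omits v̂ is biased, while the residual-inclusion estimates recover the structural parameters. Because v̂ is a generated regressor, the second-stage standard errors reported by an off-the-shelf QMLE routine are too small; the variance-covariance matrix must be adjusted, for instance by bootstrapping both stages together.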
The original Heckit procedure makes distributional assumptions about the error terms; however, more flexible estimation approaches with weaker distributional assumptions have since been established.[9]
Furthermore, Blundell and Powell show how the control function approach can be particularly helpful in models with nonadditive errors, such as discrete choice models.[10]
This latter approach, however, does implicitly make strong distributional and functional form assumptions.
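To illustrate the discrete choice case, the following sketch applies the same residual-inclusion idea to a probit model, in the spirit of a Rivers–Vuong-type two-stage estimator. All data are simulated and hypothetical, and this is not the Blundell–Powell estimator itself, which is semiparametric; the parametric probit version shown here is exactly the kind of approach that relies on strong distributional and functional form assumptions.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical DGP: x is endogenous in the binary-choice equation
# because its first-stage error v is correlated with the latent error e.
z = rng.normal(size=n)
v = rng.normal(size=n)
e = 0.6 * v + np.sqrt(1 - 0.6**2) * rng.normal(size=n)  # Var(e) = 1
x = z + v
y = (1.0 * x + e > 0).astype(float)   # latent coefficient on x is 1.0

Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / np.sqrt(2.0))))
phi = lambda t: np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def probit(X, y, iters=50):
    """Probit MLE via Fisher scoring."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = X @ b
        P = np.clip(Phi(idx), 1e-9, 1 - 1e-9)
        d = phi(idx)
        g = X.T @ ((y - P) * d / (P * (1 - P)))        # score
        W = d * d / (P * (1 - P))                      # information weights
        step = np.linalg.solve(X.T @ (W[:, None] * X), g)
        step = np.clip(step, -1.0, 1.0)                # damped for stability
        b += step
        if np.max(np.abs(step)) < 1e-9:
            break
    return b

ones = np.ones(n)

# First stage: OLS of x on the instrument z, keep the residuals.
Z = np.column_stack([ones, z])
vhat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Control function probit: include vhat as an extra regressor; a
# significant coefficient on vhat signals endogeneity.
b_cf = probit(np.column_stack([ones, x, vhat]), y)
```

Note that, as is usual in binary choice models, the coefficients are identified only up to scale: conditioning on v rescales the latent error, so here the probability limit of the coefficient on x is 1/sd(e | v) = 1.25 rather than 1.0.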