Stepwise regression

The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection.

Extreme cases have been noted where models have achieved statistical significance working on random numbers.
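The effect can be illustrated with a small simulation. The sketch below is not taken from any cited study: the sample size, the number of candidate predictors, the seed, and the helper names `pearson_r` and `best_noise_predictor` are all illustrative assumptions. It mimics only the first step of forward selection, picking the noise predictor most correlated with a noise response and computing its naive t-statistic.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_noise_predictor(n_obs=50, n_candidates=200, seed=0):
    """Return (r, t) for the candidate most correlated with the response.

    Both the response and every candidate are independent Gaussian noise,
    so any apparent relationship is spurious by construction.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r = 0.0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(x, y)
        if abs(r) > abs(best_r):
            best_r = r
    # t-statistic of a single-predictor regression slope, n_obs - 2 df
    t = best_r * math.sqrt((n_obs - 2) / (1 - best_r ** 2))
    return best_r, t


# Approximate two-sided 5% critical value of t with 48 degrees of freedom
# (an assumed constant; it ignores that the predictor was selected).
T_CRIT_NAIVE = 2.01

best_r, best_t = best_noise_predictor()
print(f"best |r| = {abs(best_r):.3f}, naive t = {best_t:.2f}")
```

With 200 candidates, the chance that none of them exceeds the naive 5% threshold is about 0.95^200, so the selected variable is almost always declared "significant" even though every predictor is pure noise.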

The key threshold is what can be thought of as the Bonferroni point: namely, how significant the best spurious variable should be expected to be based on chance alone.
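One way to make this threshold concrete is to compute the critical value a candidate must exceed after a Bonferroni correction for the number of variables searched. The following is a minimal sketch under stated assumptions: it uses a normal approximation to the test statistic, a family-wise level of 0.05, and a hypothetical helper name `bonferroni_z`. None of these choices comes from the source.

```python
from statistics import NormalDist


def bonferroni_z(n_candidates, alpha=0.05):
    """Two-sided critical z-value after Bonferroni correction.

    Assumes the test statistic is approximately standard normal under
    the null; alpha is the desired family-wise error rate.
    """
    per_test_alpha = alpha / n_candidates
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


for m in (1, 10, 100, 1000):
    print(f"{m:5d} candidates -> |z| must exceed {bonferroni_z(m):.2f}")
```

The threshold grows slowly with the number of candidates, roughly like the square root of 2 ln(m), which is why a variable that looks clearly significant in isolation may be unremarkable as the best of many.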

[15] This method is particularly valuable when data are collected in different settings (e.g., different times, social vs. solitary situations) or when models are assumed to be generalizable.

Critics regard the procedure as a paradigmatic example of data dredging, intense computation often being an inadequate substitute for subject area expertise.

Additionally, the results of stepwise regression are often used incorrectly without adjusting them for the occurrence of model selection.

[7] Widespread incorrect usage and the availability of alternatives such as ensemble learning, leaving all variables in the model, or using expert judgement to identify relevant variables have led to calls to avoid stepwise model selection entirely.

In this example from engineering, necessity and sufficiency are usually determined by F-tests. For additional consideration, when planning an experiment, computer simulation, or scientific survey to collect data for this model, one must keep in mind the number of parameters, P, to estimate and adjust the sample size accordingly. For K variables, $P = 1$ (Start) $+ K$ (Stage I) $+ (K^2 - K)/2$ (Stage II) $+ 3K$ (Stage III) $= 0.5K^2 + 3.5K + 1$. For K < 17, an efficient design of experiments exists for this type of model, a Box–Behnken design,[9] augmented with positive and negative axial points of length $\min(2, (\operatorname{int}(1.5 + K/4))^{1/2})$, plus point(s) at the origin. There are more efficient designs, requiring fewer runs, even for K > 16.
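The parameter count and axial-point length can be tabulated directly from these formulas. The sketch below simply evaluates them; the function names `parameter_count` and `axial_length` are illustrative, and `int` is read as truncation toward zero, which is an assumption about the notation.

```python
import math


def parameter_count(k):
    """P = 1 (Start) + K (Stage I) + (K^2 - K)/2 (Stage II) + 3K (Stage III)."""
    return 1 + k + (k * k - k) // 2 + 3 * k


def axial_length(k):
    """Axial point length min(2, sqrt(int(1.5 + K/4))), with int() truncating."""
    return min(2.0, math.sqrt(int(1.5 + k / 4)))


for k in (2, 4, 8, 16):
    print(f"K={k:2d}  P={parameter_count(k):3d}  axial length={axial_length(k):.3f}")
```

Because P grows quadratically in K, the minimum number of runs needed to estimate every parameter rises quickly: K = 4 gives P = 23, and K = 16 gives P = 185.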