Resampling (statistics)

Bootstrapping techniques are also used in the updating-selection transitions of particle filters, genetic type algorithms and related resample/reconfiguration Monte Carlo methods used in computational physics.

One form of cross-validation leaves out a single observation at a time; this is similar to the jackknife.

Another, K-fold cross-validation, splits the data into K subsets; each is held out in turn as the validation set.
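The two cross-validation schemes above can be sketched in a few lines. The helper below is a hypothetical illustration (the function names are not from any particular library): it shuffles the indices, splits them into K folds, and yields each fold once as the validation set; choosing K equal to the sample size recovers the leave-one-out scheme.

```python
import numpy as np

def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs: shuffle 0..n-1, split into
    k roughly equal folds, and hold each fold out once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

With `k=n`, each validation set is a single observation, i.e. leave-one-out cross-validation.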

Without cross-validation, adding predictors always reduces the residual sum of squares (or possibly leaves it unchanged).
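This monotonicity is easy to demonstrate numerically. The sketch below (assuming ordinary least squares with an intercept) fits nested models on pure-noise predictors and checks that the in-sample residual sum of squares never increases as columns are added, even though the predictors carry no signal.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
X = rng.normal(size=(n, 5))   # irrelevant predictors
y = rng.normal(size=n)        # pure noise response

def rss(X_sub, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# Each added column can only shrink (or leave unchanged) the RSS.
rss_path = [rss(X[:, :j], y) for j in range(1, 6)]
assert all(a >= b - 1e-9 for a, b in zip(rss_path, rss_path[1:]))
```

Out-of-sample (cross-validated) error, by contrast, typically starts rising once the extra predictors are just fitting noise.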

In particular, a set of sufficient conditions is that the rate of convergence of the estimator is known and that the limiting distribution is continuous.

There are many cases of applied interest where subsampling leads to valid inference whereas bootstrapping does not; for example, when the estimator converges at a rate other than the square root of the sample size, or when its limiting distribution is non-normal.

[5][6] This method was foreshadowed by Mahalanobis, who in 1946 suggested repeated estimates of the statistic of interest with half the sample chosen at random.

Quenouille invented this method with the intention of reducing the bias of the sample estimate.

Tukey extended this method: if the replicates can be considered identically and independently distributed, then the variance of the sample parameter can be estimated, and the estimate is approximately distributed as a t variate with n−1 degrees of freedom (n being the sample size).
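Tukey's construction can be sketched as follows (a minimal illustration; the function name is hypothetical): the statistic is recomputed n times, each time leaving one observation out, and the spread of the replicates, scaled by (n − 1)/n, estimates the variance of the statistic.

```python
import numpy as np

def jackknife_variance(x, stat=np.mean):
    """Jackknife variance estimate: recompute stat on each
    leave-one-out sample, then scale the spread of the
    replicates by (n - 1)/n."""
    x = np.asarray(x)
    n = len(x)
    replicates = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)

# For the sample mean this reproduces the usual s^2/n:
jackknife_variance([1, 2, 3, 4, 5])  # -> 0.5, i.e. var(x, ddof=1)/5
```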

The jackknife is equivalent to leave-one-out cross-validation (a subsampling scheme); the two differ only in their goal.

[8] For many statistical parameters the jackknife estimate of variance tends asymptotically to the true value almost surely.

Because of this, the jackknife is popular when the estimates need to be verified several times before publishing (e.g., official statistics agencies).

On the other hand, when this verification feature is not crucial and the interest lies not in a single number but in an idea of its distribution, the bootstrap is preferred (e.g., studies in physics, economics, biological sciences).

Whether to use the bootstrap or the jackknife may depend more on operational aspects than on statistical concerns of a survey.

Theoretical aspects of both the bootstrap and the jackknife can be found in Shao and Tu (1995),[10] whereas a basic introduction is given in Wolter (2007).

The bootstrapping method is the best example of the plug-in principle.
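The plug-in idea can be illustrated with a short sketch (the helper name is hypothetical): the unknown true distribution is replaced by the empirical distribution of the sample, so resampling with replacement from the data and recomputing the statistic yields an estimate of its standard error.

```python
import numpy as np

def bootstrap_se(x, stat=np.median, n_boot=2000, seed=0):
    """Plug-in bootstrap: resample with replacement from the
    empirical distribution and take the spread of the replicated
    statistic as an estimate of its standard error."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return reps.std(ddof=1)
```

For the sample mean, the bootstrap standard error should come out close to the textbook formula s/√n, which gives a quick sanity check on the procedure.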