Bootstrapping populations

Bootstrapping populations in statistics and mathematics starts with a sample $\{x_1, \ldots, x_m\}$ observed from a random variable X.

When X has a given distribution law with a set of unknown parameters, which we denote with a vector $\boldsymbol\theta$, a parametric inference problem consists of computing suitable values of these parameters (call them estimates) precisely on the basis of the sample. An estimate is suitable if using it in place of the unknown parameter does not cause major damage in subsequent computations. In algorithmic inference, the suitability of an estimate is expressed in terms of its compatibility with the observed sample.

In this framework, resampling methods aim at generating a set of candidate values to replace the unknown parameters, which we read as compatible replicas of them. These replicas represent a population of specifications of a random vector $\boldsymbol\Theta$ compatible with the observed sample. By plugging them into the expression of the distribution law in question, we bootstrap entire populations of random variables compatible with the observed sample.

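The rationale of the algorithms computing the replicas, which we call population bootstrap procedures, is to identify a set of statistics $\{s_1, \ldots, s_k\}$ exhibiting specific properties (which we call well behavior) with respect to the unknown parameters. By definition, the statistics are expressed as functions of the observed values $\{x_1, \ldots, x_m\}$. In turn, each $x_i$ may be expressed as a function of the unknown parameters and of a random seed specification $z_i$ through the sampling mechanism $(g_{\boldsymbol\theta}, Z)$, i.e. $x_i = g_{\boldsymbol\theta}(z_i)$. Then, by plugging the second expression into the first, we obtain expressions of the $s_j$ as functions of seeds and parameters, the master equations, which we invert to find values of the parameters as a function of: i) the statistics, whose values in turn are fixed at the observed ones; and ii) the seeds, which are random according to their own distribution. Hence, from a set of seed samples we obtain a set of parameter replicas.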

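Given a sample $\boldsymbol x = \{x_1, \ldots, x_m\}$ of a random variable X and a sampling mechanism $(g_{\boldsymbol\theta}, Z)$ for X, the realization $\boldsymbol x$ is given by $\boldsymbol x = \{g_{\boldsymbol\theta}(z_1), \ldots, g_{\boldsymbol\theta}(z_m)\}$, with $\boldsymbol\theta = (\theta_1, \ldots, \theta_k)$. Focusing on well-behaved statistics

$$s_1 = h_1(x_1, \ldots, x_m), \quad \ldots, \quad s_k = h_k(x_1, \ldots, x_m)$$

for their parameters, the master equations read

$$s_j = h_j\big(g_{\boldsymbol\theta}(z_1), \ldots, g_{\boldsymbol\theta}(z_m)\big) = \rho_j(\boldsymbol\theta; z_1, \ldots, z_m), \qquad j = 1, \ldots, k. \qquad (1)$$

For each sample seed $\{z_1, \ldots, z_m\}$, a vector of parameters $\boldsymbol\theta$ is obtained by solving the above system with each $s_j$ fixed at its observed value.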

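Having computed a large set of N compatible vectors, the empirical marginal distribution of $\Theta_j$ is obtained by

$$\hat F_{\Theta_j}(\theta) = \frac{1}{N} \sum_{i=1}^{N} I_{(-\infty, \theta]}(\breve\theta_{j,i}), \qquad (2)$$

where $\breve\theta_{j,i}$ is the j-th component of the i-th solution of (1), and $I_{(-\infty, \theta]}(\breve\theta_{j,i})$ is the indicator function of $\breve\theta_{j,i}$ in the interval $(-\infty, \theta]$.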

Some indeterminacies remain if X is discrete; we will consider this case shortly.
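The two steps just described, solving (1) once per seed sample and then evaluating (2) on the resulting replicas, can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names, the callback signatures and the toy location model in the usage lines ($x_i = \theta + z_i$ with standard normal seeds, statistic $s = $ mean of the $x_i$, hence $\breve\theta = s - $ mean of the $z_i$) are assumptions made for illustration only. A concrete application has to supply its own master-equation solver.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical callback types: a seed sampler returning one seed value, and a
# solver that inverts the master equations (1) for a given seed sample.
SeedSampler = Callable[[random.Random], float]
MasterSolver = Callable[[Sequence[float], Sequence[float]], List[float]]


def population_bootstrap(
    s_obs: Sequence[float],
    m: int,
    draw_seed: SeedSampler,
    solve_master: MasterSolver,
    n_replicas: int = 10_000,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return n_replicas parameter vectors compatible with the observed statistics."""
    rng = rng if rng is not None else random.Random()
    population = []
    for _ in range(n_replicas):
        # Draw a sample seed {z_1, ..., z_m} from the seed distribution.
        z = [draw_seed(rng) for _ in range(m)]
        # Solve the master equations for theta with s fixed at the observed values.
        population.append(solve_master(s_obs, z))
    return population


def empirical_marginal_cdf(population: Sequence[Sequence[float]], j: int, theta: float) -> float:
    """Equation (2): share of replicas whose j-th component lies in (-inf, theta]."""
    return sum(1 for replica in population if replica[j] <= theta) / len(population)


# Toy usage (assumed model, not from the text): x_i = theta + z_i with standard
# normal seeds; s = mean(x) gives s = theta + mean(z), inverted as s - mean(z).
population = population_bootstrap(
    s_obs=[2.0],
    m=10,
    draw_seed=lambda r: r.gauss(0.0, 1.0),
    solve_master=lambda s, z: [s[0] - sum(z) / len(z)],
    rng=random.Random(0),
)
print(empirical_marginal_cdf(population, 0, 2.0))  # by symmetry, about 0.5
```

Passing the solver as a callback keeps the loop independent of the distribution law; only `solve_master` changes from one model to another.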

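The whole procedure may be summed up in the following algorithm, where the index $\boldsymbol\Theta$ of $\boldsymbol s_{\boldsymbol\Theta}$ denotes the parameter vector from which the statistics vector is derived.

Algorithm (generating parameter populations through a bootstrap). Given a sample $\{x_1, \ldots, x_m\}$ from a random variable with unknown parameter vector $\boldsymbol\theta$:
1. identify a vector of well-behaved statistics $\boldsymbol S$ for $\boldsymbol\Theta$;
2. compute a specification $\boldsymbol s_{\boldsymbol\Theta}$ of $\boldsymbol S$ from the sample;
3. repeat for a satisfactory number N of iterations:
   a. draw a sample seed $\boldsymbol z_i$ of size m from the seed random variable;
   b. get $\breve{\boldsymbol\theta}_i$ as a solution of (1) in $\boldsymbol\theta$ with $\boldsymbol s = \boldsymbol s_{\boldsymbol\Theta}$ and $\boldsymbol z = \boldsymbol z_i$;
   c. add $\breve{\boldsymbol\theta}_i$ to the $\boldsymbol\Theta$ population.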

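You may easily see from a table of sufficient statistics that we obtain the curve in the picture on the left by computing the empirical distribution (2) on the population obtained through the above algorithm when: i) X is an Exponential random variable with parameter $\lambda$, ii) $s_\Lambda = \sum_{j=1}^m x_j$, and iii) $\breve\lambda_i = -\sum_{j=1}^m \log u_j / s_\Lambda$; and the curve in the picture on the right when: i) X is a Uniform random variable in $[0, a]$, ii) $s_A = \max_{j=1,\ldots,m} x_j$, and iii) $\breve a_i = s_A / \max_{j=1,\ldots,m} u_j$. In both cases the seeds $u_j$ are uniform in $[0, 1]$, and the sampling mechanisms are $x = -\log(u)/\lambda$ and $x = a u$, respectively.

Note that the accuracy with which we obtain the distribution law of the parameters of populations compatible with a sample does not depend on the sample size.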

Instead, it is a function of the number of seeds we draw.

In turn, this number is purely a matter of computational time and does not require any extension of the observed data. With other bootstrapping methods that focus on generating sample replicas, such as those proposed by Efron & Tibshirani (1993), the accuracy of the estimated distributions depends on the sample size.
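The Exponential and Uniform examples can be sketched directly, because their master equations invert in closed form. Assuming the sampling mechanisms $x = -\log(u)/\lambda$ and $x = a u$ with $u$ uniform in $(0, 1]$, inverting the master equations for $s_\Lambda = \sum_j x_j$ and $s_A = \max_j x_j$ gives $\breve\lambda_i = -\sum_j \log u_j / s_\Lambda$ and $\breve a_i = s_A / \max_j u_j$. The function names and the synthetic observed samples (assumed true values $\lambda = 2$, $a = 5$) are illustrative assumptions, not data from the text.

```python
import math
import random
from typing import List

# Assumed sampling mechanisms: x = -log(u) / lambda and x = a * u, u uniform in (0, 1].


def exponential_replicas(s_lambda: float, m: int, n: int, rng: random.Random) -> List[float]:
    """Replicas of lambda from s_Lambda = sum(x_j): lambda_i = -sum(log u_j) / s_Lambda."""
    replicas = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], so log(u) is always defined.
        seeds = [1.0 - rng.random() for _ in range(m)]
        replicas.append(-sum(math.log(u) for u in seeds) / s_lambda)
    return replicas


def uniform_replicas(s_a: float, m: int, n: int, rng: random.Random) -> List[float]:
    """Replicas of a from s_A = max(x_j): a_i = s_A / max(u_j)."""
    replicas = []
    for _ in range(n):
        seeds = [1.0 - rng.random() for _ in range(m)]
        replicas.append(s_a / max(seeds))
    return replicas


def ecdf(values: List[float], t: float) -> float:
    """Equation (2) for a scalar parameter: share of replicas <= t."""
    return sum(v <= t for v in values) / len(values)


# Synthetic observed samples (assumed true values lambda = 2 and a = 5).
rng = random.Random(1)
m = 30
x_exp = [rng.expovariate(2.0) for _ in range(m)]
x_uni = [rng.uniform(0.0, 5.0) for _ in range(m)]

lam = exponential_replicas(sum(x_exp), m, 5000, rng)
a = uniform_replicas(max(x_uni), m, 5000, rng)
# Increasing n refines both empirical curves without touching the m observations.
print(ecdf(lam, 2.0), ecdf(a, 5.0))
```

Only the observed statistics enter the replica generators; the number of replicas `n` can be raised freely, which is the point made in the remark above about seeds versus sample size.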

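For a sample $\boldsymbol x$ expected to represent a Pareto distribution, whose specification requires values for the parameters $a$ and $k$, the cumulative distribution function reads

$$F_X(x) = 1 - \left(\frac{k}{x}\right)^{a}, \qquad x \ge k.$$

A sampling mechanism $(g_{(a,k)}, U)$ has a $[0, 1]$ uniform seed U and explaining function $g_{(a,k)}$ described by

$$x = g_{(a,k)}(u) = k\,(1 - u)^{-1/a}.$$

A relevant statistic $\boldsymbol s_{A,K}$ is constituted by the pair of joint sufficient statistics for $(A, K)$, namely $s_1 = \sum_{i=1}^m \log x_i$ and $s_2 = \min_{i=1,\ldots,m} x_i$. The master equations read

$$s_1 = m \log k - \frac{1}{a} \sum_{i=1}^m \log(1 - u_i), \qquad s_2 = k\,(1 - u_{\min})^{-1/a},$$

with $u_{\min} = \min_{i=1,\ldots,m} u_i$.

The figure on the right reports the three-dimensional plot of the empirical cumulative distribution function (2) of $(A, K)$.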
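The Pareto example can be sketched in the same way. Assuming the sampling mechanism $x = k(1-u)^{-1/a}$ and the statistics $s_1 = \sum_i \log x_i$, $s_2 = \min_i x_i$, the master equations $s_1 = m\log k - \frac{1}{a}\sum_i \log(1-u_i)$ and $s_2 = k(1-u_{\min})^{-1/a}$ can be solved in closed form (our own algebra, not stated in the text):

$$\breve a = \frac{m\log(1-u_{\min}) - \sum_{i=1}^m \log(1-u_i)}{s_1 - m\log s_2}, \qquad \breve k = s_2\,(1-u_{\min})^{1/\breve a}.$$

The function names and the synthetic sample (assumed true values $a = 3$, $k = 1.5$) are hypothetical. `joint_ecdf` evaluates the bivariate analogue of (2), i.e. the kind of surface shown in the figure.

```python
import math
import random
from typing import List, Tuple

# Assumed sampling mechanism: x = k * (1 - u) ** (-1 / a), u uniform in [0, 1).


def pareto_replicas(s1: float, s2: float, m: int, n: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Closed-form solutions (a, k) of the Pareto master equations, one per seed sample."""
    denominator = s1 - m * math.log(s2)  # sum of log(x_i / min x), positive
    replicas = []
    for _ in range(n):
        # random() lies in [0, 1), so each log(1 - u_i) is finite and <= 0.
        logs = [math.log(1.0 - rng.random()) for _ in range(m)]
        log_top = max(logs)  # equals log(1 - u_min) because log(1 - u) decreases in u
        a = (m * log_top - sum(logs)) / denominator
        k = s2 * math.exp(log_top / a)
        replicas.append((a, k))
    return replicas


def joint_ecdf(replicas: List[Tuple[float, float]], a0: float, k0: float) -> float:
    """Bivariate analogue of (2): share of replicas with a <= a0 and k <= k0."""
    return sum(a <= a0 and k <= k0 for a, k in replicas) / len(replicas)


# Synthetic observed sample (assumed true values a = 3 and k = 1.5).
rng = random.Random(2)
m = 50
x = [1.5 * rng.paretovariate(3.0) for _ in range(m)]
s1 = sum(math.log(v) for v in x)
s2 = min(x)

reps = pareto_replicas(s1, s2, m, 5000, rng)
print(joint_ecdf(reps, 3.0, 1.5))
```

Every replica satisfies $\breve k \le s_2$, as it must, since the minimum observation cannot fall below the scale parameter.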