Tolerance interval

A tolerance interval (TI) is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls.

"More specifically, a 100×p%/100×(1−α) tolerance interval provides limits within which at least a certain proportion (p) of the population falls with a given level of confidence (1−α)."[1]

"A (p, 1−α) tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion p of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI."[2]

"A (p, 1−α) upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100 p percentile of the population."

[…] is not involved in the definition of tolerance interval, which deals only with the first sample, of size n.

One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral t-distribution.[4]

Two-sided normal tolerance intervals can be estimated using the chi-squared distribution.
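As a rough illustration of how a two-sided normal tolerance interval can be estimated from the chi-squared distribution, the sketch below uses Howe's approximation for the tolerance factor k, with the chi-squared quantile itself approximated by the Wilson-Hilferty formula so that only the Python standard library is needed. Both approximations, and the function names `chi2_quantile_wh`, `two_sided_k` and `two_sided_tolerance_interval`, are assumptions made for this example and are not taken from the cited sources; an exact chi-squared quantile from a statistics library would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(q: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the q-quantile of chi-squared(dof)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def two_sided_k(n: int, p: float, conf: float) -> float:
    """Approximate two-sided tolerance factor (Howe's approximation).

    n    -- sample size
    p    -- proportion of the population to be covered
    conf -- confidence level (1 - alpha)
    """
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    # Lower (1 - conf) quantile of chi-squared with n - 1 degrees of freedom.
    chi2_low = chi2_quantile_wh(1.0 - conf, dof)
    return z * sqrt(dof * (1.0 + 1.0 / n) / chi2_low)


def two_sided_tolerance_interval(sample, p=0.95, conf=0.95):
    """Return (lower, upper) = mean -/+ k * sd for a normal sample."""
    n = len(sample)
    mean = sum(sample) / n
    sd = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    k = two_sided_k(n, p, conf)
    return mean - k * sd, mean + k * sd
```

The factor k is always larger than the plain normal quantile for the same proportion, and it shrinks toward that quantile as n grows.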

An interval computed from sample estimates of the population mean and standard deviation will not necessarily include 95% of the population, due to variance in these estimates.

A tolerance interval bounds this variance by introducing a confidence level, which is the confidence with which this interval actually includes the specified proportion of the population.
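The effect of estimation variance can be checked with a small simulation. The sketch below assumes a standard normal population, a sample size of 10, and the naive interval "sample mean ± 1.96 sample standard deviations"; these settings and the function name `naive_interval_success_rate` are illustrative choices, not part of the source. For each simulated sample it computes the true proportion of the population inside the interval and counts how often that proportion reaches 95%.

```python
import random
from statistics import NormalDist, mean, stdev


def naive_interval_success_rate(n=10, trials=2000, seed=0):
    """Fraction of samples whose naive interval covers >= 95% of N(0, 1)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    population = NormalDist(0.0, 1.0)  # assumed true population
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, s = mean(sample), stdev(sample)
        lo, hi = m - 1.96 * s, m + 1.96 * s
        # True population content of the estimated interval.
        content = population.cdf(hi) - population.cdf(lo)
        if content >= 0.95:
            hits += 1
    return hits / trials


rate = naive_interval_success_rate()
print(f"naive interval covers >= 95% in {rate:.0%} of samples")
```

In this setup the success rate lands far below 95%, which is the gap a tolerance interval closes by widening the interval until the stated confidence is reached.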

[7] "As the degrees of freedom approach infinity, the prediction and tolerance intervals become equal."

[9][10] The tolerance interval differs from a confidence interval in that the confidence interval bounds a single-valued population parameter (the mean or the variance, for example) with some confidence, while the tolerance interval bounds the range of data values that includes a specific proportion of the population.

Whereas a confidence interval's size is entirely due to sampling error, and will approach a zero-width interval at the true population parameter as sample size increases, a tolerance interval's size is due partly to sampling error and partly to actual variance in the population, and will approach the population's probability interval as sample size increases.

A prediction interval bounds only a single future sample, whereas a tolerance interval bounds the entire population (equivalently, an arbitrary sequence of future samples).[10][11]

Reference [9] gives the following example: So consider once again a proverbial EPA mileage test scenario, in which several nominally identical autos of a particular model are tested to produce mileage figures.

If such data are processed to produce a 95% confidence interval for the mean mileage of the model, it is, for example, possible to use it to project the mean or total gasoline consumption for the manufactured fleet of such autos over their first 5,000 miles of use.

Such an interval would, however, not be of much help to a person renting one of these cars and wondering whether the (full) 10-gallon tank of gas will suffice to carry him the 350 miles to his destination.

Neither a confidence interval for the mean mileage nor a prediction interval for a single additional mileage is exactly what is needed by a design engineer charged with determining how large a gas tank the model really needs to guarantee that 99% of the autos produced will have a 400-mile cruising range.

What the engineer really needs is a tolerance interval for a fraction p = 0.99 of the mileages of such autos.

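Let μ and σ², respectively, denote the population mean and variance for the log-transformed data.

A confidence interval for μ can be constructed the usual way, based on the t-distribution; this in turn will provide a confidence interval for the median air lead level, because the median is exp(μ) when the log-transformed data are normally distributed.

If X̄ and S denote the sample mean and standard deviation of the log-transformed data for a sample of size n, a 95% confidence interval for μ is given by X̄ ± t_{n−1, 0.975} S/√n, where t_{m, 1−α} denotes the 1−α quantile of a t-distribution with m degrees of freedom.

It may also be of interest to derive a 95% upper confidence bound for the median air lead level.

Such a bound for μ is given by X̄ + t_{n−1, 0.95} S/√n.

Consequently, a 95% upper confidence bound for the median air lead is given by exp(X̄ + t_{n−1, 0.95} S/√n).

Now suppose we want to predict the air lead level at a particular area within the laboratory.

A 95% upper prediction limit for the log-transformed lead level is given by X̄ + t_{n−1, 0.95} S √(1 + 1/n).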

A two-sided prediction interval can be similarly computed.
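The t-based limits discussed in this example can be sketched in a few lines. The readings in `HYPOTHETICAL_LEVELS` are invented for illustration and are not the data from the cited example; the sample size of 15 and the function name `log_scale_limits` are likewise assumptions. Because the standard library has no t-distribution, the two t quantiles for 14 degrees of freedom are taken from standard tables and passed in as arguments.

```python
from math import exp, log, sqrt
from statistics import mean, stdev

# Hypothetical lead readings, invented for illustration (n = 15).
HYPOTHETICAL_LEVELS = [12.0, 35.0, 8.0, 50.0, 21.0, 90.0, 15.0, 60.0,
                       27.0, 110.0, 18.0, 42.0, 9.0, 75.0, 30.0]

# Tabulated t quantiles for n - 1 = 14 degrees of freedom.
T_14_0975 = 2.1448  # 0.975 quantile, for the two-sided 95% interval
T_14_095 = 1.7613   # 0.95 quantile, for the one-sided 95% limits


def log_scale_limits(levels, t_two_sided, t_one_sided):
    """Confidence and prediction limits computed on the log scale."""
    logs = [log(x) for x in levels]
    n = len(logs)
    xbar, s = mean(logs), stdev(logs)
    se = s / sqrt(n)  # standard error of the mean of the logs
    ucb_mu = xbar + t_one_sided * se
    return {
        # Two-sided 95% confidence interval for the log-scale mean.
        "ci_mu": (xbar - t_two_sided * se, xbar + t_two_sided * se),
        # 95% upper confidence bound for the log-scale mean.
        "ucb_mu": ucb_mu,
        # Back-transformed bound for the median on the original scale.
        "ucb_median": exp(ucb_mu),
        # 95% upper prediction limit for one further log-scale reading.
        "upl_log": xbar + t_one_sided * s * sqrt(1.0 + 1.0 / n),
    }


limits = log_scale_limits(HYPOTHETICAL_LEVELS, T_14_0975, T_14_095)
for name, value in limits.items():
    print(name, value)
```

The prediction limit is wider than the confidence bound because it accounts for the variability of a single new observation as well as the uncertainty in the estimated mean.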

In other words, the confidence interval is meant to provide information concerning the parameter μ (the population mean of the log-transformed data) only.

A prediction interval has a similar interpretation, and is meant to provide information concerning a single lead level only.

Now suppose we want to use the sample to conclude whether or not at least 95% of the population lead levels are below a threshold.

The upper tolerance limit is to be computed subject to the condition that at least 95% of the population lead levels are below the limit, with a certain confidence level, say 99%.
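A minimal sketch of such an upper tolerance limit follows, assuming the log-transformed readings are normally distributed. The exact factor comes from the noncentral t-distribution; to stay within the Python standard library, this example instead uses a closed-form normal approximation to the one-sided factor (often attributed to Natrella), which is an assumption of this sketch and is less accurate for small samples. The readings and the names `one_sided_k` and `upper_tolerance_limit_lognormal` are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical lead readings, invented for illustration (n = 15).
HYPOTHETICAL_LEVELS = [12.0, 35.0, 8.0, 50.0, 21.0, 90.0, 15.0, 60.0,
                       27.0, 110.0, 18.0, 42.0, 9.0, 75.0, 30.0]


def one_sided_k(n: int, p: float, conf: float) -> float:
    """Approximate one-sided tolerance factor (normal approximation).

    Requires n - 1 > z_conf**2 / 2 so that the coefficient a is positive.
    """
    z_p = NormalDist().inv_cdf(p)
    z_g = NormalDist().inv_cdf(conf)
    a = 1.0 - z_g ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a


def upper_tolerance_limit_lognormal(levels, p=0.95, conf=0.99):
    """Upper limit exceeding at least a fraction p of levels, with confidence conf."""
    logs = [log(x) for x in levels]
    k = one_sided_k(len(logs), p, conf)
    # Limit on the log scale, back-transformed to the original scale.
    return exp(mean(logs) + k * stdev(logs))


utl = upper_tolerance_limit_lognormal(HYPOTHETICAL_LEVELS)
print("approximate (0.95, 0.99) upper tolerance limit:", utl)
```

If the computed limit falls below the threshold of interest, the sample supports the claim that at least 95% of the population lead levels are below that threshold at the chosen confidence.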