CDF-based nonparametric confidence interval

In statistics, cumulative distribution function (CDF)-based nonparametric confidence intervals are a general class of confidence intervals around statistical functionals of a distribution.

To calculate these confidence intervals, all that is required is an independently and identically distributed (iid) sample from the distribution and known bounds on the support of the distribution. The latter requirement simply means that all the nonzero probability mass of the distribution must be contained in some known interval.

Given an upper and lower bound on the CDF, the approach involves finding the CDFs within the bounds that maximize and minimize the statistical functional of interest.

Unlike approaches that make asymptotic assumptions, including bootstrap approaches and those that rely on the central limit theorem, CDF-based bounds are valid for finite sample sizes.
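The finite-sample validity can be checked empirically. The following simulation sketch (illustrative, not from the source; function names are my own) estimates how often the Dvoretzky–Kiefer–Wolfowitz band of half-width ε = sqrt(ln(2/α)/(2n)) contains the true CDF of a Uniform(0, 1) distribution at a small, fixed sample size:

```python
import math
import random

def dkw_epsilon(n, alpha):
    """Half-width of the DKW simultaneous band at confidence level 1 - alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def sup_deviation_uniform(sample):
    """Largest vertical gap between the empirical CDF of a Uniform(0, 1)
    sample and the true CDF F(x) = x, attained at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - xs[i - 1], xs[i - 1] - (i - 1) / n)
               for i in range(1, n + 1))

random.seed(0)
n, alpha, trials = 30, 0.05, 2000
eps = dkw_epsilon(n, alpha)
covered = sum(
    sup_deviation_uniform([random.random() for _ in range(n)]) <= eps
    for _ in range(trials)
)
print(covered / trials)  # empirical coverage; valid at n = 30, no asymptotics
```

The observed coverage should be at least 1 − α even at n = 30, which is the point of the finite-sample guarantee.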

When producing bounds on the CDF, we must differentiate between pointwise and simultaneous bands.

A pointwise CDF bound is one which only guarantees a coverage probability of 1 − α at each individual point of the empirical CDF. Because the guarantee is weaker, these intervals can be much smaller than simultaneous ones. One method of generating them is based on the binomial distribution: at a single point x, the number of sample points falling at or below x follows a binomial distribution with success probability F(x) and with n set equal to the number of samples in the empirical distribution, so any method for constructing a binomial confidence interval (such as the Clopper–Pearson interval) yields a pointwise bound on F(x).
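The binomial route can be sketched with nothing more than the binomial CDF (the helper names are mine, and the bisection tolerance is arbitrary):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, tol=1e-9):
    """Exact (Clopper-Pearson) 1 - alpha interval for a binomial proportion,
    found by bisection. Applied to the empirical CDF: k is the number of
    sample points at or below x, and the interval bounds F(x)."""
    # Lower limit: the p solving P(X >= k | p) = alpha/2 (0 when k = 0).
    if k == 0:
        lo = 0.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:
            mid = (a + b) / 2
            # P(X >= k | p) = 1 - P(X <= k-1 | p) is increasing in p.
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                a = mid
            else:
                b = mid
        lo = a
    # Upper limit: the p solving P(X <= k | p) = alpha/2 (1 when k = n).
    if k == n:
        hi = 1.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:
            mid = (a + b) / 2
            # P(X <= k | p) is decreasing in p.
            if binom_cdf(k, n, mid) > alpha / 2:
                a = mid
            else:
                b = mid
        hi = b
    return lo, hi

lo, hi = clopper_pearson(k=12, n=30)
print(lo, hi)  # pointwise 95% bound on F(x) when 12 of 30 samples are <= x
```

Repeating this at every order statistic produces the pointwise band; note that the 95% guarantee then holds per point, not for all points simultaneously.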

CDF-based confidence intervals require a probabilistic bound on the CDF of the distribution from which the samples were generated.

A variety of methods exist for generating confidence intervals for the CDF of a distribution given an iid sample drawn from it. One such method uses the Dvoretzky–Kiefer–Wolfowitz inequality. For an empirical CDF built from n samples and any ε > 0, the bound states that the true CDF lies within ε of the empirical CDF everywhere, simultaneously, with probability at least 1 − 2e^(−2nε²); choosing ε = sqrt(ln(2/α)/(2n)) therefore gives a simultaneous band at confidence level 1 − α. This can be viewed as a confidence envelope that runs parallel to, and is equally above and below, the empirical CDF.
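A minimal sketch of such an envelope, assuming the DKW form with ε = sqrt(ln(2/α)/(2n)) (function names are illustrative):

```python
import math

def dkw_band(sample, alpha=0.05):
    """Simultaneous 1 - alpha confidence band for the CDF from the
    Dvoretzky-Kiefer-Wolfowitz inequality: the empirical CDF shifted
    up and down by eps = sqrt(ln(2/alpha) / (2n)), clipped to [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def empirical_cdf(x):
        # Fraction of sample points at or below x.
        return sum(v <= x for v in xs) / n

    lower = lambda x: max(empirical_cdf(x) - eps, 0.0)
    upper = lambda x: min(empirical_cdf(x) + eps, 1.0)
    return lower, upper, eps

# With 30 samples at the 95% level, eps = sqrt(ln(40) / 60).
sample = [i / 29 for i in range(30)]  # 30 evenly spaced points in [0, 1]
lower, upper, eps = dkw_band(sample)
print(round(eps, 3))  # → 0.248
```

The clipping to [0, 1] is what makes the envelope a valid pair of CDF bounds rather than merely a shifted curve.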

The equally spaced confidence interval around the empirical CDF allows for different rates of violation across the support of the distribution.

In contrast, the order statistics-based bound introduced by Learned-Miller and DeStefano[3] allows for an equal rate of violation across all of the order statistics.

Other types of bounds can be generated by varying the rate of violation for the order statistics.

For example, if a tighter bound on the distribution is desired on the upper portion of the support, a higher rate of violation can be allowed at the upper portion of the support at the expense of having a lower rate of violation, and thus a looser bound, for the lower portion of the support.

Assume without loss of generality that the support of the distribution is contained in the interval [0, 1].

It can be shown[4] that the CDF that maximizes the mean is the one that runs along the lower confidence envelope, L(x), and the CDF that minimizes the mean is the one that runs along the upper envelope, U(x). Using the identity E[X] = ∫₀¹ (1 − F(x)) dx, the confidence interval for the mean can be computed as [∫₀¹ (1 − U(x)) dx, ∫₀¹ (1 − L(x)) dx].
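The mean bounds can be computed in closed form for the DKW envelopes, since both envelopes are piecewise constant between the order statistics. A sketch under the assumption that the support is [0, 1] (function names are mine):

```python
import math

def dkw_mean_interval(sample, alpha=0.05):
    """Finite-sample confidence interval for the mean of a distribution
    supported on [0, 1], via the identity E[X] = integral over [0, 1] of
    (1 - F(x)) dx, applied to the DKW envelopes of the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    # The empirical CDF is piecewise constant: F_hat = j/n on the j-th
    # segment between consecutive breakpoints 0, x_(1), ..., x_(n), 1.
    points = [0.0] + xs + [1.0]
    lower_mean = upper_mean = 0.0
    for j in range(n + 1):
        width = points[j + 1] - points[j]
        upper_env = min(j / n + eps, 1.0)  # upper envelope U(x) on segment
        lower_env = max(j / n - eps, 0.0)  # lower envelope L(x) on segment
        lower_mean += width * (1.0 - upper_env)  # integral of 1 - U
        upper_mean += width * (1.0 - lower_env)  # integral of 1 - L
    return lower_mean, upper_mean

sample = [0.2, 0.4, 0.4, 0.6, 0.7, 0.9] * 5  # 30 points in [0, 1]
lo, hi = dkw_mean_interval(sample)
print(lo, hi)  # brackets the sample mean
```

Because L(x) ≤ F̂(x) ≤ U(x) everywhere, the resulting interval always contains the sample mean, and it holds at the stated confidence level for any finite n.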

For the variance, again assume that the support of the distribution of interest is contained in [0, 1]. Given the confidence envelope, it can be shown that the variance-minimizing CDF begins on the lower envelope, has a jump discontinuity up to the upper envelope, and then continues along the upper envelope; further, this variance-minimizing CDF, F′, must satisfy the constraint that the jump discontinuity occurs at the mean of F′ itself. The variance-maximizing CDF begins on the upper envelope, transitions horizontally down to the lower envelope, and then continues along the lower envelope.

Explicit algorithms for calculating these variance-maximizing and minimizing CDFs are given by Romano and Wolf.[5]

The CDF-based framework for generating confidence intervals is very general and can be applied to a variety of other statistical functionals, including entropy, mutual information, and arbitrary percentiles.

Illustration of different CDF bounds, generated from a random sample of 30 points. The purple lines are the simultaneous DKW bounds, which cover the entire CDF at the 95% confidence level. The orange lines show the pointwise Clopper–Pearson bounds, which only guarantee coverage at individual points at the 95% confidence level and thus provide a tighter bound.
Illustration of the bound on the empirical CDF that is obtained using the Dvoretzky–Kiefer–Wolfowitz inequality. The notation x_(z) indicates the z-th order statistic.