Dvoretzky–Kiefer–Wolfowitz inequality

In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz–Massart inequality (DKW inequality) provides a bound on the worst case distance of an empirically determined distribution function from its associated population distribution function.

It is named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who in 1956 proved the inequality with an unspecified multiplicative constant C in front of the exponent on the right-hand side.[1] In 1990, Pascal Massart proved the inequality with the sharp constant C = 2,[2] confirming a conjecture due to Birnbaum and McCarty.[3]

Given a natural number n, let X1, X2, …, Xn be real-valued independent and identically distributed random variables with cumulative distribution function F(·).

Let Fn denote the associated empirical distribution function, defined by

Fn(x) = (1/n) Σi=1…n 1{Xi ≤ x},  x ∈ ℝ,

so that Fn(x) is the fraction of the sample at or below x.
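As a concrete illustration of the definition above, the empirical distribution function at a point is simply the fraction of observations not exceeding that point (the helper name `ecdf` below is ours, not standard library API):

```python
import numpy as np

def ecdf(sample, x):
    """Empirical distribution function Fn(x): fraction of observations <= x."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(sample <= x))

# Example: five observations; three are <= 0.5, so Fn(0.5) = 3/5 = 0.6
data = [0.2, 0.5, 0.5, 0.9, 1.4]
print(ecdf(data, 0.5))
```

Note that Fn is a right-continuous step function that jumps by 1/n (or a multiple of 1/n at ties) at each observation.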

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the probability that the random function Fn differs from F by more than a given constant ε > 0 anywhere on the real line. With Massart's sharp constant, it states that for every ε > 0,

Pr( supx |Fn(x) − F(x)| > ε ) ≤ 2 exp(−2nε²).
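The bound can be inverted to choose a band half-width for a target error rate α: setting 2 exp(−2nε²) = α and solving gives ε = √(ln(2/α) / (2n)). A minimal sketch of this standard computation (the function name `dkw_epsilon` is illustrative):

```python
import math

def dkw_epsilon(n, alpha):
    """Half-width of a DKW confidence band: solves 2*exp(-2*n*eps**2) = alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

# With n = 1000 samples and alpha = 0.05, the half-width is about 0.043.
print(round(dkw_epsilon(1000, 0.05), 3))
```

The half-width shrinks at the rate 1/√n, so quadrupling the sample size halves the width of the band.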

A version of the Dvoretzky–Kiefer–Wolfowitz inequality has also been obtained for the Kaplan–Meier estimator, the right-censored-data analog of the empirical distribution function.

The DKW bound runs parallel to, and equally above and below, the empirical CDF.

Because this confidence band has the same width everywhere, it allows for different rates of violation across the support of the distribution.

The above chart shows an example application of the DKW inequality in constructing confidence bounds (in purple) around an empirical distribution function (in light blue). In this random draw, the true CDF (orange) is entirely contained within the DKW bounds.
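A band like the one described above can be constructed by shifting the empirical CDF up and down by the DKW half-width ε and clipping to [0, 1], since a CDF cannot leave that interval. A hedged sketch (the helper `dkw_band` and its interface are illustrative, not a standard API):

```python
import numpy as np

def dkw_band(sample, alpha):
    """Lower and upper DKW confidence bands evaluated at the sorted sample points."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))   # DKW half-width
    fn = np.arange(1, n + 1) / n                      # empirical CDF at order statistics
    lower = np.clip(fn - eps, 0.0, 1.0)               # a CDF cannot leave [0, 1]
    upper = np.clip(fn + eps, 0.0, 1.0)
    return x, lower, upper

# Example with a simulated standard-normal sample:
x, lo, hi = dkw_band(np.random.default_rng(0).normal(size=200), 0.05)
print(float(lo.min()), float(hi.max()))  # band is clipped at 0 and 1
```

By the inequality, the true CDF lies entirely between `lower` and `upper` with probability at least 1 − α, simultaneously over the whole real line.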