Variable kernel density estimation

In a balloon estimator, the kernel width is varied depending on the location of the test point.

In a pointwise estimator, the kernel width is varied depending on the location of the sample.[1]
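
To make the structural difference concrete, the following sketch implements both variants in one dimension with a Gaussian kernel. The bandwidth rule used here (a width that grows with distance from the origin) is an arbitrary placeholder chosen only for illustration, not a rule prescribed above.

```python
# Minimal 1-D sketch of the structural difference between the two estimators.
# The bandwidth() rule below is a placeholder, not a recommended choice.
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def bandwidth(z):
    """Placeholder local bandwidth: wider away from the origin."""
    return 0.2 + 0.1 * np.abs(z)

def balloon_kde(x, samples):
    """Balloon estimator: one bandwidth h(x), set by the *test point* x."""
    h = bandwidth(x)
    return gauss((x - samples) / h).sum() / (len(samples) * h)

def pointwise_kde(x, samples):
    """Sample-point estimator: each sample x_i contributes with its own h(x_i)."""
    h_i = bandwidth(samples)
    return (gauss((x - samples) / h_i) / h_i).sum() / len(samples)

rng = np.random.default_rng(0)
samples = rng.normal(size=500)
print(balloon_kde(0.0, samples), pointwise_kde(0.0, samples))
```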

For multivariate estimators, the parameter h can be generalized to vary not just the size, but also the shape of the kernel.
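
As a sketch of that generalization, the snippet below (function names and the particular bandwidth matrix are illustrative assumptions) uses a full matrix H in a multivariate Gaussian kernel, so that the kernel's orientation and anisotropy, not just its overall scale, are controlled.

```python
# A multivariate kernel whose bandwidth is a full matrix H, controlling shape
# (orientation/anisotropy) as well as size. The particular H is illustrative.
import numpy as np

def multivariate_kde(x, samples, H):
    """Fixed-matrix-bandwidth KDE: f(x) = (1/n) * sum_i K_H(x - x_i)."""
    D = samples.shape[1]
    Hinv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(H))
    diffs = x - samples                                   # shape (n, D)
    quad = np.einsum('nd,de,ne->n', diffs, Hinv, diffs)   # Mahalanobis terms
    return norm * np.exp(-0.5 * quad).mean()

rng = np.random.default_rng(1)
samples = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)

# An anisotropic bandwidth matrix aligned with the data's correlation structure.
H = 0.1 * np.cov(samples.T)
print(multivariate_kde(np.array([0.0, 0.0]), samples, H))
```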

A common method of varying the kernel width is to make it inversely proportional to the density at the test point:

$h = \frac{k}{\left[ P(\vec x) \right]^{1/D}}$

where $k$ is a constant, $P(\vec x)$ is the density at the test point $\vec x$, and $D$ is the number of dimensions.
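
A minimal balloon-estimator sketch of this rule in one dimension follows; it assumes a fixed-bandwidth pilot estimate stands in for the unknown density, and the pilot bandwidth h0 and constant k are arbitrary illustrative choices.

```python
# Balloon estimator with h(x) = k / pilot(x)**(1/D); here D = 1 and the pilot
# is an ordinary fixed-bandwidth Gaussian KDE. h0 and k are arbitrary choices.
import numpy as np

def gauss_kde(x, samples, h):
    """Fixed-bandwidth Gaussian KDE in one dimension."""
    u = (x - samples) / h
    return np.exp(-0.5 * u**2).sum() / (len(samples) * h * np.sqrt(2.0 * np.pi))

def balloon_kde(x, samples, k=0.1, h0=0.3):
    """Adaptive estimate: the kernel widens where the pilot density is low."""
    pilot = gauss_kde(x, samples, h0)        # pilot density at the test point
    h = k / pilot                            # h(x) = k / P(x)^(1/D) with D = 1
    return gauss_kde(x, samples, h)

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
for x in (0.0, 2.0, 4.0):                    # dense centre vs sparse tail
    print(x, balloon_kde(x, samples))
```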

Writing the Gaussian-kernel estimate as $P(\vec x) \approx \frac{W}{n h^{D} (2\pi)^{D/2}}$, where $n$ is the number of samples and $W = \sum_{i=1}^{n} \exp\!\left(-\tfrac{\lVert \vec x - \vec x_i \rVert^{2}}{2h^{2}}\right)$ is the total (unnormalised) kernel weight at the test point, back-substituting the estimated PDF shows that $W$ is a constant:[2]

$W = n\, k^{D} (2\pi)^{D/2}.$

A similar derivation holds for any kernel whose normalising function is of the order $h^{D}$, although with a different constant factor in place of the $(2\pi)^{D/2}$ term.
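
A rough numerical check of this claim is sketched below, under the assumption that the true density of a known distribution can stand in for the estimated PDF; the agreement is therefore only approximate, and degrades in the far tails where $h$ becomes large.

```python
# Draw samples from a standard normal and set h(x) = k / P(x) using the true
# density as a stand-in for the estimated PDF (D = 1). The total unnormalised
# Gaussian kernel weight W(x) should then be roughly the same constant,
# n * k * sqrt(2*pi), at every test point.
import numpy as np

rng = np.random.default_rng(2)
n, k = 100_000, 0.02
samples = rng.standard_normal(n)

def true_density(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def total_weight(x):
    h = k / true_density(x)                       # h(x) = k / P(x)^(1/D), D = 1
    return np.exp(-0.5 * ((x - samples) / h) ** 2).sum()

expected = n * k * np.sqrt(2.0 * np.pi)           # n * k^D * (2*pi)^(D/2)
for x in (0.0, 0.5, 1.0):
    print(x, total_weight(x) / expected)          # ratios should be close to 1
```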

The error of the estimate has two components: a variance term, which grows as the kernel narrows, and a bias term, found by evaluating the estimator in the limit of zero kernel width and subtracting the true value of the function. By using a Taylor expansion for the real function, the leading bias term can be extracted; for a symmetric kernel it is of order $h^{2}\,\nabla^{2}P(\vec x)$. An optimal kernel width that minimizes the error of each estimate can thus be derived.
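
One way to make that derivation explicit is to balance the two error contributions. The asymptotic expressions used below are the standard ones for a symmetric (here Gaussian) kernel and are assumed for this sketch rather than taken from the text above; the pointwise mean squared error is approximately

$\mathrm{MSE}(h) \approx \frac{R(K)\,P(\vec x)}{n h^{D}} + \left(\frac{h^{2}}{2}\,\nabla^{2}P(\vec x)\right)^{2}, \qquad R(K) = \int K(u)^{2}\,du,$

and setting its derivative with respect to $h$ to zero gives the locally optimal width

$h_{\mathrm{opt}}(\vec x) = \left[\frac{D\,R(K)\,P(\vec x)}{n\left(\nabla^{2}P(\vec x)\right)^{2}}\right]^{1/(D+4)} \propto n^{-1/(D+4)}.$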

When the estimator is used for statistical classification, there are two ways we can proceed: the first is to compute the PDFs of each class separately, using different bandwidth parameters, and then compare them as in Taylor; the second is to divide the kernel sum up according to the class of each sample, and to estimate the class of the test point by maximum likelihood.
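
A short sketch of the first approach follows, using fixed-bandwidth Gaussian estimates from scipy.stats.gaussian_kde in place of the adaptive estimators discussed above; the class data, bandwidth values, and labels are illustrative assumptions.

```python
# Estimate each class-conditional PDF with its own bandwidth and assign the
# test point to the class with the larger estimate. scipy's gaussian_kde is a
# fixed-bandwidth estimator, used only to keep the sketch short.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
class_a = rng.normal(loc=0.0, scale=1.0, size=300)   # samples from class A
class_b = rng.normal(loc=2.5, scale=0.5, size=300)   # samples from class B

# Different bandwidth parameters for the two classes.
kde_a = gaussian_kde(class_a, bw_method=0.4)
kde_b = gaussian_kde(class_b, bw_method=0.2)

def classify(x):
    """Compare the per-class density estimates at the test point x."""
    return 'A' if kde_a(x)[0] >= kde_b(x)[0] else 'B'

for x in (0.0, 1.5, 2.5):
    print(x, classify(x))
```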