Counting Bloom filter

A counting Bloom filter is a probabilistic data structure that is used to test whether the number of occurrences of a given element in a sequence exceeds a given threshold.

As a generalized form of the Bloom filter, false positive matches are possible, but false negatives are not – in other words, a query returns either "possibly bigger or equal than the threshold" or "definitely smaller than the threshold".

An empty counting Bloom filter is a m counters, all set to 0.

Similar to Bloom filter, there must also be k different hash functions defined, each of which maps or hashes some set element to one of the m counter array positions, generating a uniform random distribution.

It is also similar that k is a constant, much smaller than m, which is proportional to the number of elements to be added.

The main generalization of Bloom filter is adding an element.

To query for an element with a threshold θ (test whether the count number of an element is smaller than θ), feed it to each of the k hash functions to get k counter positions.

A counting Bloom filter is essentially the same data structure as count–min sketches, but are used differently.

Several implementations of counting bloom filters allow for deletion, by decrementing each of the k counters for a given input.

This will introduce the probability of false negatives during a query if the deleted input has not previously been inserted into the filter.

Guo et al. present the problem in great detail, and provide heuristics for the parameters m, k, and n which minimize the probability of false negatives.

[1] The same assumptions in Bloom filter, which hash functions make insertions uniform random, is also assumed here.

[2] A counting Bloom filter determines an element is greater or equal to θ when the corresponding k counters are greater or equal to θ.

Therefore, the probability that counting Bloom filter determines an element is greater or equal to θ is

This is different from formal definition of false positive in counting Bloom filter.

For large but fixed n and m, the false positive decreases from k=1 to a point defined