Balls into bins problem

The balls into bins (or balanced allocations) problem is a classic problem in probability theory that has many applications in computer science.

The problem involves m balls and n boxes (or "bins").

Each time, a single ball is placed into one of the bins.

The problem can be modelled using a Multinomial distribution, and may involve asking a question such as: What is the expected number of bins with a ball in them?

A powerful balls-into-bins paradigm is the "power of two random choices[2]" where each ball chooses two (or more) random bins and is placed in the lesser-loaded bin.

This paradigm has found wide practical applications in shared-memory emulations, efficient hashing schemes, randomized load balancing of tasks on servers, and routing of packets within parallel networks and data centers.

[2] When the bin for each ball is selected at random, independent of other choices, the maximum load might be as large as

However, it is possible to calculate a tighter bound that holds with high probability.

Gonnet [5] gave a tight bound for the expected value of the maximum load, which for

This paradigm often called the "power of two random choices" has been studied in a number of settings below.

random bins at each step and then allocates the ball into the least loaded of the selected bins (ties broken arbitrarily), then with high probability the maximum load is:[8] which is almost exponentially less than with totally random allocation.

), when with high probability the maximum load is:[9] which is tight up to an additive constant.

, the random allocation process gives only the maximum load of

with high probability, so the improvement between these two processes is especially visible for large values of

Other key variants of the paradigm are "parallel balls-into-bins" where

random bins in parallel,[10] "weighted balls-into-bins" where balls have non-unit weights,[11][12][13] and "balls-into-bins with deletions" where balls can be added as well as deleted.

For m=n, after a sufficiently long time, with high probability the maximum load is similar to the finite version, both with random allocation and with partially random allocation.

bins in an arbitrary way and then, in every subsequent step of a discrete-time process, one ball is chosen from each non-empty bin and re-assigned to one of the

, it has been shown that with high probability the process converges to a configuration with maximum load

[15] Online Load Balancing:[16] consider a set of n identical computers.

Each user would of course like to select the least loaded computer, but this requires to check the load on each computer, which might take a long time.

Another option is to select a computer at random; this leads, with high probability, to a maximum load of

A possible compromise is that the user will check only two computers, and use the lesser loaded of the two.

This leads, with high probability, to a much smaller maximum load of

The efficiency of accessing a key depends on the length of its list.

If we use a single hash function which selects locations with uniform probability, with high probability the longest chain has

A possible improvement is to use two hash functions, and put each new key in the shorter of the two lists.

In this case, with high probability the longest chain has only

[17] Fair cake-cutting: consider the problem of creating a partially proportional division of a heterogeneous resource among

The Edmonds–Pruhs protocol is a randomized algorithm whose analysis make use of balls-into-bins arguments.