In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution.
To visualize the motivation behind rejection sampling, imagine graphing the probability density function (PDF) of a random variable onto a large rectangular board and throwing darts at it. Assume that the darts are uniformly distributed around the board, and remove all of the darts that land outside the area under the curve. The remaining darts will be distributed uniformly within the area under the curve, and the x-positions of these darts will be distributed according to the random variable's density. This is because there is the most room for the darts to land where the curve is highest and thus the probability density is greatest.
The visualization just described is equivalent to a particular form of rejection sampling where the "proposal distribution" is uniform; hence its graph is a rectangle. In the general form of rejection sampling, the board is not necessarily rectangular but is shaped according to the density of some proposal distribution that we know how to sample from, scaled so that it is at least as high at every point as the distribution we want to sample from, so that the former completely encloses the latter. (Otherwise, there would be parts of the curved area we want to sample from that could never be reached.)
Scaling a density by a constant has no effect on the sampled x-positions; thus, the algorithm can be used to sample from a distribution whose normalizing constant is unknown, which is common in computational statistics.
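As a minimal sketch of this dartboard picture (the particular density, interval, and box height below are illustrative choices, assuming only that the target is known up to a constant):

    import numpy as np

    rng = np.random.default_rng(0)

    # Unnormalized target on [0, 1]; its normalizing constant is never computed.
    f = lambda x: x**2 * (1 - x) ** 3          # proportional to a Beta(3, 4) density

    # Throw darts uniformly at the box [0, 1] x [0, fmax], keep those under the curve.
    fmax = f(2 / 5)                            # height of the curve's peak, at x = 2/5
    x = rng.uniform(0.0, 1.0, size=100_000)
    y = rng.uniform(0.0, fmax, size=100_000)
    samples = x[y < f(x)]                      # accepted x-positions follow Beta(3, 4)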
The validation of this method is the envelope principle: when simulating the pair $(x, v = u \cdot M g(x))$, one produces a uniform simulation over the subgraph of $M g(x)$. Accepting only pairs such that $u < f(x)/(M g(x))$ then produces pairs $(x, v)$ uniformly distributed over the subgraph of $f(x)$ and thus, marginally, a simulation from $f(x)$. This means that, with enough replicates, the algorithm generates a sample from the desired distribution $f(x)$.
This method relates to the general field of Monte Carlo techniques, including Markov chain Monte Carlo algorithms that also use a proxy distribution to achieve simulation from the target distribution $f(x)$.
Rejection sampling generates values from a target distribution $X$ with density $f(x)$ by drawing from a proposal distribution $Y$ with density $g(x)$ and accepting each draw with probability $f(x)/(M g(x))$, where $M < \infty$ is a bound on the likelihood ratio $f(x)/g(x)$ over the support of $X$. The number of samples required from $Y$ to obtain an accepted value thus follows a geometric distribution with probability $1/M$, which has mean $M$. Intuitively, $\operatorname{E}[N] = M$ is the expected number of iterations that are needed, as a measure of the computational complexity of the algorithm. The unconditional acceptance probability $1/M$ is higher when $M$ is chosen closer to one, and $M$ can be chosen closer to one the less the ratio $f(x)/g(x)$ varies, since $M$ must be an upper bound for this likelihood ratio.
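As a quick empirical check of the $1/M$ acceptance rate (the Beta(2, 2) target and uniform proposal below are illustrative choices):

    import numpy as np

    # Target: Beta(2, 2) with density f(x) = 6x(1 - x) on [0, 1].
    # Proposal: Uniform(0, 1), so g(x) = 1 and the tightest bound is M = max f = 1.5.
    rng = np.random.default_rng(0)
    f = lambda x: 6 * x * (1 - x)
    M = 1.5
    x = rng.uniform(size=1_000_000)
    u = rng.uniform(size=1_000_000)
    accepted = u <= f(x) / M       # accept with probability f(x) / (M * g(x))
    print(accepted.mean())         # close to 1/M = 0.6667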
The algorithm, which was used by John von Neumann[4] and dates back to Buffon and his needle,[5] obtains a sample from a distribution $X$ with density $f$ using samples from a distribution $Y$ with density $g$ as follows:

1. Obtain a sample $y$ from the distribution $Y$ and a sample $u$ from $\mathrm{Unif}(0,1)$ (the uniform distribution over the unit interval).
2. Check whether $u < f(y)/(M g(y))$. If this holds, accept $y$ as a sample drawn from $f$; if not, reject the value of $y$ and return to step 1.

The algorithm will take an average of $M$ iterations to obtain a sample.[6]
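A minimal sketch of this algorithm (the function names, and the Laplace target with a Cauchy proposal, are illustrative choices, not part of the text above):

    import numpy as np

    def rejection_sample(f, sample_g, g, m, size, rng=None):
        """Draw `size` samples from density f by proposing from g and accepting
        y with probability f(y) / (m * g(y)); requires f(x) <= m * g(x) for all x."""
        rng = np.random.default_rng() if rng is None else rng
        out = []
        while len(out) < size:
            y = sample_g(rng)                # step 1: propose y ~ g, draw u ~ Unif(0,1)
            u = rng.uniform()
            if u < f(y) / (m * g(y)):        # step 2: accept, or return to step 1
                out.append(y)
        return np.array(out)

    # Example: Laplace(0, 1) target with a standard Cauchy proposal; the ratio
    # f/g is maximized at x = 0, giving the bound m = pi / 2 (about 1.57).
    f = lambda x: 0.5 * np.exp(-np.abs(x))
    g = lambda x: 1.0 / (np.pi * (1.0 + x**2))
    draws = rejection_sample(f, lambda rng: rng.standard_cauchy(), g,
                             m=np.pi / 2, size=10_000)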
Rejection sampling can be far more efficient compared with naive methods in some situations.
Moreover, even when the rejection sampling method is applied, it is often still hard to optimize the bound $M$. More often than not, $M$ is large and the rejection rate is high, so the algorithm can be very inefficient. The natural exponential family (when it exists), also known as exponential tilting, provides a class of proposal distributions that can lower the computational complexity, reduce the value of $M$, and speed up the computations (see examples: working with Natural Exponential Families).
Given a target random variable $X$, assume for simplicity that its density function can be explicitly written as $f(x)$. The proposal can then be chosen as an exponentially tilted version of the target, with density $f_\theta(x) = e^{\theta x - \psi(\theta)} f(x)$, where $\psi(\theta) = \log \operatorname{E}[e^{\theta X}]$ is the cumulant-generating function; the tilting parameter $\theta$ is tuned to make the bound $M$ small.
For example, consider sampling from the tail of a standard normal distribution, i.e., from $Z \mid Z > b$ for large $b$. As a measure of efficiency, the expected number of iterations of the natural exponential family based rejection sampling method is of order $b$, that is, $M(b) = O(b)$, while under the naive method the expected number of iterations is $1/\operatorname{P}(Z > b)$, which grows roughly like $e^{b^2/2}$ and is far more inefficient.
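As a sketch of this idea, a shifted exponential proposal (one standard choice for the normal tail; the details below are illustrative, not taken from the text above) yields a simple accept/reject step:

    import numpy as np

    def normal_tail_sample(b, size, rng=None):
        """Sample Z | Z > b for standard normal Z via rejection sampling with a
        shifted exponential proposal g(x) = b * exp(-b * (x - b)) for x > b.
        The ratio f(x) / (M * g(x)) simplifies to exp(-(x - b)**2 / 2)."""
        rng = np.random.default_rng() if rng is None else rng
        out = np.empty(size)
        n = 0
        while n < size:
            x = b + rng.exponential(scale=1.0 / b)       # draw from the proposal
            if rng.uniform() <= np.exp(-((x - b) ** 2) / 2):
                out[n] = x                               # accept
                n += 1
        return out

    samples = normal_tail_sample(b=4.0, size=10_000)
    # Naively drawing standard normals and keeping those above b = 4 would accept
    # only about 3 in 100,000 draws; here the acceptance rate is close to 1.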
In general, exponential tilting of a parametric class of proposal distributions solves the optimization problem conveniently, because its useful properties directly characterize the distribution of the proposal. For this type of problem, among the class of simple distributions, the trick is to use the natural exponential family, which helps to gain some control over the complexity and considerably speeds up the computation. Indeed, there are deep mathematical reasons for using the natural exponential family.
In addition, as the dimensions of the problem get larger, the ratio of the embedded volume to the "corners" of the embedding volume tends towards zero, so a lot of rejections can take place before a useful sample is generated, making the algorithm inefficient and impractical.
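A small sketch of this effect (an inscribed unit ball inside a cube, a standard illustration that is not spelled out above):

    import numpy as np

    # Fraction of uniform draws from the cube [-1, 1]^d that land in the unit
    # ball: the acceptance rate of the simplest rejection scheme in dimension d.
    rng = np.random.default_rng(0)
    for d in (2, 5, 10, 20):
        pts = rng.uniform(-1.0, 1.0, size=(100_000, d))
        inside = (pts**2).sum(axis=1) <= 1.0
        print(d, inside.mean())    # ~0.785 for d=2, ~0.0025 for d=10, ~0 for d=20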
An extension of rejection sampling that can be used to overcome this difficulty and efficiently sample from a wide variety of distributions (provided that they have log-concave density functions, which is in fact the case for most of the common distributions—even those whose density functions are not concave themselves) is known as adaptive rejection sampling (ARS).
There are three basic ideas to this technique as ultimately introduced by Gilks in 1992:[7]

1. If it helps, define the envelope in log space (e.g. log-probability or log-density) instead, i.e., work with $\log f(x)$ rather than $f(x)$ directly; algebraically messy densities often have log densities that are simpler, or at least closer to piecewise linear.
2. Instead of a single uniform envelope density, use a piecewise linear function as the envelope of the log density. Each time a sample is rejected, the value of $f(x)$ that was evaluated can be used to refine the piecewise approximation, reducing the chance that the next attempt is rejected.
3. Exponentiating a piecewise linear log envelope gives a piecewise exponential envelope, and sampling from an exponential density segment is easy: just take the log of a uniform random variable (with appropriate interval and corresponding truncation).

The method essentially involves successively determining an envelope of straight-line segments that approximates the logarithm better and better while still remaining above the curve, starting with a fixed number of segments (possibly just a single tangent line).
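A minimal sketch of that last step (the segment endpoints and slope below are illustrative), using inverse-transform sampling on one linear piece of the log envelope:

    import numpy as np

    def sample_exponential_segment(x0, x1, slope, rng):
        """Inverse-transform sample from a density proportional to exp(slope * x)
        restricted to [x0, x1] -- one straight-line piece of a log-space envelope."""
        u = rng.uniform()
        if abs(slope) < 1e-12:                 # flat piece: plain uniform draw
            return x0 + u * (x1 - x0)
        # Invert the truncated-exponential CDF on [x0, x1].
        return x0 + np.log1p(u * np.expm1(slope * (x1 - x0))) / slope

    rng = np.random.default_rng(1)
    draws = [sample_exponential_segment(0.0, 2.0, slope=-1.5, rng=rng) for _ in range(5)]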
For this reason, several extensions of ARS have been proposed in the literature for tackling non-log-concave target distributions.[9][10][11] Furthermore, different combinations of ARS and the Metropolis–Hastings method have been designed in order to obtain a universal sampler that builds self-tuning proposal densities (i.e., a proposal automatically constructed and adapted to the target).
This class of methods is often called Adaptive Rejection Metropolis Sampling (ARMS) algorithms.