Importance sampling

Examples include Bayesian networks and importance weighted variational autoencoders.[4]

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.

The idea behind importance sampling is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others.

If these "important" values are emphasized by sampling more frequently, then the estimator variance can be reduced.

Sampling from such a biased distribution would, if applied directly, yield a biased estimator. However, the simulation outputs are weighted to correct for the use of the biased distribution, and this ensures that the new importance sampling estimator is unbiased.
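The weighting idea can be illustrated with a minimal sketch. It assumes a toy problem that is not part of the text above: estimating the tail probability P(X > t) of a standard normal variable, with a shifted exponential density chosen here purely for illustration as the biased distribution. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def plain_monte_carlo(t, n, rng):
    """Estimate P(X > t) by counting how often a standard normal draw exceeds t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def importance_sampling(t, n, rng):
    """Estimate P(X > t) by sampling from a biased density and reweighting.

    Assumed biased density (illustrative choice, requires t > 0):
    g(y) = t * exp(-t * (y - t)) for y >= t,
    so every draw lands in the "important" region y > t.
    """
    rate = t
    total = 0.0
    for _ in range(n):
        y = t + rng.expovariate(rate)         # draw from the biased density
        g = rate * math.exp(-rate * (y - t))  # biased density at y
        total += normal_pdf(y) / g            # weight corrects for the bias
    return total / n


rng = random.Random(0)
t = 3.0
n = 10_000
exact = 0.5 * math.erfc(t / math.sqrt(2.0))
estimate_mc = plain_monte_carlo(t, n, rng)
estimate_is = importance_sampling(t, n, rng)
print(f"exact               {exact:.6e}")
print(f"plain Monte Carlo   {estimate_mc:.6e}")
print(f"importance sampling {estimate_is:.6e}")
```

With the same number of draws, the plain estimate rests on only a few dozen hits, whereas every weighted draw contributes to the importance sampling estimate.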

Choosing or designing a good biased distribution is the "art" of importance sampling.

The rewards for a good distribution can be huge run-time savings; the penalty for a bad distribution can be longer run times than for a general Monte Carlo simulation without importance sampling.

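Importance sampling is concerned with the determination and use of an alternate density function, usually referred to as a biasing density, for the simulation experiment.

For a given number of samples, use of a well-chosen biasing density results in a variance smaller than that of the conventional Monte Carlo estimate.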

Scaling the random variable by a number greater than unity has the effect of increasing the variance (and also the mean) of the density function.

This results in a heavier tail of the density, leading to an increase in the event probability.

Scaling is probably one of the earliest biasing methods known and has been extensively used in practice.

It is simple to implement and usually provides conservative simulation gains as compared to other methods.
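Biasing by scaling can be sketched as follows, assuming a standard normal variable X and the rare event X > t (an illustrative setting, not one taken from the text). Draws are multiplied by a scale factor a > 1, and each draw that falls in the event region is weighted by the ratio of the original density to the scaled density. The names `scaled_weight` and `tail_prob_by_scaling` are hypothetical.

```python
import math
import random


def scaled_weight(y, a):
    """Ratio of the N(0, 1) density to the N(0, a^2) density at y."""
    return a * math.exp(-0.5 * y * y * (1.0 - 1.0 / (a * a)))


def tail_prob_by_scaling(t, a, n, rng):
    """Estimate P(X > t) for X ~ N(0, 1) by sampling the scaled variable a * X."""
    total = 0.0
    for _ in range(n):
        y = a * rng.gauss(0.0, 1.0)  # scaling puts more mass in the tails
        if y > t:
            total += scaled_weight(y, a)
    return total / n


rng = random.Random(1)
t = 3.0
exact = 0.5 * math.erfc(t / math.sqrt(2.0))
estimate = tail_prob_by_scaling(t, a=3.0, n=50_000, rng=rng)
print(f"exact {exact:.6e}, scaled estimate {estimate:.6e}")
```

Because scaling spreads mass into both tails, part of the simulation budget is spent away from the event region, which is one way to read the modest gains mentioned above.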

A modern version of importance sampling by scaling is so-called sigma-scaled sampling (SSS), which runs multiple Monte Carlo (MC) analyses with different scaling factors.

In contrast to many other high-yield estimation methods, such as worst-case distances (WCD), SSS does not suffer much from the dimensionality problem.

On the other hand, like WCD, SSS is designed only for Gaussian statistical variables and, in contrast to WCD, it is not designed to provide accurate statistical corners.

Another disadvantage of SSS is that MC runs with large scale factors may become difficult, e.g. due to model and simulator convergence problems.

In addition, in SSS we face a strong bias-variance trade-off: Using large scale factors, we obtain quite stable yield results, but the larger the scale factors, the larger the bias error.

If the advantages of SSS do not matter much in the application of interest, other methods are often more efficient.

Another simple and effective biasing technique employs translation of the density function (and hence random variable) to place much of its probability mass in the rare event region.

Translation does not suffer from a dimensionality effect and has been successfully used in several applications relating to simulation of digital communication systems.

The amount of shift is chosen to minimize the variance of the importance sampling estimator.
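Choosing the shift can be sketched numerically. The example assumes a standard normal variable X, the rare event X > t, and a small hypothetical grid of candidate shifts c. For each candidate it draws from the translated density, weights each draw by the ratio of the original to the translated density, and reports the empirical per-sample variance. A shift of zero corresponds to plain Monte Carlo.

```python
import math
import random


def shifted_tail_estimate(t, c, n, rng):
    """Return (estimate, per-sample variance) of P(X > t) using shift c."""
    values = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0) + c                 # translated random variable
        weight = math.exp(-c * y + 0.5 * c * c)     # N(0,1) density / N(c,1) density
        values.append(weight if y > t else 0.0)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


rng = random.Random(2)
t = 3.0
exact = 0.5 * math.erfc(t / math.sqrt(2.0))
results = {c: shifted_tail_estimate(t, c, 20_000, rng) for c in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)}
for c, (mean, variance) in results.items():
    print(f"c = {c:.1f}  estimate = {mean:.4e}  per-sample variance = {variance:.3e}")
best_c = min(results, key=lambda c: results[c][1])
print(f"shift with the smallest empirical variance: {best_c}")
```

In practice the shift would be tuned more finely than on such a coarse grid, and the empirical variance is itself a noisy quantity.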

The fundamental problem with importance sampling is that designing good biased distributions becomes more complicated as the system complexity increases.

This dimensionality, or memory, can cause problems in several ways. In principle, the importance sampling ideas remain the same in these situations, but the design becomes much harder.

A successful approach to combat this problem is essentially breaking down a simulation into several smaller, more sharply defined subproblems.

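The run-time savings, commonly measured as the ratio of the variance of the plain Monte Carlo estimator to that of the importance sampling estimator, have to be computed empirically, since the estimator variances are unlikely to be analytically tractable when their mean is intractable.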
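An empirical comparison of the two estimator variances might look like the following sketch. It assumes the tail probability P(X > t) of a standard normal variable as the quantity of interest and a normal biasing density centred at t. Both are illustrative choices, and the function names are hypothetical. Each estimator is repeated over independent replicates and the ratio of the sample variances is reported.

```python
import math
import random


def plain_estimate(t, n, rng):
    """Plain Monte Carlo estimate of P(X > t) for X ~ N(0, 1)."""
    return sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t) / n


def shifted_estimate(t, n, rng):
    """Importance sampling estimate using an assumed N(t, 1) biasing density."""
    total = 0.0
    for _ in range(n):
        y = rng.gauss(t, 1.0)
        if y > t:
            total += math.exp(-t * y + 0.5 * t * t)  # N(0,1) density / N(t,1) density
    return total / n


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(3)
t, n, replicates = 3.0, 2_000, 100
var_mc = sample_variance([plain_estimate(t, n, rng) for _ in range(replicates)])
var_is = sample_variance([shifted_estimate(t, n, rng) for _ in range(replicates)])
speed_up = var_mc / var_is
print(f"Monte Carlo variance         {var_mc:.3e}")
print(f"importance sampling variance {var_is:.3e}")
print(f"empirical variance ratio     {speed_up:.1f}")
```

The ratio is only an estimate and fluctuates with the number of replicates. It also ignores any difference in cost per sample between the two methods.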

Other useful concepts in quantifying an importance sampling estimator are the variance bounds and the notion of asymptotic efficiency.

One related measure is the so-called Effective Sample Size (ESS).
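One widely used approximation of the effective sample size, given here as an assumption since the text does not state a formula, is the squared sum of the importance weights divided by the sum of their squares. A minimal sketch, with the hypothetical function name `effective_sample_size`:

```python
def effective_sample_size(weights):
    """Approximate ESS as (sum of weights)^2 / (sum of squared weights).

    The value equals the number of samples when all weights are equal and
    approaches 1 when a single weight dominates.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


equal = effective_sample_size([1.0, 1.0, 1.0, 1.0])     # all samples count equally
dominated = effective_sample_size([1.0, 0.0, 0.0, 0.0]) # one sample carries everything
skewed = effective_sample_size([8.0, 1.0, 1.0])         # in between the two extremes
print(equal, dominated, skewed)
```

The formula is invariant to rescaling of the weights, so it can be applied to normalized or unnormalized weights alike.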

Perhaps a more serious overhead to importance sampling is the time taken to devise and program the technique and analytically derive the desired weight function.

When a population of proposal densities is used, as in multiple or adaptive importance sampling, several suitable combinations of sampling and weighting schemes can be employed.
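Two such weighting schemes can be compared in a small sketch. The setting is assumed for illustration only: the target is a standard normal density, the quantity of interest is E[X^2] = 1, and two unit-variance normal proposals are centred at -1 and 1. The first scheme weights each draw by the proposal that generated it. The second weights each draw by the equal-weight mixture of all proposals, a scheme often called deterministic mixture weighting or the balance heuristic. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def multiple_importance_sampling(h, proposal_means, n_per_proposal, rng):
    """Estimate E[h(X)] for X ~ N(0, 1) with two weighting schemes.

    Returns (standard, mixture): the first weights each draw by its own
    proposal density, the second by the equal-weight mixture of all proposals.
    """
    standard_sum = 0.0
    mixture_sum = 0.0
    count = 0
    for mean in proposal_means:
        for _ in range(n_per_proposal):
            x = rng.gauss(mean, 1.0)
            target = normal_pdf(x, 0.0)
            own = normal_pdf(x, mean)
            mixture = sum(normal_pdf(x, m) for m in proposal_means) / len(proposal_means)
            standard_sum += h(x) * target / own
            mixture_sum += h(x) * target / mixture
            count += 1
    return standard_sum / count, mixture_sum / count


rng = random.Random(4)
standard, mixture = multiple_importance_sampling(
    lambda x: x * x, proposal_means=(-1.0, 1.0), n_per_proposal=20_000, rng=rng
)
print(f"standard weights: {standard:.4f}")
print(f"mixture weights:  {mixture:.4f}")
```

Both estimators are unbiased in this setting. The mixture weights are bounded here, which tends to give the more stable estimate at the cost of evaluating every proposal density for every draw.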