Empirical Bayes methods are procedures for statistical inference in which the prior probability distribution is estimated from the data.
This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed.
Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out.[1] Empirical Bayes, also known as maximum marginal likelihood,[2] represents a convenient approach for setting hyperparameters, but has been mostly supplanted by fully Bayesian hierarchical analyses since the 2000s with the increasing availability of well-performing computation techniques.[citation needed] It is still commonly used, however, for variational methods in deep learning, such as variational autoencoders, where latent variable spaces are high-dimensional.
In, for example, a two-stage hierarchical Bayes model, observed data $y = \{y_1, y_2, \dots, y_n\}$ are assumed to be generated from an unobserved set of parameters $\theta = \{\theta_1, \theta_2, \dots, \theta_n\}$ according to a probability distribution $p(y \mid \theta)$. In turn, the parameters $\theta$ can be considered samples drawn from a population characterised by hyperparameters $\eta$, according to a probability distribution $p(\theta \mid \eta)$. In the hierarchical Bayes model, though not in the empirical Bayes approximation, the hyperparameters $\eta$ are themselves given a distribution $p(\eta)$.
Using Bayes' theorem,
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{p(y \mid \theta)}{p(y)} \int p(\theta \mid \eta)\, p(\eta)\, d\eta .$$
In general, this integral will not be tractable analytically or symbolically and must be evaluated by numerical methods.
Alternatively, the expression can be written as
$$p(\theta \mid y) = \int p(\theta \mid \eta, y)\, p(\eta \mid y)\, d\eta = \int \frac{p(y \mid \theta)\, p(\theta \mid \eta)}{p(y \mid \eta)}\, p(\eta \mid y)\, d\eta ,$$
and the final factor in the integral can in turn be expressed as
$$p(\eta \mid y) = \int p(\eta \mid \theta)\, p(\theta \mid y)\, d\theta .$$
These suggest an iterative scheme, qualitatively similar in structure to a Gibbs sampler, to evolve successively improved approximations to $p(\theta \mid y)$ and $p(\eta \mid y)$: first calculate an initial approximation to $p(\theta \mid y)$ ignoring the $\eta$ dependence completely, then calculate an approximation to $p(\eta \mid y)$ from it, then use this to update the approximation for $p(\theta \mid y)$, and so on.
When the true distribution $p(\eta \mid y)$ is sharply peaked, the integral determining $p(\theta \mid y)$ is not much changed by replacing the probability distribution over $\eta$ with a point estimate $\eta^{*}$ representing the distribution's peak (or, alternatively, its mean), giving
$$p(\theta \mid y) \approx \frac{p(y \mid \theta)\, p(\theta \mid \eta^{*})}{p(y \mid \eta^{*})} .$$
With this approximation, the above iterative scheme becomes the EM algorithm.
The term "Empirical Bayes" can cover a wide variety of methods, but most can be regarded as an early truncation of either the above scheme or something quite like it.
Robbins[3] considered a case of sampling from a mixed distribution, where the probability for each $y_i$ (conditional on $\theta_i$) is specified by a Poisson distribution,
$$p(y_i \mid \theta_i) = \frac{\theta_i^{y_i} e^{-\theta_i}}{y_i!},$$
while the prior on $\theta$ is unspecified except that the $\theta_i$ are i.i.d. draws from an unknown distribution with cumulative distribution function $G(\theta)$.
Compound sampling arises in a variety of statistical estimation problems, such as accident rates and clinical trials.[citation needed] We simply seek a point prediction of $\theta_i$ given all of the observed data. Because the prior is unspecified, we seek to do this without knowledge of $G$.[4] Under squared error loss (SEL), the conditional expectation $E(\theta_i \mid Y_i = y_i)$ is a reasonable quantity to use for prediction.
For the Poisson compound sampling model, this quantity is
$$E(\theta_i \mid y_i) = \frac{\int \left(\theta^{y_i+1} e^{-\theta} / y_i!\right) dG(\theta)}{\int \left(\theta^{y_i} e^{-\theta} / y_i!\right) dG(\theta)} .$$
This can be simplified by multiplying and dividing the numerator by $(y_i + 1)$, which turns both integrals into values of the marginal probability mass function
$$p_G(y) = \int \frac{\theta^{y} e^{-\theta}}{y!}\, dG(\theta),$$
obtained by integrating out $\theta$ over $G$, yielding
$$E(\theta_i \mid y_i) = \frac{(y_i + 1)\, p_G(y_i + 1)}{p_G(y_i)} .$$
To take advantage of this, Robbins[3] suggested estimating the marginals with their empirical frequencies ($\#\{Y_j = y\}$), yielding the fully non-parametric estimate
$$E(\theta_i \mid y_i) \approx (y_i + 1)\, \frac{\#\{Y_j = y_i + 1\}}{\#\{Y_j = y_i\}},$$
where $\#$ denotes the number of observations taking the given value.
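The rule can be implemented in a few lines. The sketch below (illustrative function name and toy counts, not taken from Robbins' paper) simply replaces $p_G$ with the observed frequencies:

```python
import numpy as np

def robbins_estimate(y):
    """Non-parametric empirical Bayes (Robbins) point predictions E(theta_i | y_i)
    for Poisson counts y, using empirical frequencies in place of the marginal p_G."""
    y = np.asarray(y)
    counts = np.bincount(y, minlength=y.max() + 2)   # counts[k] = #{Y_j = k}
    # (y_i + 1) * #{Y_j = y_i + 1} / #{Y_j = y_i}
    return (y + 1) * counts[y + 1] / counts[y]

y = np.array([0, 0, 0, 1, 1, 2, 0, 1, 3, 0])
print(robbins_estimate(y))
```

Note that the raw rule predicts zero for the largest observed count, since no heavier count has yet been seen; in practice the empirical frequencies are often smoothed before forming the ratio.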
Suppose each customer of an insurance company has an "accident rate" Θ and is insured against accidents; the probability distribution of Θ is the underlying distribution, and is unknown.
The number of accidents suffered by each customer in a specified time period has a Poisson distribution with expected value equal to the particular customer's accident rate.
The actual number of accidents experienced by a customer is the observable quantity.
A crude way to estimate the underlying probability distribution of the accident rate Θ is to estimate the proportion of members of the whole population suffering 0, 1, 2, 3, ... accidents during the specified time period as the corresponding proportion in the observed random sample.
Having done so, it is then desired to predict the accident rate of each customer in the sample.
As above, the conditional expectation of the accident rate $\Theta$ given the observed number of accidents can then be used as each customer's prediction; for example, a customer with six accidents would be assigned the estimated rate $7 \times$ [the proportion of the sample with seven accidents] $/$ [the proportion with six accidents]. When the proportion of customers with $k$ accidents is a decreasing function of $k$, the predicted rate will often be lower than the observed number of accidents. This shrinkage effect is typical of empirical Bayes analyses.
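This behaviour can be checked numerically. The sketch below simulates a portfolio with gamma-distributed accident rates (an arbitrary choice made purely for illustration; the method itself assumes no parametric form for $\Theta$) and applies the frequency-ratio rule to the resulting Poisson counts:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
rates = rng.gamma(shape=2.0, scale=0.5, size=n)   # unknown underlying accident rates
accidents = rng.poisson(rates)                    # observed accident counts

# Empirical marginal frequencies of 0, 1, 2, ... accidents
freq = np.bincount(accidents, minlength=accidents.max() + 2) / n

# Robbins' prediction for a customer observed to have k accidents
for k in range(5):
    pred = (k + 1) * freq[k + 1] / freq[k]
    print(f"observed {k} accidents -> predicted rate {pred:.2f}")
```

Customers with several accidents receive predicted rates well below their observed counts, which is the shrinkage effect described above.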
When both the likelihood and the prior are Gaussian, the required marginal distribution is itself Gaussian and can be obtained by direct calculation with the probability density function of multivariate Gaussians.[5]
If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple conjugate priors), then the empirical Bayes problem is only to estimate the marginal $m(y \mid \eta)$ and the hyperparameters $\eta$ using the complete set of empirical measurements. For example, one common approach, called parametric empirical Bayes point estimation, is to approximate the marginal using the maximum likelihood estimate (MLE), or a moments expansion, which allows one to express the hyperparameters $\eta$ in terms of the empirical mean and variance. This simplified marginal allows one to plug the empirical averages into a point estimate for the prior $\theta$.
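For instance, in the same normal-normal setting as the earlier sketch (observation variance $\sigma^2$ assumed known), the marginal of each $y_i$ is $N(\mu, \tau^2 + \sigma^2)$, so the hyperparameters can be read off from the empirical mean and variance in closed form; the function name below is illustrative, not a standard library routine.

```python
import numpy as np

def eb_gaussian_moments(y, sigma2=1.0):
    """Parametric empirical Bayes point estimates for theta_i ~ N(mu, tau2),
    y_i | theta_i ~ N(theta_i, sigma2), with (mu, tau2) set by matching the
    empirical mean and variance of the marginal y_i ~ N(mu, tau2 + sigma2)."""
    mu_hat = y.mean()
    tau2_hat = max(y.var() - sigma2, 0.0)    # method-of-moments estimate of tau2
    shrink = tau2_hat / (tau2_hat + sigma2)  # weight placed on the data
    return mu_hat + shrink * (y - mu_hat)    # posterior means under the plug-in prior

rng = np.random.default_rng(2)
theta = rng.normal(0.0, 2.0, size=1000)
y = rng.normal(theta, 1.0)
theta_hat = eb_gaussian_moments(y, sigma2=1.0)
print(np.mean((theta_hat - theta) ** 2), np.mean((y - theta) ** 2))  # EB beats raw y
```

For this Gaussian model the moment estimates coincide (up to truncation at zero) with the maximum marginal likelihood estimates, so the closed form replaces the iteration used earlier.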
As another example, in the Poisson-gamma model, let the likelihood be the Poisson distribution above, and let the prior be its conjugate, a gamma distribution $G(\alpha, \beta)$ with density $\rho(\theta \mid \alpha, \beta) \propto \theta^{\alpha - 1} e^{-\theta/\beta}$ for $\theta > 0$, so that $\eta = (\alpha, \beta)$. Write
$$\rho(\theta \mid y) \propto \rho(y \mid \theta)\, \rho(\theta \mid \alpha, \beta) \propto \theta^{y + \alpha - 1} e^{-\theta\,(1 + 1/\beta)},$$
where the marginal distribution has been omitted since it does not depend explicitly on $\theta$. Comparing with the gamma density shows that the posterior is again a gamma distribution, $G(\alpha', \beta')$ with $\alpha' = y + \alpha$ and $\beta' = (1 + 1/\beta)^{-1}$. To apply empirical Bayes, we will approximate the marginal using the maximum likelihood estimate (MLE).
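A rough numerical sketch of this model is given below; the function name and simulated data are illustrative, and the hyperparameters are set here by matching the mean and variance of the marginal (a moments expansion) rather than by a full MLE of the marginal likelihood.

```python
import numpy as np

def eb_poisson_gamma(y):
    """Parametric empirical Bayes for the Poisson-gamma model.
    Hyperparameters (alpha, beta) are chosen to match the mean and variance of the
    marginal (negative binomial) distribution: E[y] = alpha*beta, Var[y] = alpha*beta*(1+beta)."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var()
    beta = max(var - mean, 1e-8) / mean   # scale parameter of the gamma prior
    alpha = mean / beta                   # shape parameter of the gamma prior
    # Posterior mean E(theta | y_i) = (y_i + alpha) * beta / (1 + beta)
    return (y + alpha) * beta / (1.0 + beta)

rng = np.random.default_rng(3)
theta = rng.gamma(shape=3.0, scale=0.7, size=2000)
y = rng.poisson(theta)
print(eb_poisson_gamma(y)[:5], y[:5])
```

The resulting estimate $(y_i + \alpha)\,\beta/(1+\beta)$ is a weighted average of the observation $y_i$ and the prior mean $\alpha\beta$, so it again shrinks the raw counts toward the overall mean.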