Bayesian quadrature

Bayesian quadrature[1][2][3][4][5] is a method for approximating intractable integration problems.

It falls within the class of probabilistic numerical methods.

Bayesian quadrature views numerical integration as a Bayesian inference task, where function evaluations are used to estimate the integral of that function.

A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.

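Bayesian quadrature consists of specifying a prior distribution over the integrand, conditioning this prior on the observed function evaluations to obtain a posterior distribution over the integrand, and then computing the implied posterior distribution on the integral.

The posterior mean of the integral sometimes takes the form of a quadrature rule whose weights are determined by the choice of prior.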

The most common choice of prior for the integrand is a Gaussian process, as this permits conjugate inference to obtain a closed-form posterior distribution on the integral.

In particular, note that the posterior mean is a quadrature rule with weights determined by the kernel and the evaluation points, and the posterior variance provides a quantification of the user's uncertainty over the value of the integral.
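
The closed-form posterior can be written out explicitly. The notation below is an assumption, since the symbols are not fixed in the text above: f is the integrand, ν the integration measure, ν[f] the integral of f against ν, m and k the prior mean and kernel of the Gaussian process, and X = (x_1, ..., x_n) the evaluation points. Under these conventions, the standard Gaussian process conditioning formulas give:

```latex
\begin{align}
  \mathbb{E}\big[\nu[f]\big] &= \nu[m] + \nu[k(\cdot, X)]\, k(X, X)^{-1} \big(f(X) - m(X)\big), \\
  \mathbb{V}\big[\nu[f]\big] &= \nu\nu[k] - \nu[k(\cdot, X)]\, k(X, X)^{-1}\, \nu[k(X, \cdot)], \\
  w &= k(X, X)^{-1}\, \nu[k(X, \cdot)] \qquad \text{(quadrature weights when } m = 0\text{)},
\end{align}
```

where ν[k(·, x)] denotes the integral of k(y, x) over y against ν, and νν[k] denotes the integral of k against ν in both arguments. With a zero prior mean, the posterior mean reduces to the weighted sum of the function evaluations f(x_i) with weights w_i.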

In more challenging integration problems, where the prior distribution cannot be relied upon as a meaningful representation of epistemic uncertainty, it is necessary to use the data, that is, the observed function evaluations, to set the kernel hyperparameters using, for example, maximum likelihood estimation.

The estimation of kernel hyperparameters introduces adaptivity into Bayesian quadrature.
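
As a rough illustration of setting a kernel hyperparameter by maximum likelihood, the sketch below picks the length scale of a Matérn-3/2 kernel by maximising the Gaussian process log marginal likelihood over a grid. The kernel choice, the unit prior variance, the zero prior mean, the grid, the toy data and all function names are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def matern32(x, y, rho):
    """Matérn-3/2 kernel matrix between 1-D point arrays x and y (unit variance)."""
    a = np.sqrt(3.0) / rho
    r = np.abs(x[:, None] - y[None, :])
    return (1.0 + a * r) * np.exp(-a * r)

def log_marginal_likelihood(x, y, rho, jitter=1e-10):
    """Log density of the data y under a zero-mean GP prior with length scale rho."""
    n = len(x)
    K = matern32(x, x, rho) + jitter * np.eye(n)   # jitter for numerical stability
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L, y)                  # alpha = L^{-1} y
    return (-0.5 * alpha @ alpha                   # -0.5 * y^T K^{-1} y
            - np.sum(np.log(np.diag(L)))           # -0.5 * log det K
            - 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical data: evaluations of a toy integrand at 15 equispaced points.
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * x)

# Grid search over candidate length scales (the grid itself is an assumption).
grid = np.logspace(-2, 0.5, 40)
scores = np.array([log_marginal_likelihood(x, y, rho) for rho in grid])
best_rho = grid[np.argmax(scores)]
print("selected length scale:", best_rho)
```

A grid search is used here only for transparency; a gradient-based optimiser over several hyperparameters would follow the same pattern.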

As an example, consider estimating an integral using a Bayesian quadrature rule based on a zero-mean Gaussian process prior with a Matérn covariance function.

Convergence of the Bayesian quadrature point estimate as the integrand is evaluated at more and more points is displayed in the accompanying animation.
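
A minimal sketch of such a rule follows. Several details are assumptions chosen for illustration: a Matérn kernel with smoothness 3/2 and length scale 0.2, the uniform integration measure on [0, 1], 30 equispaced evaluation points, and a hypothetical oscillatory integrand. For this kernel and measure, the kernel integrals can be computed in closed form by elementary calculus, which is what the helper functions implement.

```python
import numpy as np

def matern32(x, y, rho):
    """Matérn-3/2 kernel matrix between 1-D point arrays x and y."""
    a = np.sqrt(3.0) / rho
    r = np.abs(x[:, None] - y[None, :])
    return (1.0 + a * r) * np.exp(-a * r)

def kernel_mean_uniform(x, rho):
    """Integral of the kernel k(t, x) over t in [0, 1], for each x in [0, 1]."""
    a = np.sqrt(3.0) / rho
    def G(t):  # integral of (1 + a*u) * exp(-a*u) for u from 0 to t
        return (2.0 - np.exp(-a * t) * (2.0 + a * t)) / a
    return G(x) + G(1.0 - x)

def initial_variance_uniform(rho):
    """Double integral of the kernel over [0, 1] x [0, 1]."""
    a = np.sqrt(3.0) / rho
    return (2.0 / a) * (2.0 - (3.0 - np.exp(-a) * (3.0 + a)) / a)

def bayesian_quadrature(fx, x, rho, jitter=1e-10):
    """Posterior mean and variance of the integral under a zero-mean GP prior."""
    K = matern32(x, x, rho) + jitter * np.eye(len(x))
    z = kernel_mean_uniform(x, rho)
    weights = np.linalg.solve(K, z)            # quadrature weights
    mean = weights @ fx                        # posterior mean of the integral
    var = initial_variance_uniform(rho) - z @ weights
    return mean, var, weights

# Hypothetical integrand and its exact integral over [0, 1], for comparison.
def f(x):
    return (1.0 + x**2) * np.sin(5.0 * np.pi * x) + 1.6

c = 5.0 * np.pi
true_value = 1.6 + 3.0 / c - 4.0 / c**3

x = np.linspace(0.0, 1.0, 30)
mean, var, weights = bayesian_quadrature(f(x), x, rho=0.2)
print("estimate:", mean, "posterior variance:", var, "exact:", true_value)
```

Because the kernel hyperparameters are held fixed, the posterior variance depends only on the evaluation points and not on the observed function values.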

Since Bayesian quadrature is an example of probabilistic numerics, it inherits certain advantages compared with traditional numerical integration methods. Despite these merits, Bayesian quadrature methods also have limitations.

The most commonly used prior for the integrand is a Gaussian process.

This is mainly due to the advantage provided by Gaussian conjugacy and the fact that Gaussian processes can encode a wide range of prior knowledge including smoothness, periodicity and sparsity through a careful choice of prior covariance.

Other priors have also been proposed. These include multi-output Gaussian processes,[9] which are particularly useful when tackling multiple related numerical integration tasks simultaneously or sequentially, and tree-based priors such as Bayesian additive regression trees,[10] which are well suited for discontinuous integrands.

Additionally, Dirichlet process priors have been proposed for the integration measure.

For selecting the evaluation points, one approach consists of using point sets from other quadrature rules.

For example, taking independent and identically distributed realisations from the integration measure recovers a Bayesian approach to Monte Carlo,[3] whereas using certain deterministic point sets such as low-discrepancy sequences or lattices recovers a Bayesian alternative to quasi-Monte Carlo.[4][12]

It is also possible to use point sets specifically designed for Bayesian quadrature; see, for example, [13], which exploits symmetries in point sets to obtain scalable Bayesian quadrature estimators.
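
The Bayesian approach to Monte Carlo mentioned above can be sketched as follows. The setting is an assumption chosen because the kernel integrals are then available in closed form: a Gaussian kernel with length scale `ell`, a zero-mean Gaussian integration measure with standard deviation `sigma`, and a hypothetical integrand cos(x), whose exact integral against this measure is exp(-sigma^2 / 2). The function names are illustrative only.

```python
import numpy as np

def gauss_kernel(x, y, ell):
    """Gaussian (squared-exponential) kernel matrix between 1-D arrays x and y."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell**2)

def kernel_mean_gaussian(x, ell, sigma):
    """Integral of k(t, x) against the N(0, sigma^2) measure in t."""
    s2 = ell**2 + sigma**2
    return ell / np.sqrt(s2) * np.exp(-0.5 * x**2 / s2)

def initial_variance_gaussian(ell, sigma):
    """Integral of k against N(0, sigma^2) in both arguments."""
    return ell / np.sqrt(ell**2 + 2.0 * sigma**2)

rng = np.random.default_rng(0)
sigma, ell, n = 1.0, 1.0, 30

# Evaluation points: i.i.d. draws from the integration measure.
x = rng.normal(0.0, sigma, size=n)
fx = np.cos(x)

# Jitter keeps the Gram matrix well conditioned for nearly coincident points.
K = gauss_kernel(x, x, ell) + 1e-8 * np.eye(n)
z = kernel_mean_gaussian(x, ell, sigma)
weights = np.linalg.solve(K, z)

bmc_mean = weights @ fx                                   # Bayesian Monte Carlo estimate
bmc_var = initial_variance_gaussian(ell, sigma) - z @ weights
mc_mean = fx.mean()                                       # plain Monte Carlo, same points
true_value = np.exp(-0.5 * sigma**2)
print("Bayesian MC:", bmc_mean, "plain MC:", mc_mean, "exact:", true_value)
```

Both estimators use the same function evaluations; the difference lies only in the weights, which are uniform for plain Monte Carlo and kernel-dependent for the Bayesian version.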

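Alternatively, points can be selected adaptively following principles from active learning and Bayesian experimental design so as to directly minimise posterior uncertainty,[14][15] including for multi-output Gaussian processes.[16]

One of the challenges when implementing Bayesian quadrature is the need to evaluate two quantities: the integral of the kernel against the integration measure in one argument, which is a function of the other argument, and the integral of the kernel against the integration measure in both arguments, which is a constant.

The former is commonly called the kernel mean, and is a quantity which is key to the computation of kernel-based distances such as the maximum mean discrepancy. The latter is commonly called the initial error.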
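
As an illustration of a case where both integrals are tractable, consider a Gaussian kernel paired with a Gaussian integration measure. The notation is an assumption introduced for this example: k is the kernel with length scale ℓ, ν is the zero-mean Gaussian measure with variance σ², ν[k(·, x)] is the integral of the kernel in one argument, and νν[k] is the integral in both arguments. Standard Gaussian integrals then give:

```latex
\begin{align}
  k(x, y) &= \exp\!\left(-\frac{(x - y)^2}{2\ell^2}\right), \qquad \nu = \mathcal{N}(0, \sigma^2), \\
  \nu[k(\cdot, x)] &= \int k(y, x)\, \nu(\mathrm{d}y)
    = \frac{\ell}{\sqrt{\ell^2 + \sigma^2}} \exp\!\left(-\frac{x^2}{2(\ell^2 + \sigma^2)}\right), \\
  \nu\nu[k] &= \iint k(x, y)\, \nu(\mathrm{d}x)\, \nu(\mathrm{d}y)
    = \frac{\ell}{\sqrt{\ell^2 + 2\sigma^2}}.
\end{align}
```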

Unfortunately, the kernel mean and initial error can only be computed in closed form for a small number of kernel and integration measure pairs.[4]

A number of theoretical guarantees have been derived for Bayesian quadrature.

These usually require Sobolev smoothness properties of the integrand,[4][17][18] although recent work also extends to integrands in the reproducing kernel Hilbert space of the Gaussian kernel.[19]

Most of the results apply to the case of Monte Carlo or deterministic grid point sets, but some results also extend to adaptive designs.