Probabilistic numerics

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation.[6]

Formally, this means casting the setup of the computational problem in terms of a prior distribution, formulating the relationship between numbers computed by the computer (e.g. matrix-vector multiplications in linear algebra, gradients in optimization, values of the integrand or the vector field defining a differential equation) and the quantity in question (the solution of the linear problem, the minimum, the integral, the solution curve) in a likelihood function, and returning a posterior distribution as the output.
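As a small illustration of this pattern, the following sketch performs a basic form of Bayesian quadrature: a Gaussian process prior over an integrand is conditioned on a few point evaluations, inducing a Gaussian posterior over the value of the integral. The integrand, kernel, and hyperparameters are arbitrary choices for illustration, and the kernel integrals are approximated on a dense grid rather than computed in closed form.

```python
import numpy as np

# Illustrative integrand (treated as unknown except at the query points).
f = lambda x: np.sin(3 * x) + 0.5 * x
a, b = 0.0, 1.0                                # integration domain

# Squared-exponential kernel defining the Gaussian process prior over the integrand.
def k(x, y, lengthscale=0.3, variance=1.0):
    return variance * np.exp(-0.5 * ((x[:, None] - y[None, :]) / lengthscale) ** 2)

# Evaluations of the integrand: the "data" observed by the method.
X = np.linspace(a, b, 5)
y = f(X)

# Dense grid used only to approximate the kernel integrals numerically.
grid = np.linspace(a, b, 2000)
w = (b - a) / grid.size                        # simple Riemann weights

K = k(X, X) + 1e-10 * np.eye(X.size)           # Gram matrix (jitter for stability)
z = (k(X, grid) * w).sum(axis=1)               # z_i ~ integral of k(x_i, t) dt
kk = (k(grid, grid) * w).sum() * w             # double integral of k(s, t) ds dt

# Gaussian posterior over the integral of f after conditioning on (X, y).
mean = z @ np.linalg.solve(K, y)
var = kk - z @ np.linalg.solve(K, z)

print(f"posterior integral estimate: {mean:.4f} +/- {np.sqrt(max(var, 0)):.4f}")
print(f"reference value (fine grid): {(f(grid) * w).sum():.4f}")
```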

In most cases, numerical algorithms also take internal adaptive decisions about which numbers to compute, which form an active learning problem.[11]

In all these cases, the classic method is based on a regularized least-squares estimate that can be associated with the posterior mean arising from a Gaussian prior and likelihood.

In such cases, the variance of the Gaussian posterior is then associated with a worst-case estimate for the squared error.
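This correspondence is easy to check in the linear case: the ridge (regularized least-squares) estimate coincides with the posterior mean of a Bayesian linear model with a zero-mean Gaussian prior on the weights and Gaussian observation noise. The data, noise level, and regularization strength below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                   # design matrix
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1                                    # observation noise standard deviation
y = A @ w_true + sigma * rng.normal(size=20)

lam = 0.5                                      # ridge regularization strength

# Regularized least-squares (ridge) estimate: argmin ||A w - y||^2 + lam ||w||^2.
w_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ y)

# Bayesian linear model: prior w ~ N(0, (sigma^2 / lam) I), likelihood y | w ~ N(A w, sigma^2 I).
# The posterior mean (A^T A + lam I)^{-1} A^T y is identical to the ridge estimate.
prior_cov = (sigma**2 / lam) * np.eye(3)
post_cov = np.linalg.inv(A.T @ A / sigma**2 + np.linalg.inv(prior_cov))
w_post = post_cov @ (A.T @ y / sigma**2)

print(np.allclose(w_ridge, w_post))            # True: same point estimate
print(np.diag(post_cov))                       # posterior variances quantify remaining uncertainty
```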

Probabilistic numerics has also been studied for mathematical optimization, which consists of finding the minimum or maximum of some objective function. A prominent instance is Bayesian optimization, in which the algorithm maintains a probabilistic belief about the objective function throughout the optimization procedure; this often takes the form of a Gaussian process prior conditioned on observations.

This belief then guides the algorithm in obtaining observations that are likely to advance the optimization process.

One prominent approach is to model optimization via Bayesian sequential experimental design, seeking to obtain a sequence of observations yielding the most optimization progress as evaluated by an appropriate utility function.

A welcome side effect from this approach is that uncertainty in the objective function, as measured by the underlying probabilistic belief, can guide an optimization policy in addressing the classic exploration vs. exploitation tradeoff.
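A minimal sketch of such a loop, assuming a one-dimensional objective, a zero-mean Gaussian process surrogate with a squared-exponential kernel, and an upper-confidence-bound acquisition function maximized over a fixed grid (all of these are illustrative choices, not a specific published method):

```python
import numpy as np

# Illustrative objective to be maximized; assumed expensive to evaluate.
def objective(x):
    return np.sin(5 * x) * (1 - np.tanh(x ** 2))

def kernel(a, b, ell=0.2, var=1.0):
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Gaussian process posterior mean and standard deviation at test points Xs."""
    K = kernel(X, X) + noise * np.eye(X.size)
    Ks = kernel(X, Xs)
    Kss = kernel(Xs, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

grid = np.linspace(-1, 1, 400)                 # candidate evaluation points
X = np.array([-0.8, 0.7])                      # initial design
y = objective(X)

for _ in range(10):
    mean, std = gp_posterior(X, y, grid)
    ucb = mean + 2.0 * std                     # upper confidence bound: explore where std is large,
    x_next = grid[np.argmax(ucb)]              # exploit where the posterior mean is already high
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(f"best point found: x = {X[np.argmax(y)]:.3f}, f = {y.max():.3f}")
```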

Probabilistic numerical methods have been developed in the context of stochastic optimization for deep learning, in particular to address key issues such as learning rate tuning and line searches,[21] batch-size selection,[22] early stopping,[23] pruning,[24] and first- and second-order search directions.

In deep learning, evaluating the loss and its gradient on the full dataset is typically too expensive; hence, mini-batching is generally used to construct estimators of these quantities on a random subset of the data, which makes both the loss and the gradient noisy.

Probabilistic numerical methods model this uncertainty explicitly and allow for automated decisions and parameter tuning.
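The following sketch illustrates the kind of quantity such methods reason about: for a least-squares model, it estimates the mean and variance of a mini-batch gradient from per-example gradients, yielding a per-coordinate signal-to-noise ratio that a probabilistic method could use, for instance, to judge when gradient information has become statistically uninformative. This is a toy illustration of the underlying idea, not an implementation of any of the methods cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 5
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = A @ w_true + 0.5 * rng.normal(size=n)      # noisy regression targets

w = np.zeros(d)                                # current parameter iterate
batch = rng.choice(n, size=128, replace=False)

# Per-example gradients of the squared loss 0.5 * (a_i @ w - y_i)^2.
residuals = A[batch] @ w - y[batch]
per_example_grads = residuals[:, None] * A[batch]               # shape (batch, d)

grad_mean = per_example_grads.mean(axis=0)
grad_var = per_example_grads.var(axis=0, ddof=1) / batch.size   # variance of the mean estimator

# Signal-to-noise ratio per coordinate: values near or below 1 indicate that the
# mini-batch gradient is dominated by sampling noise.
snr = np.abs(grad_mean) / np.sqrt(grad_var)
print("mini-batch gradient:", np.round(grad_mean, 3))
print("signal-to-noise    :", np.round(snr, 1))
```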

Probabilistic numerical methods have also been developed for numerical linear algebra, where they quantify the uncertainty remaining in the solution of a linear system after a finite amount of computation.[30][31] A large class of methods is iterative in nature and collects information about the linear system to be solved via repeated matrix-vector multiplications with the system matrix.

Such methods can be roughly split into a solution-based[8][28] and a matrix-based perspective,[7][9] depending on whether belief is expressed over the solution of the linear system or over the (inverse of the) system matrix.

Methods typically assume a Gaussian distribution, since the Gaussian family is closed under linear observations of the problem.[27]
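The Gaussian conditioning underlying a solution-based solver can be written in a few lines: place a Gaussian prior on the solution x of Ax = b and condition on projections of b along a handful of search directions, each of which is a noise-free linear observation of x. The prior, the random search directions, and the problem size below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 4                                    # problem size, number of observations

# Symmetric positive definite system A x = b (illustrative).
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)
x_true = rng.normal(size=n)
b = A @ x_true

# Gaussian prior over the solution: x ~ N(mu0, Sigma0).
mu0 = np.zeros(n)
Sigma0 = np.eye(n)

# Each search direction s yields the linear observation (A^T s)^T x = s^T b.
S = rng.normal(size=(n, k))                    # random search directions (illustrative choice)
V = A.T @ S                                    # observation operators, one per column
obs = S.T @ b                                  # observed values

# Standard Gaussian conditioning on the noise-free linear observations V^T x = obs.
G = V.T @ Sigma0 @ V
gain = Sigma0 @ V @ np.linalg.inv(G)
mu_post = mu0 + gain @ (obs - V.T @ mu0)
Sigma_post = Sigma0 - gain @ V.T @ Sigma0

print("error of posterior mean:", np.linalg.norm(mu_post - x_true))
print("posterior trace (remaining uncertainty):", np.trace(Sigma_post))
```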

Probabilistic numerical linear algebra routines have been successfully applied to scale Gaussian processes to large datasets.[31][32]

In particular, they enable exact propagation of the approximation error to a combined Gaussian process posterior, which quantifies the uncertainty arising from both the finite number of data observed and the finite amount of computation expended.

Many different probabilistic numerical methods designed for ordinary differential equations have been proposed. These can broadly be grouped into two categories: methods based on random perturbations of classical integrators, and methods based on Gaussian process regression. The boundary between these two categories is not sharp; indeed, a Gaussian process regression approach based on randomised data has also been developed.
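As a toy instance of the first, randomisation-based category, the following sketch perturbs each step of a forward Euler integrator with Gaussian noise scaled with the step size, so that repeated runs produce an ensemble of trajectories whose spread reflects sensitivity to discretisation error. The test problem (the Lorenz system), the noise scaling, and the step size are illustrative choices rather than those of any particular published method.

```python
import numpy as np

def lorenz(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def perturbed_euler(f, u0, t0, t1, h, scale, rng):
    """Forward Euler with an additive Gaussian perturbation of size O(h^1.5) per step."""
    ts = np.arange(t0, t1, h)
    us = [np.asarray(u0, dtype=float)]
    for t in ts[:-1]:
        u = us[-1]
        noise = rng.normal(size=u.shape) * scale * h ** 1.5
        us.append(u + h * f(t, u) + noise)
    return ts, np.array(us)

rng = np.random.default_rng(3)
# Ensemble of perturbed solutions; their spread indicates the effect of discretisation error.
ensemble = [perturbed_euler(lorenz, [1.0, 1.0, 1.0], 0.0, 5.0, 1e-3, scale=1.0, rng=rng)[1]
            for _ in range(10)]
final_x = np.array([traj[-1, 0] for traj in ensemble])
print("first component at t = 5 across samples:", np.round(final_x, 2))
```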

A number of probabilistic numerical methods have also been proposed for partial differential equations.[44]

The interplay between numerical analysis and probability is touched upon by a number of other areas of mathematics, including average-case analysis of numerical methods, information-based complexity, game theory, and statistical decision theory.

In modern terminology, Poincaré considered a Gaussian prior distribution on a function, expressed as a formal power series with random coefficients, and asked for "probable values" of the function at an unobserved point, given its values at a finite collection of points.[45]

A later seminal contribution to the interplay of numerical analysis and probability was provided by Albert Suldin in the context of univariate quadrature.

Suldin showed that, for an integrand modelled as a sample path of a Wiener process (Brownian motion), the quadrature rule with minimal mean squared error for given nodes is the trapezoidal rule; furthermore, this minimal error is proportional to the sum of cubes of the inter-node spacings.

As a result, one can see the trapezoidal rule with equally-spaced nodes as statistically optimal in some sense — an early example of the average-case analysis of a numerical method.

As noted by Houman Owhadi and collaborators,[3][48] interplays between numerical approximation and statistical inference can also be traced back to Palasti and Renyi,[49] Sard,[50] Kimeldorf and Wahba[51] (on the correspondence between Bayesian estimation and spline smoothing/interpolation) and Larkin[47] (on the correspondence between Gaussian process regression and numerical approximation).

Although the approach of modelling a perfectly known function as a sample from a random process may seem counterintuitive, a natural framework for understanding it can be found in information-based complexity (IBC),[52] the branch of computational complexity founded on the observation that numerical implementation requires computation with partial information and limited resources.

Moreover, as Packel[53] observed, the average case setting could be interpreted as a mixed strategy in an adversarial game obtained by lifting a (worst-case) minmax problem to a minmax problem over mixed (randomized) strategies.

Interpreting this optimal recovery problem as a zero-sum game where Player I selects the unknown function and Player II selects its approximation, and using relative errors in a quadratic norm to define losses, Gaussian priors emerge[3] as optimal mixed strategies for such games, and the covariance operator of the optimal Gaussian prior is determined by the quadratic norm used to define the relative error of the recovery.

Figure: Bayesian quadrature with a Gaussian process conditioned on evaluations of the integrand (shown in black). Shaded areas in the left column illustrate the marginal standard deviations. The right panel shows the prior and posterior Gaussian distributions over the value of the integral, as well as the true solution.
Figure: Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.[19]
Figure: Illustration of a matrix-based probabilistic linear solver.[9]
Figure: Samples from the first component of the numerical solution of the Lorenz system obtained with a probabilistic numerical integrator.[33]
Figure: Learning to solve a partial differential equation. A problem-specific Gaussian process prior is conditioned on partially known physics, given by uncertain boundary conditions (BC) and a linear PDE, as well as on noisy physical measurements from an experiment. The boundary conditions and the right-hand side of the PDE are not known exactly but are inferred from a small set of noise-corrupted measurements. The plots juxtapose the belief with the true solution of the latent boundary value problem.[44]