Positive-definite kernel

In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving integral operator equations.

Since then, positive-definite functions and their various analogues and generalizations have arisen in diverse parts of mathematics.

They occur naturally in Fourier analysis, probability theory, operator theory, complex function theory, moment problems, integral equations, boundary-value problems for partial differential equations, machine learning, the embedding problem, information theory, and other areas.

Let $\mathcal{X}$ be a nonempty set, sometimes referred to as the index set. A symmetric function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive-definite (p.d.) kernel on $\mathcal{X}$ if

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \ge 0 \qquad (1.1)$$

holds for all $x_1, \dots, x_n \in \mathcal{X}$, $n \in \mathbb{N}$, and $c_1, \dots, c_n \in \mathbb{R}$.

In probability theory, a distinction is sometimes made between positive-definite kernels, for which equality in (1.1) implies $c_i = 0$ for all $i$, and positive semi-definite (p.s.d.) kernels, which do not impose this condition. Note that this is equivalent to requiring that every finite matrix constructed by pairwise evaluation, $\mathbf{K}_{ij} = K(x_i, x_j)$, has either entirely positive (p.d.) or nonnegative (p.s.d.) eigenvalues.
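This matrix characterization is easy to check numerically. The following sketch (an illustration assuming Python with NumPy; the Gaussian kernel used here is a standard example rather than one defined above) builds the Gram matrix of pairwise evaluations and inspects its spectrum:

import numpy as np

# Verify positive semi-definiteness of the Gaussian (RBF) kernel
# K(x, y) = exp(-||x - y||^2 / (2 * sigma**2)) on a random point sample,
# by checking the eigenvalues of the Gram matrix.

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, a standard example of a p.d. kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))          # 10 points in R^3

# Gram matrix K_ij = K(x_i, x_j), built by pairwise evaluation as in (1.1)
gram = np.array([[rbf_kernel(xi, xj) for xj in points] for xi in points])

eigenvalues = np.linalg.eigvalsh(gram)     # symmetric matrix, so eigvalsh applies
print("smallest eigenvalue:", eigenvalues.min())

Up to floating-point round-off, the smallest eigenvalue is nonnegative, as the p.s.d. property requires.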

In mathematical literature, kernels are usually complex-valued functions.[1] In the rest of this article we assume real-valued functions, which is the common practice in applications of p.d. kernels.

Positive-definite kernels, as defined in (1.1), appeared first in 1909 in a paper on integral equations by James Mercer.[3] Several other authors made use of this concept in the following two decades, but none of them explicitly used kernels of the form $K(x, y) = f(x - y)$, i.e. p.d. functions.

Mercer's work arose from Hilbert's 1904 paper on Fredholm integral equations of the second kind. Hilbert defined a "definite" kernel as one for which the double integral

$$J(x) = \int_a^b \int_a^b K(s, t)\, x(s)\, x(t)\, ds\, dt$$

satisfies $J(x) > 0$ except for $x(s) = 0$.

The original object of Mercer's paper was to characterize the kernels which are definite in the sense of Hilbert, but Mercer soon found that the class of such functions was too restrictive to characterize in terms of determinants. He therefore defined a continuous real symmetric kernel $K(s, t)$ to be of positive type (i.e., positive-definite) if $J(x) \ge 0$ for all real continuous functions $x$ on $[a, b]$, and he proved that (1.1) is a necessary and sufficient condition for a kernel to be of positive type.

At about the same time W. H. Young,[5] motivated by a different question in the theory of integral equations, showed that for continuous kernels condition (1.1) is equivalent to $J(x) \ge 0$ for all $x \in L^1[a, b]$.

E. H. Moore initiated the study of a very general kind of p.d. kernel. Moore was interested in the generalization of integral equations and showed that to each such kernel $K$ there is a Hilbert space $H$ of functions such that, for each $f \in H$, $f(y) = (f, K(\cdot, y))_H$. This property is called the reproducing property of the kernel, and it turns out to have importance in the solution of boundary-value problems for elliptic partial differential equations.

Another line of development in which p.d. kernels played a large role was the theory of harmonics on homogeneous spaces as begun by E. Cartan in 1929, and continued by H. Weyl and S. Ito.

The most comprehensive theory of p.d. kernels in homogeneous spaces is that of M. Krein,[8] which includes as special cases the work on p.d. functions and irreducible unitary representations of locally compact groups.

In probability theory, p.d. kernels arise as covariance kernels of stochastic processes.[9]

Positive-definite kernels provide a framework that encompasses some basic Hilbert space constructions.

In the following we present a tight relationship between positive-definite kernels and two mathematical objects, namely reproducing kernel Hilbert spaces and feature maps.

Let $X$ be a set and $H$ a Hilbert space of functions $f : X \to \mathbb{R}$ with inner product $(\cdot, \cdot)_H$. For every $x \in X$, the evaluation functional $e_x : H \to \mathbb{R}$ is defined by $e_x(f) = f(x)$. The space $H$ is called a reproducing kernel Hilbert space (RKHS) if the evaluation functionals are continuous. Every positive-definite kernel defines a unique RKHS, of which it is the unique reproducing kernel (the Moore–Aronszajn theorem).
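In the standard formulation, each RKHS carries a unique reproducing kernel: for every $x \in X$ there is an element $K_x \in H$ with

$$K_x(y) = K(x, y), \qquad f(x) = (f, K_x)_H \quad \text{for all } f \in H,\ x \in X,$$

so evaluation at a point is realized as an inner product with $K_x$; this is the reproducing property mentioned above.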

As stated earlier, positive-definite kernels can be constructed from inner products. This fact can be used to connect p.d. kernels with another interesting object that arises in machine learning applications, namely the feature map. Let $F$ be a Hilbert space and $\Phi : X \to F$ a feature map; every feature map defines a unique p.d. kernel by $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_F$. Conversely, every p.d. kernel, and its corresponding RKHS, have many associated feature maps.
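As a small illustration (assuming Python with NumPy; the quadratic feature map below is a textbook example, not taken from this article), an explicit feature map $\Phi$ reproduces its kernel through an ordinary inner product:

import numpy as np

# Explicit feature map Phi for the homogeneous quadratic kernel
# K(x, y) = (x . y)^2 on R^2. The inner product of the features reproduces
# the kernel value, illustrating K(x, y) = <Phi(x), Phi(y)>.

def phi(x):
    """Feature map R^2 -> R^3 for the kernel (x . y)^2."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

lhs = phi(x) @ phi(y)        # inner product in feature space F
rhs = (x @ y) ** 2           # direct kernel evaluation
print(lhs, rhs)              # both equal 1.0 here: (1*3 + 2*(-1))^2 = 1

Here $\Phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ and $\langle \Phi(x), \Phi(y) \rangle = (x \cdot y)^2$, so the kernel can be evaluated without ever forming the features; this observation underlies the so-called kernel trick.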

Kernel methods are often compared to distance-based methods such as nearest neighbors. In this section we discuss parallels between their two respective ingredients, namely kernels $K$ and distances $d$.

Every p.d. kernel $K$ induces a distance-like map via $d_K(x, y) = \sqrt{K(x, x) - 2K(x, y) + K(y, y)}$; in general, however, a p.d. kernel induces only a pseudometric, where the first constraint on the distance function is loosened to allow $d(x, y) = 0$ for distinct points $x \ne y$.
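A brief numerical sketch of the induced pseudometric (Python with NumPy assumed; the Gaussian kernel is again just an example):

import numpy as np

# The pseudometric induced by a p.d. kernel,
# d_K(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)),
# i.e. the feature-space distance ||Phi(x) - Phi(y)||, for the Gaussian kernel.

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kernel_distance(x, y, k=rbf):
    return np.sqrt(k(x, x) - 2 * k(x, y) + k(y, y))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
print(kernel_distance(x, y))   # 0 exactly when Phi(x) = Phi(y)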

Positive-definite kernels, through their equivalence with reproducing kernel Hilbert spaces (RKHS), are particularly important in the field of statistical learning theory because of the celebrated representer theorem, which states that every minimizer function in an RKHS can be written as a linear combination of the kernel function evaluated at the training points.

This is a practically useful result as it effectively simplifies the empirical risk minimization problem from an infinite-dimensional to a finite-dimensional optimization problem.
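Kernel ridge regression is a standard concrete instance; the sketch below (assuming Python with NumPy, with illustrative data and hyperparameters) finds the minimizer in the form the representer theorem guarantees, $f^*(x) = \sum_{i=1}^n \alpha_i K(x, x_i)$, by solving an $n \times n$ linear system:

import numpy as np

# Kernel ridge regression: the regularized empirical-risk minimizer over the
# RKHS is f*(x) = sum_i alpha_i K(x, x_i), so only the n coefficients alpha
# need to be found, a finite-dimensional problem as the text states.

def rbf(a, b, sigma=0.5):
    # pairwise Gaussian kernel matrix between the rows of a and the rows of b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(30, 1))             # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)  # noisy targets

lam = 1e-2                                       # regularization strength
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + lam I) alpha = y

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = rbf(X_test, X) @ alpha                  # f*(x) = sum_i alpha_i K(x, x_i)
print(f_test)

The regularizer's scaling convention varies across texts, but the structure of the solution, a kernel expansion over the training points, is exactly what the representer theorem asserts.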

One of the greatest application areas of so-called meshfree methods is in the numerical solution of PDEs.

Some of the popular meshfree methods are closely related to positive-definite kernels, such as the meshless local Petrov–Galerkin (MLPG) method, the reproducing kernel particle method (RKPM), and smoothed-particle hydrodynamics (SPH).
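To indicate the flavor of these methods, here is a hedged sketch (Python with NumPy; the shape parameter, node count, and test problem are all assumptions, and unsymmetric kernel collocation of this kind is usually attributed to Kansa rather than to any method named above) of solving a two-point boundary-value problem with Gaussian kernels:

import numpy as np

# Kernel collocation sketch: solve u''(x) = -pi^2 sin(pi x) on [0, 1] with
# u(0) = u(1) = 0, whose exact solution is u(x) = sin(pi x), using Gaussian
# kernels centered at the collocation nodes.

eps = 3.0                                   # kernel shape parameter (assumed)
nodes = np.linspace(0.0, 1.0, 15)           # collocation points = centers

def phi(x, c):                              # Gaussian kernel phi(x - c)
    return np.exp(-eps**2 * (x - c) ** 2)

def phi_xx(x, c):                           # its second derivative in x
    return (4 * eps**4 * (x - c) ** 2 - 2 * eps**2) * phi(x, c)

# Build the collocation system: interior rows enforce the PDE, while the
# first and last rows enforce the boundary conditions.
A = phi_xx(nodes[:, None], nodes[None, :])
A[0, :] = phi(nodes[0], nodes)
A[-1, :] = phi(nodes[-1], nodes)

b = -np.pi**2 * np.sin(np.pi * nodes)
b[0] = b[-1] = 0.0

coef = np.linalg.solve(A, b)
u = phi(nodes[:, None], nodes[None, :]) @ coef
print(np.max(np.abs(u - np.sin(np.pi * nodes))))   # maximum nodal error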

Other types of applications that boil down to data fitting are rapid prototyping and computer graphics.

Here one often uses implicit surface models to approximate or interpolate point cloud data.
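A minimal sketch of that idea (Python with NumPy; the circle data, the Gaussian kernel, and the single interior off-surface constraint are all simplifying assumptions) fits an implicit function whose zero level set passes through the points:

import numpy as np

# Implicit-curve sketch: interpolate s with s = 0 on sample points of a curve
# and s = -1 at one interior point; the zero level set of s then approximates
# the curve.

def gauss(r, eps=2.0):
    return np.exp(-(eps * r) ** 2)   # Gaussian kernel, p.d. for any eps > 0

t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
on_curve = np.c_[np.cos(t), np.sin(t)]          # samples of the unit circle
centers = np.vstack([on_curve, [[0.0, 0.0]]])   # plus one interior point
values = np.r_[np.zeros(len(on_curve)), -1.0]   # s = 0 on curve, s = -1 inside

dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
coef = np.linalg.solve(gauss(dists), values)    # symmetric p.d. system

def s(p):
    """Implicit function; s(p) ~ 0 when p lies on the sampled curve."""
    return gauss(np.linalg.norm(centers - p, axis=-1)) @ coef

print(s(np.array([1.0, 0.0])), s(np.array([0.0, 0.0])))  # ~0.0 and ~-1.0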

Applications of p.d. kernels in various other branches of mathematics are in multivariate integration, multivariate optimization, and in numerical analysis and scientific computing, where one studies fast, accurate and adaptive algorithms ideally implemented in high-performance computing environments.