Representer theorem

In statistical learning theory, a representer theorem is any of several related results stating that a minimizer $f^{*}$ of a regularized empirical risk functional defined over a reproducing kernel Hilbert space can be represented as a finite linear combination of kernel functions evaluated at the input points of the training data.

The following Representer Theorem and its proof are due to Schölkopf, Herbrich, and Smola:[1]

Theorem: Consider a positive-definite real-valued kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ on a non-empty set $\mathcal{X}$ with a corresponding reproducing kernel Hilbert space $H_k$. Let there be given a training sample $(x_1, y_1), \ldots, (x_n, y_n) \in \mathcal{X} \times \mathbb{R}$, a strictly increasing real-valued function $g : [0, \infty) \to \mathbb{R}$, and an arbitrary error function $E : (\mathcal{X} \times \mathbb{R}^2)^n \to \mathbb{R} \cup \{\infty\}$, which together define the following regularized empirical risk functional on $H_k$:

$$f \mapsto E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) + g(\lVert f \rVert).$$

Then, any minimizer of the empirical risk,

$$f^{*} = \underset{f \in H_k}{\operatorname{argmin}} \left\{ E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) + g(\lVert f \rVert) \right\}, \qquad (*)$$

admits a representation of the form

$$f^{*}(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i),$$

where $\alpha_i \in \mathbb{R}$ for all $1 \le i \le n$.
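
As a concrete sketch (not part of the original statement), the representation means a candidate minimizer is completely determined by the coefficient vector $\alpha$ and the training inputs, so it can be stored and evaluated with finitely many numbers. The kernel choice and all names below are illustrative.

```python
import numpy as np

def gaussian_kernel(x, x_prime, sigma=1.0):
    """Positive-definite Gaussian (RBF) kernel k(x, x')."""
    return np.exp(-np.sum((x - x_prime) ** 2) / (2.0 * sigma ** 2))

def f_star(x, alpha, X_train, kernel=gaussian_kernel):
    """Evaluate f*(x) = sum_i alpha_i * k(x, x_i), the form guaranteed by the theorem."""
    return sum(a * kernel(x, x_i) for a, x_i in zip(alpha, X_train))

# Illustrative data: 4 training points in R^2 and some coefficients
# (in practice alpha would come from the risk minimization).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
alpha = np.array([0.5, -0.2, 0.3, 0.1])
print(f_star(np.array([0.5, 0.5]), alpha, X_train))
```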

Proof sketch: Define the feature map $\varphi : \mathcal{X} \to H_k$ by $\varphi(x) = k(\cdot, x)$, so that the reproducing property gives $f(x) = \langle f, \varphi(x) \rangle$ for every $f \in H_k$. Any $f \in H_k$ can be decomposed orthogonally as

$$f = \sum_{i=1}^{n} \alpha_i \varphi(x_i) + v, \qquad \langle v, \varphi(x_i) \rangle = 0 \text{ for all } i.$$

The above orthogonal decomposition and the reproducing property together show that applying $f$ to any training point $x_j$ produces

$$f(x_j) = \Big\langle \sum_{i=1}^{n} \alpha_i \varphi(x_i) + v, \, \varphi(x_j) \Big\rangle = \sum_{i=1}^{n} \alpha_i \langle \varphi(x_i), \varphi(x_j) \rangle,$$

which is independent of $v$; consequently the error term $E$ in $(*)$ is unaffected by $v$. For the regularization term, orthogonality and the strict monotonicity of $g$ give

$$g(\lVert f \rVert) = g\!\left(\sqrt{\Big\lVert \sum_{i=1}^{n} \alpha_i \varphi(x_i) \Big\rVert^{2} + \lVert v \rVert^{2}}\right) \ge g\!\left(\Big\lVert \sum_{i=1}^{n} \alpha_i \varphi(x_i) \Big\rVert\right).$$

Setting $v = 0$ therefore never increases the objective, so any minimizer $f^{*}$ of $(*)$ can be taken of the form $f^{*}(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i)$, as claimed.
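
A minimal numerical illustration of the key step (not part of the original proof): for the linear kernel $k(x, x') = x^{\top} x'$ on $\mathbb{R}^{d}$, the RKHS consists of linear functions $f(x) = w^{\top} x$ with $\lVert f \rVert = \lVert w \rVert$, and any component of $w$ orthogonal to the training inputs leaves the training-point values unchanged while only increasing the norm. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 5                      # fewer training points than dimensions
X = rng.normal(size=(n, d))      # training inputs x_1, ..., x_n (rows)

# A weight vector w represents f(x) = w @ x for the linear kernel k(x, x') = x @ x'.
w = rng.normal(size=d)

# Orthogonal decomposition: project w onto span{x_1, ..., x_n} and its complement.
P = X.T @ np.linalg.pinv(X @ X.T) @ X     # projector onto the span of the inputs
w_span, v = P @ w, (np.eye(d) - P) @ w    # w = w_span + v, with v orthogonal to every x_i

# Training-point values are unaffected by the orthogonal component v ...
print(np.allclose(X @ w, X @ w_span))                 # True
# ... while discarding v can only decrease the norm.
print(np.linalg.norm(w) >= np.linalg.norm(w_span))    # True
```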

The Theorem stated above is a particular example of a family of results that are collectively referred to as "representer theorems"; here we describe several such.

The first statement of a representer theorem was due to Kimeldorf and Wahba for the special case in which

$$E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) = \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - y_i)^2, \qquad g(\lVert f \rVert) = \lambda \lVert f \rVert^{2}$$

for $\lambda > 0$. Schölkopf, Herbrich, and Smola generalized this result by relaxing the restriction to the squared-loss cost and allowing the regularizer to be any strictly monotonically increasing function $g(\cdot)$ of the Hilbert space norm.

It is possible to generalize further by augmenting the regularized empirical risk functional through the addition of unpenalized offset terms.

The conditions under which a representer theorem exists were investigated by Argyriou, Micchelli, and Pontil, who proved the following:

Theorem: Let $\mathcal{X}$ be a nonempty set, $k$ a positive-definite real-valued kernel on $\mathcal{X} \times \mathcal{X}$ with corresponding reproducing kernel Hilbert space $H_k$, and let $R : H_k \to \mathbb{R}$ be a differentiable regularization function. Then, given a training sample $(x_1, y_1), \ldots, (x_n, y_n) \in \mathcal{X} \times \mathbb{R}$ and an arbitrary error function $E : (\mathcal{X} \times \mathbb{R}^2)^n \to \mathbb{R} \cup \{\infty\}$, a minimizer

$$f^{*} = \underset{f \in H_k}{\operatorname{argmin}} \left\{ E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) + R(f) \right\} \qquad (\dagger)$$

of the regularized empirical risk admits a representation of the form

$$f^{*}(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i), \qquad \alpha_i \in \mathbb{R},$$

if and only if there exists a nondecreasing function $h : [0, \infty) \to \mathbb{R}$ for which

$$R(f) = h(\lVert f \rVert).$$

Effectively, this result provides a necessary and sufficient condition on a differentiable regularizer $R(\cdot)$ under which the corresponding regularized empirical risk minimization $(\dagger)$ will have a representer theorem.

In particular, this shows that a broad class of regularized risk minimizations (much broader than those originally considered by Kimeldorf and Wahba) have representer theorems.
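
For instance (an illustration consistent with, but not stated in, the result above), the differentiable regularizer

$$R(f) = \lambda \lVert f \rVert^{4}, \qquad \lambda > 0,$$

is not of the Kimeldorf–Wahba form, yet it can be written as $R(f) = h(\lVert f \rVert)$ with the nondecreasing function $h(t) = \lambda t^{4}$, so the corresponding regularized risk minimization still admits a representer theorem.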

Representer theorems are useful from a practical standpoint because they dramatically simplify the regularized empirical risk minimization problem $(\dagger)$.

In most interesting applications, the search domain $H_k$ for the minimization is an infinite-dimensional subspace of $L^{2}(\mathcal{X})$, and therefore the search (as written) does not admit implementation on finite-memory and finite-precision computers. In contrast, the representation of $f^{*}(\cdot)$ afforded by a representer theorem reduces the original (infinite-dimensional) minimization problem to a search for the optimal $n$-dimensional vector of coefficients $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^{n}$; $\alpha$ can then be obtained by applying any standard function minimization algorithm.

Consequently, representer theorems provide the theoretical basis for the reduction of the general machine learning problem to algorithms that can actually be implemented on computers in practice.
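
As a sketch of this reduction (not from the original text; the squared loss, kernel choice, and names are illustrative), the objective can be written purely in terms of $\alpha$ via the kernel matrix $K_{ij} = k(x_i, x_j)$, since $f(x_j) = (K\alpha)_j$ and $\lVert f \rVert^{2} = \alpha^{\top} K \alpha$ for functions of the representer form, and then handed to an off-the-shelf optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def objective(alpha, K, y, lam):
    """Regularized empirical risk as a function of the coefficient vector alpha.

    Uses an illustrative squared loss E and regularizer g(||f||) = lam * ||f||^2,
    with f(x_j) = (K @ alpha)[j] and ||f||^2 = alpha @ K @ alpha.
    """
    residuals = K @ alpha - y
    return residuals @ residuals / len(y) + lam * alpha @ K @ alpha

# Illustrative data: kernel matrix for 1-D inputs under a Gaussian kernel.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=x.size)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))

result = minimize(objective, x0=np.zeros(x.size), args=(K, y, 1e-3))
alpha_opt = result.x   # optimal coefficients; f*(.) = sum_i alpha_opt[i] * k(., x_i)
```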

The following provides an example of how to solve for the minimizer whose existence is guaranteed by the representer theorem. This method works for any positive-definite kernel $k$, and allows us to transform a complicated (possibly infinite-dimensional) optimization problem into a simple linear system that can be solved numerically.

Assume that we use a least-squares error function

$$E\big((x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))\big) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$$

and the regularizer $g(\lVert f \rVert) = \lambda \lVert f \rVert^{2}$ for some $\lambda > 0$. By the representer theorem, the minimizer has the form

$$f^{*}(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i)$$

for some $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^{n}$. Writing $K$ for the $n \times n$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$ and $y = (y_1, \ldots, y_n)$, the reproducing property gives $f^{*}(x_j) = (K\alpha)_j$ and $\lVert f^{*} \rVert^{2} = \alpha^{\top} K \alpha$, so the regularized empirical risk becomes

$$F(\alpha) = \frac{1}{n} \lVert y - K\alpha \rVert^{2} + \lambda\, \alpha^{\top} K \alpha.$$

Because $K$ is positive definite, $F$ is convex and there is indeed a single global minimum for this expression. The global minimum $\alpha^{*}$ can be found by setting $\nabla_{\alpha} F(\alpha) = 0$, which yields

$$K\big(K\alpha + \lambda n \alpha - y\big) = 0.$$

Recalling that all positive-definite matrices are invertible, we see that

$$\alpha^{*} = (K + \lambda n I)^{-1} y,$$

so the minimizer may be found via a linear solve.
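
A minimal numerical sketch of this linear solve (not from the original text; the kernel choice and names are illustrative):

```python
import numpy as np

def gaussian_kernel_matrix(X, Z, sigma=0.2):
    """Kernel matrix K[i, j] = k(X[i], Z[j]) for a Gaussian kernel on 1-D inputs."""
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2.0 * sigma ** 2))

# Illustrative training data.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=x_train.size)

n, lam = x_train.size, 1e-3
K = gaussian_kernel_matrix(x_train, x_train)

# alpha* = (K + lambda * n * I)^{-1} y, computed via a linear solve.
alpha = np.linalg.solve(K + lam * n * np.eye(n), y_train)

# Predictions at new points: f*(x) = sum_i alpha_i * k(x, x_i).
x_test = np.linspace(0.0, 1.0, 5)
y_pred = gaussian_kernel_matrix(x_test, x_train) @ alpha
print(y_pred)
```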