In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem,[1][2] states that a stochastic process can be represented as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval.
The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.
In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process.
In the case of a centered stochastic process {Xt}t ∈ [a, b] (centered means E[Xt] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, X admits a decomposition

X_t = \sum_{k=1}^\infty Z_k e_k(t),

where Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2([a, b]).
The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.
Let Xt be a zero-mean square-integrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function KX(s, t).
Then KX(s, t) is a Mercer kernel. Letting TKX denote the associated integral operator on L2([a, b]),

(T_{K_X} f)(t) = \int_a^b K_X(s, t) f(s) \, ds,

and letting ek be an orthonormal basis of L2([a, b]) formed by the eigenfunctions of TKX with respective eigenvalues λk, Xt admits the following representation:

X_t = \sum_{k=1}^\infty Z_k e_k(t),

where the convergence is in L2, uniform in t, and

Z_k = \int_a^b X_t e_k(t) \, dt.

Furthermore, the random variables Zk have zero mean, are uncorrelated, and have variance λk:

\mathbf{E}[Z_k] = 0 \quad \text{and} \quad \mathbf{E}[Z_i Z_j] = \delta_{ij} \lambda_j.

Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C, and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian (centered) random variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. If the process Xt is Gaussian, then the random variables Zk are jointly Gaussian and stochastically independent.
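As a concrete illustration, the eigenpairs of TKX can be approximated numerically by discretizing the kernel on a grid. The following is a minimal Python sketch; the exponential covariance K_X(s, t) = e^{−|s−t|} on [0, 1] is an illustrative assumption, and any continuous covariance can be substituted.

```python
import numpy as np

# Discretize an assumed covariance kernel K_X(s, t) = exp(-|s - t|) on [0, 1].
n = 500
t = np.linspace(0.0, 1.0, n)
w = 1.0 / n                       # quadrature weight of the uniform grid
K = np.exp(-np.abs(t[:, None] - t[None, :]))

# The integral operator T_K is approximated by the matrix K * w, so its
# eigenpairs approximate (lambda_k, e_k); eigh handles the symmetric case.
evals, evecs = np.linalg.eigh(K * w)
order = np.argsort(evals)[::-1]   # list eigenvalues in decreasing order
lam = evals[order]
e = evecs[:, order] / np.sqrt(w)  # rescale so that the integral of e_k^2 is ~1

# A truncated KL sample path: X_t ~ sum_k sqrt(lambda_k) Z_k e_k(t).
M = 20
Z = np.random.standard_normal(M)
X = (e[:, :M] * np.sqrt(lam[:M])) @ Z
print(lam[:5])                    # leading eigenvalues of T_K
```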
We hence introduce βk, the Lagrange multipliers associated with these constraints, and aim at minimizing the error functional augmented with the constraint terms. Differentiating with respect to fi(t) (this is a functional derivative) and setting the derivative to 0 yields

\int_a^b K_X(s, t) f_i(s) \, ds = \beta_i f_i(t),

which is satisfied in particular when the fk are chosen to be the eigenfunctions of TKX, hence resulting in the KL expansion.
The approximation can be made more precise by choosing the M orthogonal vectors in a way that depends on the properties of the signal.
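A small numerical check of the KL basis's optimality (cf. the minimal total mean squared error noted above) can be sketched as follows. The AR(1)-type covariance ρ^{|i−j|} and the discrete Fourier comparison basis are illustrative assumptions; the expected error of keeping M coefficients in any orthonormal basis is the sum of the discarded coefficient variances.

```python
import numpy as np

# Expected M-term linear approximation error in an orthonormal basis equals
# the sum of discarded coefficient variances; the KL basis minimizes it.
n, rho, M = 256, 0.95, 16
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # assumed covariance

lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
err_kl = lam[M:].sum()                       # KL basis: discarded eigenvalues

F = np.fft.fft(np.eye(n)) / np.sqrt(n)       # orthonormal DFT basis
var_dft = np.real(np.diag(F.conj().T @ Sigma @ F))
err_dft = np.sort(var_dft)[::-1][M:].sum()   # keep the M largest-variance bins

print(err_kl, err_dft, err_kl <= err_dft)    # KL error is never larger
```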
To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis.
In contrast, since f has only two non-zero coefficients in the Dirac basis, a non-linear approximation of Y with M ≥ 2 gives zero error.
We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind

\int_a^b K_X(s, t) e_k(s) \, ds = \lambda_k e_k(t).

However, when applied to a discrete and finite process (X_n)_{n \in \{1,\dots,N\}}, the problem takes a much simpler form and standard linear algebra can be used to carry out the calculations.
Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.
Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated with the covariance function, which are given by the solutions to the integral equation written above.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.
Since Σ is a positive-definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of ℝ^N; we write {λ_i, φ_i}_{i ∈ {1,…,N}} for this set of eigenvalues and corresponding eigenvectors, listed in decreasing order of λ_i.
Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function.
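A minimal Python sketch of this empirical transform (PCA / Hotelling transform) on sampled data; the toy data matrix is an arbitrary assumption standing in for real observations.

```python
import numpy as np

# Empirical KL (Hotelling) transform: eigendecompose the sample covariance
# and project the centered data onto the eigenvector basis.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))  # toy data

Xc = X - X.mean(axis=0)                 # center: E[X] = 0 componentwise
Sigma = (Xc.T @ Xc) / (len(Xc) - 1)     # sample covariance matrix

lam, phi = np.linalg.eigh(Sigma)        # orthonormal eigenvectors of Sigma
order = np.argsort(lam)[::-1]           # list eigenvalues in decreasing order
lam, phi = lam[order], phi[:, order]

Y = Xc @ phi                            # KL / principal-component coordinates
print(np.round(np.cov(Y.T), 4))         # ~ diag(lam): uncorrelated components
```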
There are numerous equivalent characterizations of the Wiener process, which is a mathematical formalization of Brownian motion.[6]
Here we regard it as the centered standard Gaussian process Wt with covariance function

K_W(t, s) = \operatorname{cov}(W_t, W_s) = \min(s, t).

We restrict the time domain to [a, b] = [0, 1] without loss of generality.
Setting t = 0 in the initial integral equation gives e(0) = 0, which implies that B = 0 in the general solution

e(t) = A \sin\left(\frac{t}{\sqrt{\lambda}}\right) + B \cos\left(\frac{t}{\sqrt{\lambda}}\right),

and similarly, setting t = 1 in the first differentiation yields e′(1) = 0, whence

\cos\left(\frac{1}{\sqrt{\lambda}}\right) = 0,

which in turn implies that the eigenvalues of TKX are

\lambda_k = \left(\frac{1}{\left(k - \frac{1}{2}\right)\pi}\right)^2, \qquad k = 1, 2, \dots

The corresponding eigenfunctions are thus of the form

e_k(t) = A \sin\left(\left(k - \tfrac{1}{2}\right)\pi t\right).

A is then chosen so as to normalize ek:

\int_0^1 e_k^2(t) \, dt = 1 \quad \Longrightarrow \quad A = \sqrt{2}.

This gives the following representation of the Wiener process:

Theorem.
There is a sequence {Zi}i of independent Gaussian random variables with mean zero and variance 1 such that

W_t = \sqrt{2} \sum_{k=1}^\infty Z_k \frac{\sin\left(\left(k - \frac{1}{2}\right)\pi t\right)}{\left(k - \frac{1}{2}\right)\pi}.

Note that this representation is only valid for t ∈ [0, 1].
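This series is directly usable for simulation; a short sketch, truncating at an assumed order M:

```python
import numpy as np

# Simulate the Wiener process on [0, 1] from its truncated KL expansion:
# W_t ~ sqrt(2) * sum_k Z_k sin((k - 1/2) pi t) / ((k - 1/2) pi).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 1000)
M = 500                                  # truncation order (assumed)
omega = (np.arange(1, M + 1) - 0.5) * np.pi

Z = rng.standard_normal(M)
W = np.sqrt(2.0) * np.sin(np.outer(t, omega)) @ (Z / omega)

# Sanity check: Var(W_1) = 2 * sum_k omega_k^{-2} -> 1 as M grows.
print((2.0 / omega**2).sum())            # ~ 1
```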
When the channel noise is white, its correlation function is

R_N(\tau) = \tfrac{1}{2} N_0 \delta(\tau),

and it has constant power spectral density

S_N(f) = \frac{N_0}{2}.
The problem then becomes the hypothesis test

H : Y(t) = N(t), \qquad K : Y(t) = N(t) + s(t), \qquad 0 < t < T.

The log-likelihood ratio reduces to a correlation of the observation with the known signal. As Δt → 0, let

G = \int_0^T Y(t) s(t) \, dt.

Then G is the test statistic, and the Neyman–Pearson optimum detector is

G(y) > G_0 \Rightarrow K, \qquad G(y) < G_0 \Rightarrow H.

As G is Gaussian, we can characterize it by finding its mean and variances:

\mathbf{E}[G \mid H] = 0, \qquad \mathbf{E}[G \mid K] = \int_0^T s^2(t) \, dt \equiv E, \qquad \operatorname{Var}(G \mid H) = \operatorname{Var}(G \mid K) = \frac{N_0 E}{2}.
The false alarm error is

\alpha = \Pr(G > G_0 \mid H) = 1 - \Phi\left(\frac{G_0}{\sqrt{N_0 E / 2}}\right) \quad \Longrightarrow \quad G_0 = \sqrt{N_0 E / 2} \; \Phi^{-1}(1 - \alpha),

and the probability of detection is

\beta = \Pr(G > G_0 \mid K) = \Phi\left(\sqrt{\frac{2E}{N_0}} - \Phi^{-1}(1 - \alpha)\right),

where Φ is the cdf of the standard normal, or Gaussian, variable.
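A discrete-time Monte Carlo sketch of this detector, under the mean and variance characterization above; the sinusoidal s(t), N0, and α are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Matched-filter (Neyman-Pearson) detection of a known signal in white
# Gaussian noise: G = integral of Y(t) s(t) dt, compared to threshold G0.
rng = np.random.default_rng(2)
T, n, N0, alpha = 1.0, 1000, 0.1, 0.01
dt = T / n
t = np.arange(n) * dt
s = np.sin(2 * np.pi * 5 * t)            # assumed known signal s(t)
E = np.sum(s**2) * dt                    # signal energy

sigma = np.sqrt(N0 * E / 2)              # std of G under either hypothesis
G0 = sigma * norm.ppf(1 - alpha)         # threshold for false-alarm rate alpha

trials = 20000
noise = rng.standard_normal((trials, n)) * np.sqrt(N0 / (2 * dt))
G_H = noise @ s * dt                     # statistic under H (noise only)
G_K = (noise + s) @ s * dt               # statistic under K (signal present)

print((G_H > G0).mean())                 # ~ alpha
print((G_K > G0).mean(),                 # ~ beta, matching the formula:
      norm.cdf(np.sqrt(2 * E / N0) - norm.ppf(1 - alpha)))
```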
When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function R_N(t, s) = E[N(t)N(s)], we cannot sample independent discrete observations by evenly spacing the time; instead, we can use the K–L expansion to decorrelate the noise process and obtain independent Gaussian observation coordinates.
The equation can be solved by taking the Fourier transform, but this is not practically realizable, since the infinite spectrum requires spectral factorization.
Letting C = 1, this is just the result we arrived at in the previous section for detecting a signal in white noise.
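In discrete form, the K–L decorrelation idea described above amounts to projecting the noise onto the eigenvectors of its covariance and rescaling each coordinate by 1/√λk; a sketch, with an AR(1)-type noise covariance as an illustrative assumption:

```python
import numpy as np

# Whiten colored noise with its K-L basis: R = Phi diag(lam) Phi^T, so
# R^{-1/2} = Phi diag(1/sqrt(lam)) Phi^T maps N(0, R) to N(0, I).
rng = np.random.default_rng(3)
n, rho = 200, 0.9
idx = np.arange(n)
R = rho ** np.abs(idx[:, None] - idx[None, :])   # assumed noise covariance

lam, phi = np.linalg.eigh(R)
W = phi @ np.diag(1.0 / np.sqrt(lam)) @ phi.T    # whitening matrix R^{-1/2}

N = rng.multivariate_normal(np.zeros(n), R, size=5000)
Nw = N @ W.T                                     # whitened noise samples
print(np.abs(np.cov(Nw.T) - np.eye(n)).max())    # small: ~identity covariance
```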
For example, if N(t) is wide-sense stationary colored noise with correlation function

R_N(\tau) = \frac{B N_0}{4} e^{-B|\tau|}, \qquad S_N(f) = \frac{N_0 / 2}{1 + (\omega / B)^2},

the transfer function of the prewhitening filter is

H(f) = 1 + j \frac{\omega}{B}.

When the signal we want to detect from the noisy channel is also random, for example, a white Gaussian process X(t), we can still implement the K–L expansion to get an independent sequence of observations.
The problem is then simplified as follows. The Neyman–Pearson optimal test is again a likelihood-ratio test, so the log-likelihood ratio is expressed through the K–L coordinates of the observation. Since X̂(t), the conditional mean of the signal given the observation, is just the minimum mean-square estimate of X(t), the optimal detector correlates the received waveform with this estimate.
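A finite-dimensional sketch of the resulting estimator-correlator structure; the signal covariance Σ_X, the noise variance σ², and the AR(1) form are all illustrative assumptions. Here X̂ = Σ_X(Σ_X + σ²I)^{−1}Y is the minimum mean-square estimate, and the log-likelihood ratio is, up to an additive constant, YᵀX̂ / (2σ²).

```python
import numpy as np

# Estimator-correlator sketch: detect a Gaussian random signal in white
# Gaussian noise by correlating the observation with its MMSE estimate.
rng = np.random.default_rng(4)
n, sigma2 = 100, 0.5
idx = np.arange(n)
Sigma_X = 0.8 ** np.abs(idx[:, None] - idx[None, :])  # assumed signal covariance

B = Sigma_X + sigma2 * np.eye(n)
H_mmse = Sigma_X @ np.linalg.inv(B)      # X_hat = H_mmse @ y (MMSE estimator)

def llr(y):
    # log-likelihood ratio, up to an additive constant: y^T x_hat / (2 sigma^2)
    return y @ (H_mmse @ y) / (2.0 * sigma2)

y_noise = rng.standard_normal(n) * np.sqrt(sigma2)          # H: noise only
y_signal = (rng.multivariate_normal(np.zeros(n), Sigma_X)
            + rng.standard_normal(n) * np.sqrt(sigma2))     # K: signal + noise
print(llr(y_noise), llr(y_signal))       # statistic is larger under K on average
```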