Marchenko–Pastur distribution

In the mathematical theory of random matrices, the Marchenko–Pastur distribution, or Marchenko–Pastur law, describes the asymptotic behavior of singular values of large rectangular random matrices.

The theorem is named after Soviet Ukrainian mathematicians Volodymyr Marchenko and Leonid Pastur, who proved this result in 1967.

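If $X$ denotes an $m \times n$ random matrix whose entries are independent identically distributed random variables with mean 0 and variance $\sigma^2 < \infty$, let

$$Y_n = \frac{1}{n} X X^{\mathsf T}$$

and let $\lambda_1, \lambda_2, \dots, \lambda_m$ be the eigenvalues of $Y_n$ (viewed as random variables).

Finally, consider the random measure

$$\mu_m(A) = \frac{1}{m} \#\{\lambda_j \in A\}, \quad A \subset \mathbb{R},$$

which gives the fraction of eigenvalues lying in the subset $A$ of $\mathbb{R}$.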

Theorem.

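[citation needed] Assume that $m, n \to \infty$ so that the ratio $m/n \to \lambda \in (0, +\infty)$. Then $\mu_m \to \mu$ (in weak* topology in distribution), where

$$\mu(A) = \begin{cases} \left(1 - \frac{1}{\lambda}\right) \mathbf{1}_{0 \in A} + \nu(A), & \text{if } \lambda > 1, \\ \nu(A), & \text{if } 0 < \lambda \le 1, \end{cases}$$

and

$$d\nu(x) = \frac{1}{2\pi\sigma^2} \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{\lambda x} \, \mathbf{1}_{x \in [\lambda_-, \lambda_+]} \, dx$$

with

$$\lambda_\pm = \sigma^2 \left(1 \pm \sqrt{\lambda}\right)^2.$$

The Marchenko–Pastur law also arises as the free Poisson law in free probability theory, having rate $1/\lambda$ and jump size $\lambda\sigma^2$.

For each $k \ge 1$, its $k$-th moment is[1]

$$\sum_{r=0}^{k-1} \frac{\sigma^{2k}}{r+1} \binom{k}{r} \binom{k-1}{r} \lambda^r.$$

The Stieltjes transform is given by

$$s(z) = \frac{\sigma^2(1-\lambda) - z + \sqrt{\left(z - \sigma^2(\lambda+1)\right)^2 - 4\lambda\sigma^4}}{2\lambda z \sigma^2}$$

for complex numbers $z$ of positive imaginary part, where the complex square root is also taken to have positive imaginary part.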
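The limit theorem and the moment formula can be sanity-checked numerically. The following is a minimal sketch, assuming Gaussian entries (the theorem only needs i.i.d. entries with mean 0 and finite variance) and using illustrative helper names (`mp_edges`, `mp_moment`, `sample_eigenvalues`) that are not part of any library. It compares the empirical moments of the eigenvalues of $\frac{1}{n} X X^{\mathsf T}$ with the finite-sum moment formula and prints the observed eigenvalue range next to the predicted support edges.

```python
import numpy as np
from math import comb

def mp_edges(lam, sigma2=1.0):
    """Edges lambda_- and lambda_+ of the continuous part of the limit law."""
    return sigma2 * (1 - np.sqrt(lam)) ** 2, sigma2 * (1 + np.sqrt(lam)) ** 2

def mp_moment(k, lam, sigma2=1.0):
    """k-th moment of the limit law: finite sum over r = 0, ..., k-1."""
    return sum(
        sigma2 ** k / (r + 1) * comb(k, r) * comb(k - 1, r) * lam ** r
        for r in range(k)
    )

def sample_eigenvalues(m, n, sigma2=1.0, seed=0):
    """Eigenvalues of X X^T / n for an m x n matrix X (Gaussian entries assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=np.sqrt(sigma2), size=(m, n))
    return np.linalg.eigvalsh(X @ X.T / n)

m, n = 400, 1600            # aspect ratio lambda = m / n = 0.25
lam = m / n
eig = sample_eigenvalues(m, n)
lo, hi = mp_edges(lam)

print(f"predicted support: [{lo:.3f}, {hi:.3f}]")
print(f"observed range:    [{eig.min():.3f}, {eig.max():.3f}]")
for k in (1, 2, 3):
    # Empirical k-th moment of the spectrum versus the closed-form sum.
    print(k, np.mean(eig ** k), mp_moment(k, lam))
```

At finite size the extreme eigenvalues can fall slightly outside the predicted support, so agreement should be expected only up to small fluctuations.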

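[2] The Stieltjes transform $s(z)$ satisfies the quadratic equation

$$\lambda \sigma^2 z \, s(z)^2 + \left(z - \sigma^2(1-\lambda)\right) s(z) + 1 = 0.$$

The Stieltjes transform can be repackaged in the form of the R-transform, which is given by[3]

$$R(z) = \frac{\sigma^2}{1 - \sigma^2 \lambda z}.$$

The S-transform is given by[3]

$$S(z) = \frac{1}{\sigma^2 (1 + \lambda z)}.$$

For the case of $\sigma = 1$, the $\eta$-transform is given by

$$\eta(\gamma) = \mathbb{E}\left[\frac{1}{1 + \gamma X}\right] = 1 - \frac{\mathcal{F}(\gamma, \lambda)}{4 \gamma \lambda}, \qquad \mathcal{F}(x, z) = \left(\sqrt{x (1 + \sqrt{z})^2 + 1} - \sqrt{x (1 - \sqrt{z})^2 + 1}\right)^2,$$

where $X$ satisfies the Marchenko–Pastur law.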
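The transform identities can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than a reference implementation: it takes the Stieltjes transform as $s(z) = \int \frac{d\mu(x)}{x - z}$ with the closed form $s(z) = \frac{\sigma^2(1-\lambda) - z + \sqrt{(z - \sigma^2(\lambda+1))^2 - 4\lambda\sigma^4}}{2\lambda z \sigma^2}$, the quadratic $\lambda\sigma^2 z s^2 + (z - \sigma^2(1-\lambda)) s + 1 = 0$, and $R(w) = \sigma^2 / (1 - \sigma^2 \lambda w)$ linked to the Cauchy transform $G = -s$ through $R(G(z)) + 1/G(z) = z$ (the usual free-probability convention, assumed here). The quadrature helper is hypothetical and restricted to $\lambda < 1$, where the law has no atom at 0.

```python
import numpy as np

def mp_stieltjes(z, lam, sigma2=1.0):
    """Closed-form Stieltjes transform for Im z > 0."""
    z = complex(z)
    root = np.sqrt((z - sigma2 * (lam + 1)) ** 2 - 4 * lam * sigma2 ** 2)
    if root.imag < 0:
        root = -root  # choose the square root with positive imaginary part
    return (sigma2 * (1 - lam) - z + root) / (2 * lam * z * sigma2)

def quadratic_residual(z, lam, sigma2=1.0):
    """Left-hand side of the quadratic equation; should be ~0."""
    s = mp_stieltjes(z, lam, sigma2)
    return lam * sigma2 * z * s ** 2 + (z - sigma2 * (1 - lam)) * s + 1

def r_transform(w, lam, sigma2=1.0):
    """R-transform of the law."""
    return sigma2 / (1 - sigma2 * lam * w)

def mp_stieltjes_quadrature(z, lam, sigma2=1.0, num=4000):
    """Integrate 1/(x - z) against the density (lam < 1 only).

    Uses x = c + r*cos(theta), which removes the square-root endpoints,
    followed by a midpoint rule in theta.
    """
    theta = (np.arange(num) + 0.5) * np.pi / num
    c, r = sigma2 * (1 + lam), 2 * sigma2 * np.sqrt(lam)
    x = c + r * np.cos(theta)
    weights = (r * np.sin(theta)) ** 2 / (2 * np.pi * sigma2 * lam * x) * (np.pi / num)
    return np.sum(weights / (x - z))

z, lam = 0.7 + 0.3j, 0.5
s_closed = mp_stieltjes(z, lam)
s_numeric = mp_stieltjes_quadrature(z, lam)
g = -s_closed                        # Cauchy transform G(z) = -s(z)
r_check = r_transform(g, lam) + 1 / g  # should reproduce z

print(abs(s_closed - s_numeric), abs(quadratic_residual(z, lam)), abs(r_check - z))
```

All three printed quantities should be close to zero, which ties the closed form, the quadratic equation, and the R-transform together.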

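For exact analysis of high-dimensional regression in the proportional asymptotic regime, a convenient form is often

$$T(u) := \eta(1/u) = \mathbb{E}\left[\frac{u}{X + u}\right],$$

which simplifies to

$$T(u) = \frac{-1 + \lambda - u + \sqrt{(1 + u - \lambda)^2 + 4 u \lambda}}{2 \lambda}.$$

The following functions

$$B(u) := \mathbb{E}\left[\left(\frac{u}{X + u}\right)^2\right] = T(u) - u \, T'(u), \qquad V(u) := \mathbb{E}\left[\frac{X}{(X + u)^2}\right] = T'(u),$$

where $X$ satisfies the Marchenko–Pastur law, show up in the limiting bias and variance, respectively, of ridge regression and other regularized linear regression problems.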
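The closed forms used for the ridge-regression bias and variance can be compared against direct numerical integration. The sketch below assumes $\sigma = 1$ and $\lambda < 1$, takes $T(u) = \mathbb{E}[u/(X+u)]$, $B(u) = \mathbb{E}[(u/(X+u))^2] = T(u) - u\,T'(u)$ and $V(u) = \mathbb{E}[X/(X+u)^2] = T'(u)$, and differentiates the closed form of $T$ by hand. The helper `mp_expect` is a hypothetical quadrature routine written for this illustration, not a library function.

```python
import numpy as np

def T(u, lam):
    """Closed form of E[u / (X + u)] for X ~ Marchenko-Pastur with sigma = 1."""
    root = np.sqrt((1 + u - lam) ** 2 + 4 * u * lam)
    return (-1 + lam - u + root) / (2 * lam)

def T_prime(u, lam):
    """Derivative of T with respect to u, obtained by differentiating the closed form."""
    root = np.sqrt((1 + u - lam) ** 2 + 4 * u * lam)
    return (-1 + (1 + u + lam) / root) / (2 * lam)

def B(u, lam):
    """Bias-type functional E[(u / (X + u))^2] = T(u) - u T'(u)."""
    return T(u, lam) - u * T_prime(u, lam)

def V(u, lam):
    """Variance-type functional E[X / (X + u)^2] = T'(u)."""
    return T_prime(u, lam)

def mp_expect(f, lam, num=4000):
    """E[f(X)] by midpoint quadrature after x = c + r*cos(theta); lam < 1 only."""
    theta = (np.arange(num) + 0.5) * np.pi / num
    c, r = 1 + lam, 2 * np.sqrt(lam)
    x = c + r * np.cos(theta)
    weights = (r * np.sin(theta)) ** 2 / (2 * np.pi * lam * x) * (np.pi / num)
    return np.sum(weights * f(x))

u, lam = 0.3, 0.5
print(T(u, lam), mp_expect(lambda x: u / (x + u), lam))
print(B(u, lam), mp_expect(lambda x: (u / (x + u)) ** 2, lam))
print(V(u, lam), mp_expect(lambda x: x / (x + u) ** 2, lam))
```

Each printed pair should agree to many digits; here $u$ plays the role of the ridge penalty in the regression setting.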

For the special case of correlation matrices, we know that $\sigma^2 = 1$ and $\lambda = m/n$. This bounds the probability mass over the interval defined by

$$\lambda_\pm = \left(1 \pm \sqrt{\frac{m}{n}}\right)^2.$$

Since this distribution describes the spectrum of random matrices with mean 0, the eigenvalues of correlation matrices that fall inside the aforementioned interval could be considered spurious or noise.

For instance, a correlation matrix of 10 stock returns calculated over a period of 252 trading days gives

$$\lambda_+ = \left(1 + \sqrt{\frac{10}{252}}\right)^2 \approx 1.44.$$

Thus, out of the 10 eigenvalues of that correlation matrix, only those higher than about 1.44 would be considered significantly different from random.
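The stock-return example can be reproduced in a few lines. This is a minimal sketch: the function name `noise_band` is illustrative, and the return series is simulated as independent Gaussian noise purely as a stand-in for real data, so any eigenvalue it shows above the upper edge is a finite-sample fluctuation rather than a signal.

```python
import numpy as np

def noise_band(m, n):
    """Marchenko-Pastur edges for an m x m correlation matrix built from n observations."""
    lam = m / n
    return (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2

m, n = 10, 252                      # 10 assets, 252 trading days
lower, upper = noise_band(m, n)
print(f"noise band: [{lower:.3f}, {upper:.3f}]")

# Simulated, mutually independent returns (an assumption made for illustration only).
rng = np.random.default_rng(0)
returns = rng.standard_normal((n, m))
corr = np.corrcoef(returns, rowvar=False)   # columns are the assets
eigvals = np.linalg.eigvalsh(corr)

# Eigenvalues above the upper edge are the candidates for genuine structure.
candidates = eigvals[eigvals > upper]
print("eigenvalues:", np.round(eigvals, 3))
print("above upper edge:", np.round(candidates, 3))
```

With real return data one would replace the simulated `returns` array and inspect which eigenvalues clear the upper edge; with only 10 assets the asymptotic band is a rough guide rather than a sharp test.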

Plot of the Marchenko–Pastur distribution for various values of $\lambda$