In statistics, the Neyman–Pearson lemma describes the existence and uniqueness of the likelihood-ratio test as a uniformly most powerful test in certain contexts.
It was introduced by Jerzy Neyman and Egon Pearson in a paper in 1933.[1]
The Neyman–Pearson lemma is part of the Neyman–Pearson theory of statistical testing, which introduced concepts like errors of the second kind, power function, and inductive behavior.[2][3][4]
The previous Fisherian theory of significance testing postulated only one hypothesis.
By introducing a competing hypothesis, the Neyman–Pearsonian flavor of statistical testing allows investigating the two types of errors.
The trivial cases where one always rejects or always accepts the null hypothesis are of little interest, but they do show that one must not relinquish control over one type of error while calibrating the other.
Neyman and Pearson accordingly proceeded to restrict their attention to the class of all level-$\alpha$ tests while subsequently minimizing the type II error, traditionally denoted by $\beta$.
Their seminal paper of 1933, including the Neyman–Pearson lemma, comes at the end of this endeavor, showing the existence of tests with the most power that retain a prespecified level of type I error ($\alpha$).
The Karlin–Rubin theorem extends the Neyman–Pearson lemma to settings involving composite hypotheses with monotone likelihood ratios.
Neyman–Pearson lemma[5] (existence): If a hypothesis test satisfies the lemma's likelihood-ratio condition at level $\alpha$, then it is a uniformly most powerful (UMP) test in the set of level-$\alpha$ tests.
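The construction behind the existence statement can be illustrated with a small discrete sketch. It assumes a binomial model (the number of heads in `n` coin tosses) with two simple hypotheses `p0` and `p1`; the model, the function names, and the numbers are illustrative assumptions, not taken from the lemma's original statement. Outcomes are added to the rejection region in order of decreasing likelihood ratio until adding another would push the type I error above `alpha`. Because the sample space is discrete, the size actually attained may fall below `alpha`; attaining it exactly would in general require a randomized test, which this sketch omits.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def np_rejection_region(n, p0, p1, alpha):
    """Greedy likelihood-ratio region for H0: p = p0 against H1: p = p1.

    Hypothetical helper for illustration; assumes 0 < p0 < 1 so the
    ratio is well defined. Returns (region, size, power).
    """
    outcomes = range(n + 1)
    # Likelihood ratio of the alternative to the null for each outcome.
    ratio = {k: binom_pmf(k, n, p1) / binom_pmf(k, n, p0) for k in outcomes}
    region, size = [], 0.0
    # Visit outcomes from most to least favourable to H1.
    for k in sorted(outcomes, key=lambda k: ratio[k], reverse=True):
        pk = binom_pmf(k, n, p0)
        if size + pk > alpha:
            break  # the next outcome would exceed the allowed type I error
        region.append(k)
        size += pk
    power = sum(binom_pmf(k, n, p1) for k in region)
    return sorted(region), size, power


region, size, power = np_rejection_region(n=10, p0=0.5, p1=0.7, alpha=0.05)
print(region, size, power)
```

With these illustrative inputs the region consists of the largest head counts, which matches the intuition that many heads favour the alternative with the larger success probability.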
The likelihood ratio can also be used to suggest particular test statistics that might be of interest, or to suggest simplified tests; for this, one considers algebraic manipulation of the ratio to see whether it contains key statistics related to the size of the ratio (i.e., whether a large statistic corresponds to a small ratio or to a large one).
Since the integrand is nonnegative and integrates to zero, it must be exactly zero except on some ignorable set.
For this set of normally distributed data, we can compute the likelihood ratio to find the key statistic in this test and its effect on the test's outcome. This ratio depends on the data only through a single statistic.
Therefore, by the Neyman–Pearson lemma, the most powerful test of this type of hypothesis for this data will depend only on that statistic.
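As a hedged worked version of this step, suppose (an assumption, since the setup is not spelled out here) that $x_1, \dots, x_n$ is a sample from a normal distribution with known mean $\mu$, and that the hypotheses are $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 = \sigma_1^2$. Under that assumed setup, the likelihood and the likelihood ratio can be written as:

```latex
L(\sigma^2 \mid \mathbf{x}) \propto (\sigma^2)^{-n/2}
  \exp\left\{ -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2} \right\}

\Lambda(\mathbf{x})
  = \frac{L(\sigma_0^2 \mid \mathbf{x})}{L(\sigma_1^2 \mid \mathbf{x})}
  = \left( \frac{\sigma_0^2}{\sigma_1^2} \right)^{-n/2}
    \exp\left\{ -\frac{1}{2} \left( \sigma_0^{-2} - \sigma_1^{-2} \right)
    \sum_{i=1}^{n} (x_i - \mu)^2 \right\}
```

The data enter only through $\sum_{i=1}^{n} (x_i - \mu)^2$. If $\sigma_1^2 > \sigma_0^2$, the exponent's coefficient is negative, so $\Lambda(\mathbf{x})$ decreases as this sum grows, and the null hypothesis would be rejected when the sum is sufficiently large.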
The rejection threshold depends on the size of the test.
In this example, the test statistic can be shown to be a scaled chi-square distributed random variable and an exact critical value can be obtained.
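The critical value can be sketched numerically under an assumed setup: a sample of size `n` from a normal distribution with known mean `mu`, testing variance `sigma0_sq` against a larger alternative variance. In that case the sum of squared deviations from `mu`, divided by `sigma0_sq`, is chi-square distributed with `n` degrees of freedom under the null hypothesis. The function names are hypothetical, and to stay within the Python standard library this sketch handles only even `n`, for which the chi-square survival function has a closed form.

```python
from math import exp


def chi2_sf_even(x, df):
    """P(chi-square with df degrees of freedom > x), for positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # Closed form: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def critical_value(alpha, n, sigma0_sq):
    """Threshold c with P(sum of squares > c | H0) = alpha (assumed setup)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, n) > alpha:  # bracket the quantile
        hi *= 2.0
    for _ in range(200):  # bisection on the decreasing survival function
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return sigma0_sq * hi  # rescale from chi-square to the raw statistic


def reject_h0(data, mu, sigma0_sq, alpha):
    """Reject when the sum of squared deviations exceeds the threshold."""
    statistic = sum((x - mu) ** 2 for x in data)
    return statistic > critical_value(alpha, len(data), sigma0_sq)
```

For example, `critical_value(0.05, 10, 1.0)` gives the threshold for ten observations at size 0.05 with unit null variance, and multiplying the null variance by a constant scales the threshold by the same constant.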
A variant of the Neyman–Pearson lemma has found an application in the seemingly unrelated domain of the economics of land value.
In radar systems, the Neyman–Pearson lemma is used in first setting the rate of missed detections to a desired (low) level, and then minimizing the rate of false alarms, or vice versa.
The rates of false alarms and missed detections cannot both be made arbitrarily low at once, let alone zero.
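The trade-off can be sketched with a deliberately simple detection model, which is an assumption for illustration rather than a description of any real radar system: a single observation is standard normal when only noise is present and normal with mean `d` and unit variance when a signal is present. For `d > 0` the likelihood ratio increases with the observation, so the likelihood-ratio test reduces to comparing the observation with a threshold. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # noise-only distribution in this assumed model


def miss_rate_for_false_alarm(p_fa, d):
    """Fix the false-alarm rate, return (threshold, missed-detection rate)."""
    threshold = STD_NORMAL.inv_cdf(1.0 - p_fa)
    # A detection is missed when a signal-present observation falls below
    # the threshold; signal-present observations are N(d, 1).
    return threshold, STD_NORMAL.cdf(threshold - d)


def false_alarm_for_miss_rate(p_miss, d):
    """Fix the missed-detection rate, return (threshold, false-alarm rate)."""
    threshold = d + STD_NORMAL.inv_cdf(p_miss)
    return threshold, 1.0 - STD_NORMAL.cdf(threshold)


threshold, p_miss = miss_rate_for_false_alarm(p_fa=0.01, d=3.0)
print(threshold, p_miss)
```

In this model, lowering the false-alarm rate raises the threshold and therefore raises the missed-detection rate, and the reverse holds when the missed-detection rate is fixed first.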
The Neyman–Pearson lemma is applied to the construction of analysis-specific likelihood ratios, used, for example, to test for signatures of new physics against the nominal Standard Model prediction in proton–proton collision datasets collected at the LHC.
I can point to the particular moment when I understood how to formulate the undogmatic problem of the most powerful test of a simple statistical hypothesis against a fixed simple alternative.
At the present time [probably 1968], the problem appears entirely trivial and within easy reach of a beginning undergraduate.
But, with a degree of embarrassment, I must confess that it took something like half a decade of combined effort of E. S. P. [Egon Pearson] and myself to put things straight.
The solution of the particular question mentioned came on an evening when I was sitting alone in my room at the Statistical Laboratory of the School of Agriculture in Warsaw, thinking hard on something that should have been obvious long before.
This was my wife, with some friends, telling me that it was time to go to a movie.
And then, as I got up from my desk to answer the call, I suddenly understood: for any given critical region and for any given alternative hypothesis, it is possible to calculate the probability of the error of the second kind; it is represented by this particular integral.
Once this is done, the optimal critical region would be the one which minimizes this same integral, subject to the side condition concerned with the probability of the error of the first kind.
These thoughts came in a flash, before I reached the window to signal to my wife.