If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ2 frequency distribution.
The purpose of the test is to evaluate how likely the observed frequencies would be if the null hypothesis were true.
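As an illustration, a goodness-of-fit version of the test can be sketched in a few lines of Python. The die-roll counts here are hypothetical, and the 5% critical value for 5 degrees of freedom is a standard table value:

```python
# Pearson goodness-of-fit statistic for hypothetical die-roll counts.
# Null hypothesis: the die is fair, so each face is expected n/6 times.
observed = [18, 22, 16, 14, 19, 31]   # hypothetical counts over 120 rolls
n = sum(observed)
expected = [n / 6] * 6                # 20 per face under the null hypothesis

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value of the chi-squared distribution with k - 1 = 5
# degrees of freedom at the 5% significance level (standard table value).
CRITICAL_5PCT_5DF = 11.07

print(round(chi2, 2))            # the test statistic (9.1 for these counts)
print(chi2 > CRITICAL_5PCT_5DF)  # reject the null hypothesis at the 5% level?
```

With these counts the statistic is 9.1, below the critical value, so the null hypothesis of a fair die would not be rejected at the 5% level.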
In the 19th century, statistical methods were mainly applied to biological data, and it was customary for researchers such as Sir George Airy and Mansfield Merriman to assume that observations followed a normal distribution; their works were criticized by Karl Pearson in his 1900 paper.[2]
At the end of the 19th century, Pearson noticed the existence of significant skewness within some biological observations.
Pearson first dealt with the case in which the expected numbers mi are sufficiently large known numbers in all cells, assuming that every observation xi may be taken as normally distributed, and reached the result that, in the limit as n becomes large, X2 follows the χ2 distribution with k − 1 degrees of freedom.
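The statistic in question, restating the standard definition of Pearson's X2 in the notation above (mi the expected and xi the observed counts in cell i), is:

```latex
X^2 = \sum_{i=1}^{k} \frac{(x_i - m_i)^2}{m_i}
```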
This conclusion caused some controversy in practical applications, which was not settled for 20 years, until Fisher's 1922 and 1924 papers.
To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table. When the row or column margins (or both) are random variables (as in most common research designs), this correction tends to be overly conservative and underpowered.
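A minimal sketch of the corrected statistic for a 2 × 2 table, using hypothetical counts (expected values are computed from the row and column margins in the usual way):

```python
# Yates's continuity correction for a 2x2 contingency table:
# subtract 0.5 from |observed - expected| in each cell before squaring.
# The table below is hypothetical illustration data.
table = [[21, 9],
         [13, 17]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2_yates = 0.0
for i in range(2):
    for j in range(2):
        # Expected count under independence: (row total * column total) / n.
        expected = row_totals[i] * col_totals[j] / n
        chi2_yates += (abs(table[i][j] - expected) - 0.5) ** 2 / expected

print(round(chi2_yates, 3))
```

Without the 0.5 adjustment the same loop computes the uncorrected Pearson statistic, which for this table would be somewhat larger; the correction pulls the statistic (and hence the evidence against independence) down.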
For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error.
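Such a long-established variance can then serve as the null value in a chi-squared test of a sample variance, using the standard statistic (n − 1)s²/σ₀², which follows a chi-squared distribution with n − 1 degrees of freedom under the null hypothesis. A minimal sketch with hypothetical measurements and a hypothetical historical variance:

```python
# Chi-squared test for a variance: a process variance sigma0^2 known
# essentially without error from long stable operation is tested against
# a new sample via T = (n - 1) * s^2 / sigma0^2, which under the null
# hypothesis follows a chi-squared distribution with n - 1 degrees of
# freedom. Sample values and sigma0_sq below are hypothetical.
sample = [10.2, 9.8, 10.1, 10.4, 9.7, 10.0, 10.3, 9.9]
sigma0_sq = 0.04   # variance established from long process history

n = len(sample)
mean = sum(sample) / n
s_sq = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance

t_stat = (n - 1) * s_sq / sigma0_sq
print(round(t_stat, 2))   # compare against chi-squared quantiles, df = 7
```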
In cryptanalysis, the chi-squared test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext.
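One common form of this is scoring candidate decryptions by how closely their letter frequencies match those of the expected plaintext language: the candidate with the lowest chi-squared statistic against the language's frequency table is the likeliest decryption. A sketch using an approximate English frequency table and hypothetical texts:

```python
# Chi-squared scoring of candidate plaintexts, as in classical
# cryptanalysis: lower score = closer to English letter frequencies.
# The frequency table is a standard approximation; texts are hypothetical.
ENGLISH_FREQ = {
    'e': 0.127, 't': 0.091, 'a': 0.082, 'o': 0.075, 'i': 0.070,
    'n': 0.067, 's': 0.063, 'h': 0.061, 'r': 0.060, 'd': 0.043,
    'l': 0.040, 'u': 0.028, 'c': 0.028, 'm': 0.024, 'w': 0.024,
    'f': 0.022, 'g': 0.020, 'y': 0.020, 'p': 0.019, 'b': 0.015,
    'v': 0.010, 'k': 0.008, 'j': 0.002, 'x': 0.002, 'q': 0.001,
    'z': 0.001,
}

def chi2_english(text: str) -> float:
    """Chi-squared distance between the text's letter counts and English."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    score = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        observed = letters.count(letter)
        expected = n * freq
        score += (observed - expected) ** 2 / expected
    return score

# A plausible English sentence scores far lower than random-looking text.
english = "the quick brown fox jumps over the lazy dog"
garbled = "zzqj xkvq wzjqx kzz jqxwv zqkjz xqz wqzj kqz"
print(chi2_english(english) < chi2_english(garbled))  # True
```

In a real attack the same scoring function would be evaluated over all candidate keys (e.g., the 26 shifts of a Caesar cipher), keeping the key whose decryption scores lowest.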
[14] In bioinformatics, the chi-squared test is used to compare the distribution of certain properties of genes (e.g., genomic content, mutation rate, interaction network clustering, etc.).