Tukey's range test

Tukey's range test can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.

However, the studentized range distribution used to determine the level of significance of the differences considered in Tukey's test has broader application. It is useful for researchers who have searched their collected data for remarkable differences between groups, but then cannot validly determine how significant the discovered stand-out difference is using the standard distributions of conventional statistical tests, which require the data to have been selected at random.

Stand-out data was by definition not selected at random, but specifically chosen because it was extreme. It therefore needs a different, stricter interpretation, provided by the likely frequency and size of the studentized range. The modern practice of "data mining" is an example where it is used.

The Tukey method is conservative when there are unequal sample sizes.

Tukey's test is based on a formula very similar to that of the t-test.

In fact, Tukey's test is essentially a t-test, except that it corrects for family-wise error rate.

The test statistic qs, the absolute difference between the two means being compared divided by the standard error, can then be compared to a critical value qα for the chosen significance level α from a table of the studentized range distribution. If qs exceeds qα, the two means are significantly different at level α. [3]

Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. μ1 = μ2 = μ3 = ... = μk), the means should be normally distributed (according to the central limit theorem) with the same model standard deviation σ, estimated by the merged standard error of all the samples.
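As a minimal sketch of this comparison, the Python snippet below computes a qs value for every pair of groups as the absolute difference of the two means divided by a pooled standard error. It assumes equal group sizes n and takes the standard error to be sqrt(pooled variance / n). The data, the function names, and the critical value 3.77 (a tabulated value for α = 0.05, k = 3, df = 12) are illustrative assumptions and do not come from the text.

```python
from itertools import combinations
from math import sqrt


def group_mean(values):
    return sum(values) / len(values)


def pooled_variance(groups):
    """Within-group variance pooled over all groups, with N - k degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    ss_within = sum(sum((x - group_mean(g)) ** 2 for x in g) for g in groups)
    return ss_within / (n_total - len(groups))


def tukey_q_statistics(groups):
    """Return {(i, j): q_s} for every pair of equally sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    se = sqrt(pooled_variance(groups) / n)  # standard error of one group mean
    means = [group_mean(g) for g in groups]
    return {
        (i, j): abs(means[i] - means[j]) / se
        for i, j in combinations(range(len(groups)), 2)
    }


# Hypothetical data: k = 3 groups of n = 5 observations, so df = N - k = 12.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
]
Q_CRIT = 3.77  # assumed table value of q for alpha = 0.05, k = 3, df = 12

q_stats = tukey_q_statistics(groups)
# A pair is declared significantly different when its q_s exceeds the critical value.
significant = {pair: q > Q_CRIT for pair, q in q_stats.items()}
print(q_stats)
print(significant)
```

With these made-up numbers only the comparisons involving the third group exceed the critical value. In practice the critical value would be looked up for the actual α, k and df rather than hard-coded.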

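Suppose a sample of size n is taken from each of k populations with the same normal distribution, and let ȳmax and ȳmin be the largest and smallest of the sample means and S² the pooled sample variance. Then the following random variable has a studentized range distribution:

q = (ȳmax − ȳmin) / (S / √n)

This definition of the statistic q is the basis of the critical value qα discussed below, which depends on three factors: the significance level α, the number k of groups being compared, and the number of degrees of freedom df = N − k, where N is the total number of observations.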
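The distribution of q can be explored by simulation. The sketch below is an illustration, not a replacement for published tables. It draws k equally sized groups from one normal population, computes the range of the group means divided by S / √n (S being the pooled standard deviation), and estimates an upper percentile of that ratio. The function names, the seed, the number of simulations, and the comparison value 3.77 (a tabulated critical value for α = 0.05, k = 3, df = 12) are assumptions made for this example.

```python
import random
from math import sqrt


def studentized_range(groups):
    """q = (largest mean - smallest mean) / (S / sqrt(n)) for equally sized groups."""
    n = len(groups[0])
    k = len(groups)
    means = [sum(g) / n for g in groups]
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s = sqrt(ss_within / (k * (n - 1)))  # pooled standard deviation, df = N - k
    return (max(means) - min(means)) / (s / sqrt(n))


def simulate_quantile(k, n, level, n_sims, seed=0):
    """Monte Carlo estimate of a quantile of q when all k groups share one population."""
    rng = random.Random(seed)
    draws = sorted(
        studentized_range([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)])
        for _ in range(n_sims)
    )
    return draws[int(level * n_sims)]


# k = 3 groups of n = 5 gives df = 12; the tabulated 5% critical value is about 3.77.
estimate = simulate_quantile(k=3, n=5, level=0.95, n_sims=20000)
print(f"simulated 95th percentile of q: {estimate:.2f}")
```

The simulated percentile should land close to the tabulated value, which shows why the largest of several differences needs a larger threshold than a single pre-planned comparison would.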

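The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1 − α are

ȳi − ȳj ± (q(α; k, N − k) / √2) · σ̂ε · √(2/n),   i, j = 1, ..., k,   i ≠ j

where ȳi and ȳj are the sample means of groups i and j, n is the common group size, and q(α; k, N − k) is the critical value of the studentized range distribution. Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison.

Also note that the sample sizes must be equal when using the studentized range approach.

The estimate σ̂ε is the standard deviation of the entire design, not just that of the two groups being compared.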

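When the sample sizes are unequal, one has to calculate the estimated standard deviation for each pairwise comparison, as formalized by Clyde Kramer in 1956. The procedure for unequal sample sizes is therefore sometimes referred to as the Tukey–Kramer method, which is as follows:

ȳi − ȳj ± (q(α; k, N − k) / √2) · σ̂ε · √(1/ni + 1/nj)

where ni and nj are the sizes of groups i and j respectively, and q(α; k, N − k) and σ̂ε are the studentized range critical value and the estimated standard deviation for the entire design.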
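A short Python sketch of the Tukey–Kramer intervals for unequal group sizes follows. It assumes the usual form of the limits, (mean i − mean j) ± (q / √2) · σ̂ · √(1/ni + 1/nj), with σ̂ pooled over the entire design. The function name, the data, and the critical value 3.77 (a tabulated value for α = 0.05, k = 3, df = 12) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(groups, q_crit):
    """Return {(i, j): (lower, upper)} for the difference mean_i - mean_j."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Pooled standard deviation of the entire design, with N - k degrees of freedom.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sigma_hat = sqrt(ss_within / (n_total - k))
    intervals = {}
    for i, j in combinations(range(k), 2):
        diff = means[i] - means[j]
        # The per-pair term uses the sizes of the two groups being compared.
        pair_term = sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
        half_width = (q_crit / sqrt(2)) * sigma_hat * pair_term
        intervals[(i, j)] = (diff - half_width, diff + half_width)
    return intervals


# Hypothetical data with group sizes 4, 5 and 6: N = 15, k = 3, df = 12.
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0, 17.0],
    [11.0, 12.0, 10.0, 13.0, 12.0, 14.0],
]
intervals = tukey_kramer_intervals(groups, q_crit=3.77)
# Two means are declared different when the interval for their difference excludes zero.
differs = {pair: not (low <= 0.0 <= high) for pair, (low, high) in intervals.items()}
print(intervals)
print(differs)
```

The critical value is passed in as a parameter so that the sketch stays independent of any particular table or library. Only the per-pair square-root term changes from one comparison to the next.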

However, ANOVA and the Tukey–Kramer test for k groups (i.e. μ1 = μ2 = ... = μk) may result in logical contradictions when k > 2, even if the assumptions do hold.

It is possible to generate a set of pseudorandom samples of strictly positive measure such that hypothesis μ1 = μ2 is rejected at significance level