Family-wise error rate

In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, when performing multiple hypothesis tests.

In 1953, John Tukey developed the concept of a familywise error rate as the probability of making a Type I error among a specified group, or "family," of tests.

As Ryan (1959, Footnote 3) explained, an experiment may contain two or more families of multiple comparisons, each of which relates to a particular statistical inference and each of which has its own separate familywise error rate.[2]

Hence, familywise error rates are usually based on theoretically informative collections of multiple comparisons.

In contrast, an experimentwise error rate may be based on a collection of simultaneous comparisons that refer to a diverse range of separate inferences.

Some have argued that it may not be useful to control the experimentwise error rate in such cases.

Within the statistical framework, there are several definitions of the term "family".[4] To summarize, a family could best be defined by the potential selective inference that is being faced: a family is the smallest set of items of inference in an analysis, interchangeable in their meaning for the goal of research, from which a selection of results for action, presentation or highlighting could be made (Yoav Benjamini).[citation needed]

The following table defines the possible outcomes when testing multiple null hypotheses.

Suppose we have a number m of null hypotheses, denoted by: H1, H2, ..., Hm.

Summing each type of outcome over all Hi yields the following random variables:

                                        Null hypothesis true    Alternative hypothesis true    Total
    Test is declared significant                  V                          S                   R
    Test is declared non-significant              U                          T                 m − R
    Total                                        m0                       m − m0                 m

In m hypothesis tests of which m0 are true null hypotheses, R is an observable random variable, while S, T, U, and V are unobservable random variables.

The FWER is the probability of making at least one type I error in the family,

    FWER = Pr(V ≥ 1),

or equivalently,

    FWER = 1 − Pr(V = 0).

Thus, by assuring FWER ≤ α, the probability of making one or more type I errors in the family is controlled at level α.
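If the m tests are independent and every null hypothesis is true, performing each test at level α gives FWER = 1 − (1 − α)^m, which grows rapidly with m. The following minimal simulation sketch (with arbitrary illustrative values of m and α, not taken from the source) confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)
m, alpha, n_sim = 20, 0.05, 100_000

# m independent tests of true null hypotheses: each p-value is Uniform(0, 1).
p = rng.uniform(size=(n_sim, m))

# V counts the type I errors in each simulated family; the FWER is P(V >= 1).
V = (p <= alpha).sum(axis=1)
print(f"simulated FWER    ~ {(V >= 1).mean():.3f}")
print(f"1 - (1 - alpha)^m = {1 - (1 - alpha) ** m:.3f}")   # ~ 0.642 for m = 20
```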

Some classical solutions ensure strong level α FWER control, and some newer solutions exist; several of these procedures are discussed below.
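For instance, the classical single-step Bonferroni and Šidák corrections reject Hi whenever pi ≤ α/m or pi ≤ 1 − (1 − α)^(1/m), respectively; the Bonferroni bound is valid under arbitrary dependence, while the Šidák threshold is exact under independence. A minimal sketch, with purely illustrative p-values:

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (strong FWER control under any dependence)."""
    p = np.asarray(pvals)
    return p <= alpha / p.size

def sidak_reject(pvals, alpha=0.05):
    """Reject H_i when p_i <= 1 - (1 - alpha)**(1/m); exact under independence."""
    p = np.asarray(pvals)
    return p <= 1 - (1 - alpha) ** (1 / p.size)

pvals = [0.001, 0.012, 0.03, 0.2, 0.7]   # illustrative values only
print(bonferroni_reject(pvals))          # [ True False False False False]
print(sidak_reject(pvals))               # [ True False False False False]
```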

The reason Holm's step-down procedure controls the family-wise error rate for all m hypotheses at level α in the strong sense is that it is a closed testing procedure.[6]
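A minimal sketch of Holm's step-down rule in its usual ordered-p-value form, again with purely illustrative p-values:

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Holm's step-down procedure: compare the ordered p-values p_(1) <= ... <= p_(m)
    with alpha/m, alpha/(m-1), ..., alpha and stop at the first non-rejection."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):    # rank = 0, 1, ..., m-1
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                         # all remaining hypotheses are retained
    return reject

pvals = [0.001, 0.012, 0.03, 0.2, 0.7]    # illustrative values only
print(holm_reject(pvals))                 # [ True  True False False False]
```

On these values Holm rejects one hypothesis more than the Bonferroni rule above, reflecting that it is uniformly more powerful while still controlling the FWER.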

Hochberg's step-up procedure requires the hypotheses to be independent or to satisfy certain forms of positive dependence.[11][12] However, it has been suggested that a modified version of the Hochberg procedure remains valid under general negative dependence.
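A minimal sketch of the (unmodified) Hochberg step-up rule, which scans the ordered p-values from the largest downwards; as noted above, its validity relies on independence or suitable positive dependence, and the p-values are illustrative only:

```python
import numpy as np

def hochberg_reject(pvals, alpha=0.05):
    """Hochberg's step-up procedure: find the largest k with p_(k) <= alpha/(m - k + 1)
    and reject the k hypotheses with the smallest p-values."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    sorted_p = p[order]
    reject = np.zeros(m, dtype=bool)
    for k in range(m, 0, -1):             # k = m, m-1, ..., 1
        if sorted_p[k - 1] <= alpha / (m - k + 1):
            reject[order[:k]] = True      # reject the k smallest p-values
            break
    return reject

pvals = [0.001, 0.012, 0.03, 0.2, 0.7]    # illustrative values only
print(hochberg_reject(pvals))             # [ True  True False False False]
```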

Charles Dunnett (1955) described an alternative alpha error adjustment for the case in which several treatment groups are compared with a single control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.[citation needed]

The procedures of Bonferroni and Holm control the FWER under any dependence structure of the p-values (or equivalently the individual test statistics).

Essentially, this is achieved by accommodating a "worst-case" dependence structure (which is close to independence for most practical purposes). Such an approach is conservative, however, when the dependence is in fact positive. To give an extreme example, under perfect positive dependence there is effectively only one test, and thus the FWER is uninflated.
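A small simulation sketch of this extreme case (the parameters are illustrative): when all m tests share the same p-value, the family either rejects everything or nothing, so even with no correction the FWER equals the per-test level α.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, n_sim = 20, 0.05, 100_000

# Perfect positive dependence: all m tests in a family share the same p-value,
# so the family rejects either every hypothesis or none of them.
p_shared = rng.uniform(size=n_sim)             # one p-value per simulated family
p = np.tile(p_shared[:, None], (1, m))         # the same value repeated m times

fwer_unadjusted = (p <= alpha).any(axis=1).mean()
print(f"FWER with no correction ~ {fwer_unadjusted:.3f}")   # ~ alpha = 0.05
```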

Accounting for the dependence structure of the p-values (or of the individual test statistics) produces more powerful procedures. This can be achieved by applying resampling methods, such as bootstrap and permutation methods.

The procedure of Westfall and Young (1993) requires a certain condition that does not always hold in practice (namely, subset pivotality).[14]

The procedures of Romano and Wolf (2005a,b) dispense with this condition and are thus more generally valid.[15][16]
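The following sketch illustrates the general resampling idea with a single-step max-T permutation adjustment of the kind used by these approaches; the plain difference in group means is used as the per-variable statistic for brevity, and the data, sample sizes, and function name are hypothetical.

```python
import numpy as np

def maxT_adjusted_pvalues(X, Y, n_perm=2000, seed=0):
    """Single-step max-T permutation adjustment for comparing two groups on m variables.

    X: (n1, m) array for group 1;  Y: (n2, m) array for group 2.
    Returns one FWER-adjusted p-value per variable."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]

    def stat(data):
        # Absolute difference in group means per variable (a simple choice of statistic).
        return np.abs(data[:n1].mean(axis=0) - data[n1:].mean(axis=0))

    t_obs = stat(Z)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(Z.shape[0])     # permute the group labels
        max_null[b] = stat(Z[perm]).max()      # record the maximum statistic over variables

    # Adjusted p-value: how often the permutation maximum reaches the observed statistic.
    return np.array([(max_null >= t).mean() for t in t_obs])

rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(30, 10))
Y = rng.normal(0.0, 1.0, size=(30, 10))
Y[:, 0] += 1.5                                 # a real effect in the first variable only
print(maxT_adjusted_pvalues(X, Y).round(3))
```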

The harmonic mean p-value (HMP) procedure[17][18] provides a multilevel test that improves on the power of Bonferroni correction by assessing the significance of groups of hypotheses while controlling the strong-sense family-wise error rate.
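A simplified sketch of the subset test, assuming equal weights wi = 1/m and the small-α approximation in which the HMP of a subset R is compared with α·|R|/m; an asymptotically exact version adjusts this threshold slightly, and the p-values below are illustrative only.

```python
import numpy as np

def hmp_subset_test(pvals, subset, alpha=0.05):
    """Test the joint null hypothesis for a subset of tests with the harmonic mean p-value,
    using equal weights w_i = 1/m and the approximate threshold alpha * |R| / m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    R = np.asarray(subset)
    hmp = R.size / np.sum(1.0 / p[R])          # harmonic mean of the selected p-values
    threshold = alpha * R.size / m             # weighted threshold for this subset
    return hmp, hmp <= threshold

pvals = [0.001, 0.012, 0.03, 0.2, 0.7]         # illustrative values only
hmp, reject = hmp_subset_test(pvals, subset=[0, 1, 2])
print(f"HMP of subset = {hmp:.4f}, reject joint null: {reject}")
```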

FWER control exerts a more stringent control over false discovery compared to false discovery rate (FDR) procedures.

FWER control limits the probability of at least one false discovery, whereas FDR control limits (in a loose sense) the expected proportion of false discoveries.

Thus, FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting null hypotheses that are actually true.[20]
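As an illustration of this power difference, the following sketch applies the Bonferroni (FWER) rule and the Benjamini–Hochberg step-up (FDR) rule to the same illustrative p-values:

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    p = np.asarray(pvals)
    return p <= alpha / p.size                 # controls FWER = P(V >= 1)

def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule: controls the FDR, roughly E[V / max(R, 1)]."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1   # largest k with p_(k) <= alpha * k / m
        reject[order[:k]] = True
    return reject

pvals = [0.001, 0.012, 0.03, 0.2, 0.7]         # illustrative values only
print(bonferroni_reject(pvals))                # [ True False False False False]
print(benjamini_hochberg_reject(pvals))        # [ True  True  True False False]
```

Here the FDR rule rejects more hypotheses than the FWER rule, at the price of a weaker guarantee against false discoveries.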

On the other hand, FWER control is less stringent than per-family error rate (PFER) control, which limits the expected number of errors per family: since FWER = Pr(V ≥ 1) ≤ E[V] by Markov's inequality, any procedure that controls the PFER at level α also controls the FWER at that level.