Harmonic mean p-value

The harmonic mean p-value[1][2][3] (HMP) is a statistical technique for addressing the multiple comparisons problem that controls the strong-sense family-wise error rate[2] (this claim has been disputed[4]).

Unlike Fisher's method, it avoids the restrictive assumption that the p-values are independent.[5]

Consequently, it controls the false positive rate when tests are dependent, at the expense of reduced power (i.e. a higher false negative rate) when tests are independent.[2][3]

Besides providing an alternative to approaches such as the Bonferroni correction that control the stringent family-wise error rate, it also provides an alternative to the widely used Benjamini–Hochberg procedure (BH) for controlling the less stringent false discovery rate.[2]

The approach provides a multilevel test procedure in which the smallest groups of p-values that are statistically significant may be sought.

In general, interpreting the HMP directly as a p-value is anti-conservative, meaning that the false positive rate is higher than expected.

However, as the HMP becomes smaller, under certain assumptions the discrepancy decreases, so that direct interpretation of significance achieves a false positive rate close to the nominal level for sufficiently small values.

However, these bounds represent worst-case scenarios under arbitrary dependence and are likely to be conservative in practice.[3]
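
The anti-conservativeness of direct interpretation can be checked by simulation. The following Monte Carlo sketch (function name, seed, and parameter values are illustrative assumptions, not from the source) estimates the false positive rate when the unweighted HMP of independent null p-values is compared directly against a significance level; the estimate typically comes out slightly above the nominal level, illustrating the mild inflation.

```python
import random

def hmp(pvalues):
    """Unweighted harmonic mean of p-values: L / sum(1/p_i)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)

def direct_fpr(L=10, alpha=0.05, reps=50_000, seed=1):
    """Estimate the false positive rate of interpreting the HMP of L
    independent null (uniform) p-values directly at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # 1 - random() lies in (0, 1], avoiding division by zero
        ps = [1.0 - rng.random() for _ in range(L)]
        if hmp(ps) <= alpha:
            hits += 1
    return hits / reps

print(direct_fpr())
```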

Rather than applying these bounds, asymptotically exact p-values can be produced by transforming the HMP.

The generalized central limit theorem shows that an asymptotically exact p-value can be computed from the HMP: when the null hypotheses are true, the sum of the reciprocal p-values asymptotically follows a Landau distribution, a heavy-tailed stable distribution, which provides the required transformation.

Subject to the assumptions of the generalized central limit theorem, this transformed p-value becomes exact as the number of tests becomes large.

The test is implemented by the p.hmp command of the harmonicmeanp R package; a tutorial is available online.
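
The HMP statistic itself is simple to compute. The following is a minimal Python sketch of the (weighted) harmonic mean of a set of p-values; it is illustrative only and is not the R package's API, which additionally applies the asymptotically exact transformation.

```python
def hmp(pvalues, weights=None):
    """Weighted harmonic mean of p-values: (sum of weights) / (sum of weight/p).
    With equal weights this is L / sum(1/p_i)."""
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))

print(hmp([0.01, 0.2, 0.4, 0.6]))  # ≈ 0.0366, dominated by the smallest p-value
```

Note that the harmonic mean is dominated by its smallest terms, which is why a single very small p-value can drive the combined statistic.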

The table illustrates that the smaller the false positive rate, and the smaller the number of tests, the closer the critical value is to the false positive rate.

The procedure searches among groups of p-values for the smallest significant group, while maintaining the strong-sense family-wise error rate.

For any subset R of the p-values with combined weight w_R, the following multilevel test based on direct interpretation of the HMP controls the strong-sense family-wise error rate at level approximately α: reject the null hypothesis that none of the p-values in R are significant when the HMP of the subset is no larger than α multiplied by w_R.
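
Assuming the subset rule described in [2] (reject the combined null for a subset R when the subset's HMP is at most α times the subset's combined weight w_R), the multilevel test can be sketched as follows; the function names are hypothetical.

```python
def hmp(pvalues, weights):
    """Weighted harmonic mean of p-values."""
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))

def subset_significant(pvalues, weights, subset, alpha=0.05):
    """Direct-interpretation multilevel test for a subset R of the tests:
    reject the combined null for R when hmp_R <= alpha * w_R."""
    ps = [pvalues[i] for i in subset]
    ws = [weights[i] for i in subset]
    return hmp(ps, ws) <= alpha * sum(ws)

p = [1e-4, 0.3, 0.5, 0.9]   # one strong test among three weak ones
w = [0.25] * 4              # equal weights summing to one

print(subset_significant(p, w, [0]))           # True: the strong test alone
print(subset_significant(p, w, [0, 1, 2, 3]))  # True: all tests combined
print(subset_significant(p, w, [1, 2]))        # False: weak tests only
```

The example also illustrates the property noted below: when a subset is deemed significant by direct interpretation, the combination of all the p-values is significant too.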

Since direct interpretation of the HMP is faster, a two-pass procedure may be used: subsets of p-values that are likely to be significant are identified by direct interpretation, subject to confirmation using the asymptotically exact formula.[8]

The HMP has a range of properties that arise from the generalized central limit theorem.

Conversely, when the multilevel test deems a subset of p-values to be significant, the HMP for all the p-values combined is likely to be significant; this is certain when the HMP is interpreted directly.

When the goal is to assess the significance of individual p-values, so that combined tests concerning groups of p-values are of no interest, the HMP is equivalent to the Bonferroni procedure, but subject to a more stringent significance threshold.
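
For an individual test i with weight w_i = 1/L, the HMP of the singleton subset is p_i itself, so the direct-interpretation subset rule reduces to a Bonferroni-style comparison of p_i against α/L (the exact procedure tightens α slightly). A small illustration under that assumption, with arbitrary example values:

```python
L, alpha = 100, 0.05
p_i = 3e-4  # an individual p-value (illustrative)

bonferroni = p_i <= alpha / L             # Bonferroni: compare against 0.0005
singleton_hmp = p_i <= alpha * (1.0 / L)  # singleton-subset HMP rule: same bound

print(bonferroni, singleton_hmp)  # True True
```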

The HMP assumes the individual p-values have (not necessarily independent) standard uniform distributions when their null hypotheses are true.

Large numbers of underpowered tests can therefore harm the power of the HMP.
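
A small numerical illustration (values chosen arbitrarily): folding many uninformative p-values into the combination inflates the unweighted HMP, diluting the signal from one strong test.

```python
def hmp(pvalues):
    """Unweighted harmonic mean of p-values."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)

focused = hmp([1e-6, 0.5])           # one strong test, one weak test
diluted = hmp([1e-6] + [0.5] * 999)  # the same strong test among 999 weak ones

print(focused, diluted)  # the HMP rises by a factor of roughly 500
```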

Supplementary Methods §5C of [2] and an online tutorial consider the issue in more detail.

The HMP was conceived by analogy to Bayesian model averaging and can be interpreted as inversely proportional to a model-averaged Bayes factor when combining p-values from likelihood ratio tests.

Good reported an empirical relationship between the Bayes factor and the p-value from a likelihood ratio test.

Extrapolating, he proposed a rule of thumb in which the HMP is taken to be inversely proportional to the model-averaged Bayes factor for a collection of likelihood ratio tests.

For Good, his rule of thumb supported an interchangeability between Bayesian and classical approaches to hypothesis testing.

If, under the alternative hypothesis, the p-values follow a distribution of the form considered by Sellke, Bayarri and Berger,[14] then the inverse proportionality between the model-averaged Bayes factor and the HMP can be formalized.[2][15]

For likelihood ratio tests with exactly two degrees of freedom, Wilks' theorem implies that the p-value of the test statistic x = 2 log R, where R is the maximized likelihood ratio, is p = exp(-x/2), so that the reciprocal p-value 1/p equals the maximized likelihood ratio itself.

In this case the HMP provides a tighter upper bound on the model-averaged Bayes factor: since each Bayes factor is bounded above by the corresponding maximized likelihood ratio 1/p_i, the model-averaged Bayes factor is bounded above by the weighted average of the reciprocal p-values, which is the reciprocal of the HMP, a result that again reproduces the inverse proportionality of Good's empirical relationship.
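
The two-degrees-of-freedom relationship can be checked numerically: under Wilks' theorem, the upper-tail p-value of a chi-squared statistic with 2 degrees of freedom is exp(-x/2), so for x = 2 log R the reciprocal p-value recovers the maximized likelihood ratio R exactly (up to floating point). The value of R below is arbitrary.

```python
import math

R = 40.0              # maximized likelihood ratio (illustrative value)
x = 2 * math.log(R)   # likelihood ratio test statistic, chi-squared with 2 df
p = math.exp(-x / 2)  # upper-tail p-value of a chi-squared(2) statistic

print(1.0 / p)        # ≈ 40.0: the reciprocal p-value equals R
```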