Wald test

In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate.[1][2] Intuitively, the larger this weighted distance, the less likely it is that the constraint is true.

While the finite-sample distributions of Wald tests are generally unknown,[3]: 138  the test statistic has an asymptotic χ²-distribution under the null hypothesis, a fact that can be used to determine statistical significance.

An advantage of the Wald test over the other two classical approaches, the likelihood-ratio test and the Lagrange multiplier (score) test, is that it only requires the estimation of the unrestricted model, which lowers the computational burden as compared to the likelihood-ratio test.

However, a major disadvantage is that (in finite samples) it is not invariant to changes in the representation of the null hypothesis; in other words, algebraically equivalent expressions of a non-linear parameter restriction can lead to different values of the test statistic.[5][6] That is because the Wald statistic is derived from a Taylor expansion,[7] and different ways of writing equivalent nonlinear expressions lead to nontrivial differences in the corresponding Taylor coefficients.[8]

Another aberration, known as the Hauck–Donner effect,[9] can occur in binomial models when the estimated (unconstrained) parameter is close to the boundary of the parameter space (for instance, a fitted probability extremely close to zero or one), which results in the Wald statistic no longer increasing monotonically in the distance between the unconstrained and constrained parameter.
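To see the Hauck–Donner effect concretely, here is a minimal sketch (not from the original article; the binomial model with a logit parametrisation and the sample size n = 50 are illustrative assumptions). It computes the Wald z-statistic for H0: β = 0, where β = logit(p), using the standard asymptotic variance var(β̂) ≈ 1/(n·p̂·(1 − p̂)):

```python
import numpy as np

# Minimal sketch of the Hauck-Donner effect in a binomial model.
# We test H0: beta = 0, where beta = logit(p), via the Wald z-statistic
# z = beta_hat / se(beta_hat), with the Fisher-information-based variance
# var(beta_hat) ~= 1 / (n * p_hat * (1 - p_hat)).

n = 50  # arbitrary sample size for illustration
for y in [30, 40, 45, 48, 49]:  # increasingly extreme success counts
    p_hat = y / n
    beta_hat = np.log(p_hat / (1 - p_hat))        # MLE of the log-odds
    se = 1.0 / np.sqrt(n * p_hat * (1 - p_hat))   # plug-in standard error
    z = beta_hat / se                             # Wald z-statistic for H0: beta = 0
    print(f"y={y:2d}  p_hat={p_hat:.2f}  z={z:.2f}")

# As y approaches n, beta_hat diverges but its standard error diverges
# faster, so z eventually *decreases*: the test loses power exactly where
# the evidence against H0 is strongest.
```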

Under the Wald test, the estimated $\hat{\theta}$ that was found as the maximizing argument of the unconstrained likelihood function is compared with a hypothesized value $\theta_0$. If the hypothesis involves only a single parameter restriction, then the Wald statistic takes the following form:

$$W = \frac{(\hat{\theta} - \theta_0)^2}{\operatorname{var}(\hat{\theta})},$$

which under the null hypothesis follows an asymptotic χ²-distribution with one degree of freedom.

The square root of the single-restriction Wald statistic can be understood as a (pseudo) t-ratio

$$\sqrt{W} = \frac{\hat{\theta} - \theta_0}{\operatorname{se}(\hat{\theta})}$$

that is, however, not actually t-distributed except for the special case of linear regression with normally distributed errors.[12] In general, it follows an asymptotic z distribution. Here $\operatorname{se}(\hat{\theta})$ is the standard error (SE) of the maximum likelihood estimate (MLE), the square root of the variance.

There are several ways to consistently estimate the variance matrix, which in finite samples leads to alternative estimates of standard errors and associated test statistics and p-values.[3]: 129  The validity of still getting an asymptotically normal distribution after plugging the MLE estimate of $\theta$ into the SE relies on Slutsky's theorem.
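As a minimal numerical sketch of the single-restriction case (the Bernoulli model, sample size, and null value below are illustrative choices, not from the original article), one can compute W with the plug-in variance estimate, as justified above by Slutsky's theorem, and refer it to a χ²-distribution with one degree of freedom:

```python
import numpy as np
from scipy.stats import chi2

# Sketch: Wald test of H0: p = 0.5 for a Bernoulli parameter.
# The MLE is the sample mean, and its plug-in variance estimate is
# var(p_hat) ~= p_hat * (1 - p_hat) / n.

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.62, size=400)   # simulated data; true p = 0.62

p_hat = x.mean()                      # unrestricted MLE
p_0 = 0.5                             # hypothesized value
var_hat = p_hat * (1 - p_hat) / len(x)

W = (p_hat - p_0) ** 2 / var_hat      # single-restriction Wald statistic
p_value = chi2.sf(W, df=1)            # asymptotic chi-squared, 1 d.o.f.

print(f"p_hat={p_hat:.3f}  W={W:.2f}  p-value={p_value:.4f}")
```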

The Wald test can be used to test a single hypothesis on multiple parameters, as well as to test jointly multiple hypotheses on single/multiple parameters. Let $\hat{\theta}_n$ be our sample estimator of P parameters (i.e., $\hat{\theta}_n$ is a $P \times 1$ vector), which is supposed to follow asymptotically a normal distribution with covariance matrix V,

$$\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} N(0, V).$$

The test of Q hypotheses on the P parameters is expressed with a $Q \times P$ matrix R:

$$H_0: R\theta = r$$
$$H_1: R\theta \neq r$$

The distribution of the test statistic under the null hypothesis is

$$(R\hat{\theta}_n - r)' \left[ R \left( \hat{V}_n / n \right) R' \right]^{-1} (R\hat{\theta}_n - r) \xrightarrow{D} \chi^2_Q,$$

which in turn implies

$$n \, (R\hat{\theta}_n - r)' \left[ R \hat{V}_n R' \right]^{-1} (R\hat{\theta}_n - r) \xrightarrow{D} \chi^2_Q,$$

where $\hat{V}_n$ is an estimator of the covariance matrix.
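A minimal sketch of the joint test follows (the estimates, covariance matrix, and restrictions below are made-up illustrative values, not from the article):

```python
import numpy as np
from scipy.stats import chi2

# Sketch: joint Wald test of Q = 2 linear restrictions on P = 3 parameters.

n = 200                                     # sample size
theta_hat = np.array([0.9, -0.4, 0.1])      # unrestricted estimates (P x 1)
V_hat = np.array([[1.0, 0.2, 0.0],          # estimated asymptotic covariance
                  [0.2, 0.8, 0.1],          # of sqrt(n) * (theta_hat - theta)
                  [0.0, 0.1, 0.5]])

# H0: theta_1 = 1 and theta_2 + theta_3 = 0, i.e. R @ theta = r
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])             # Q x P restriction matrix
r = np.array([1.0, 0.0])

diff = R @ theta_hat - r
middle = R @ (V_hat / n) @ R.T              # R (V_hat / n) R'
W = diff @ np.linalg.solve(middle, diff)    # quadratic form
p_value = chi2.sf(W, df=len(r))             # asymptotic chi-squared, Q d.o.f.

print(f"W={W:.3f}  p-value={p_value:.4f}")
```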

To see why, suppose $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} N(0, V)$. Then, by Slutsky's theorem and by the properties of the normal distribution, multiplying by R has distribution:

$$R\sqrt{n}(\hat{\theta}_n - \theta) = \sqrt{n}(R\hat{\theta}_n - r) \xrightarrow{D} N(0, RVR').$$

Recalling that a quadratic form of a normal distribution has a Chi-squared distribution:

$$\sqrt{n}(R\hat{\theta}_n - r)' \left[ RVR' \right]^{-1} \sqrt{n}(R\hat{\theta}_n - r) \xrightarrow{D} \chi^2_Q.$$

Rearranging n finally gives:

$$(R\hat{\theta}_n - r)' \left[ R(V/n)R' \right]^{-1} (R\hat{\theta}_n - r) \xrightarrow{D} \chi^2_Q.$$

What if the covariance matrix is not known a priori and needs to be estimated from the data? If we have a consistent estimator $\hat{V}_n$ of V, then by Slutsky's theorem the result above continues to hold with $\hat{V}_n$ in place of V.
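To make the preceding derivation concrete, here is a small Monte Carlo sketch (the design, a Gaussian mean vector with P = Q = 2 and the identity restriction, is an illustrative choice): it simulates the statistic under H0 with an estimated covariance matrix and compares its empirical quantiles against the χ² limit with two degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

# Sketch: simulate the Wald statistic under H0 and check its chi-squared limit.
# Design (illustrative): P = 2 Gaussian means, H0: theta = (0, 0), R = I.

rng = np.random.default_rng(1)
n, reps = 500, 5000
stats = np.empty(reps)

for i in range(reps):
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]], size=n)
    theta_hat = x.mean(axis=0)              # MLE of the mean vector
    V_hat = np.cov(x, rowvar=False)         # consistent estimate of V
    diff = theta_hat                        # R = I and r = 0, so R theta_hat - r
    stats[i] = diff @ np.linalg.solve(V_hat / n, diff)

# Empirical vs. theoretical chi-squared(2) quantiles
for q in (0.90, 0.95, 0.99):
    print(f"{q:.2f}: empirical={np.quantile(stats, q):.2f} "
          f"theoretical={chi2.ppf(q, df=2):.2f}")
```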

In the standard form above, the Wald test is used to test linear hypotheses that can be represented by a single matrix R. For a non-linear hypothesis of the form

$$H_0: c(\theta) = 0 \quad \text{versus} \quad H_1: c(\theta) \neq 0,$$

the test statistic becomes

$$c(\hat{\theta}_n)' \left[ c'(\hat{\theta}_n) \left( \hat{V}_n / n \right) c'(\hat{\theta}_n)' \right]^{-1} c(\hat{\theta}_n) \xrightarrow{D} \chi^2_Q,$$

where $c'(\hat{\theta}_n)$ is the derivative of c evaluated at the sample estimator. This result is obtained using the delta method, which uses a first-order approximation of the variance.

The fact that one uses an approximation of the variance has the drawback that the Wald statistic is not invariant to a non-linear transformation/reparametrisation of the hypothesis: it can give different answers to the same question, depending on how the question is phrased. For example, asking whether θ = 1 is the same as asking whether log θ = 0, but the Wald statistic for θ = 1 is not the same as the Wald statistic for log θ = 0, because there is in general no neat relationship between the standard errors of θ and log θ.
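The following sketch illustrates this non-invariance numerically (the exponential-sample setup and the particular hypothesis pair, θ = 1 versus log θ = 0, are illustrative choices): both statistics test the same restriction, yet they differ because the delta method assigns different standard errors to θ̂ and log θ̂.

```python
import numpy as np

# Sketch: the same null hypothesis written two algebraically equivalent ways
# gives two different Wald statistics. Model (illustrative): X ~ Exponential
# with mean theta; the MLE is the sample mean, with var(theta_hat) ~ theta^2/n.

rng = np.random.default_rng(2)
n = 30
x = rng.exponential(scale=1.4, size=n)     # true mean 1.4

theta_hat = x.mean()                       # MLE of theta
se_theta = theta_hat / np.sqrt(n)          # plug-in SE of theta_hat

# Formulation 1: H0: theta = 1
W1 = ((theta_hat - 1.0) / se_theta) ** 2

# Formulation 2: H0: log(theta) = 0; by the delta method,
# se(log theta_hat) ~= se(theta_hat) / theta_hat = 1 / sqrt(n)
se_log = se_theta / theta_hat
W2 = (np.log(theta_hat) / se_log) ** 2

print(f"theta_hat={theta_hat:.3f}  W(theta=1)={W1:.3f}  W(log theta=0)={W2:.3f}")
# The two statistics differ in finite samples even though the hypotheses
# are algebraically equivalent.
```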

Several alternatives to the Wald test exist, namely the likelihood-ratio test and the Lagrange multiplier (score) test; Robert F. Engle showed that these three tests are asymptotically equivalent.[17] Although they are asymptotically equivalent, in finite samples they could disagree enough to lead to different conclusions.
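As a small illustration of such finite-sample disagreement (the binomial setup and the particular numbers are illustrative choices, not from the original article), the following sketch computes both the Wald and the likelihood-ratio statistic for the same data and null hypothesis:

```python
import numpy as np
from scipy.stats import chi2

# Sketch: Wald vs. likelihood-ratio statistic for H0: p = 0.5 in a
# binomial model. Both are asymptotically chi-squared with 1 d.o.f.,
# but their finite-sample values differ.

n, y, p0 = 40, 28, 0.5
p_hat = y / n

# Wald: squared distance weighted by the plug-in precision of p_hat
W = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)

# Likelihood ratio: twice the log-likelihood difference
def loglik(p):
    return y * np.log(p) + (n - y) * np.log(1 - p)

LR = 2 * (loglik(p_hat) - loglik(p0))

print(f"W={W:.3f} (p={chi2.sf(W, 1):.4f})   LR={LR:.3f} (p={chi2.sf(LR, 1):.4f})")
```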