Studentized residual

In statistics, a studentized residual is the dimensionless ratio resulting from the division of a residual by an estimate of its standard deviation, both expressed in the same units.

It is a form of a Student's t-statistic, with the estimate of error varying between points.

The key reason for studentizing is that, in regression analysis of a multivariate distribution, the variances of the residuals at different input variable values may differ, even if the variances of the errors at these different input variable values are equal.

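Consider the simple linear regression model
$$Y = \alpha_0 + \alpha_1 X + \varepsilon.$$
Given a random sample $(X_i, Y_i)$, $i = 1, \ldots, n$, each pair $(X_i, Y_i)$ satisfies
$$Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i,$$
where the errors $\varepsilon_i$ are independent and all have the same variance $\sigma^2$.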

The residuals are not the true errors, but estimates, based on the observable data.

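When the method of least squares is used to estimate $\alpha_0$ and $\alpha_1$, the residuals $\hat\varepsilon_i$ cannot be independent, unlike the errors $\varepsilon_i$. This is because they satisfy the two constraints
$$\sum_{i=1}^{n} \hat\varepsilon_i = 0 \quad\text{and}\quad \sum_{i=1}^{n} \hat\varepsilon_i X_i = 0.$$
(Here $\varepsilon_i$ is the $i$th error, and $\hat\varepsilon_i$ is the $i$th residual.)

Unlike the errors, the residuals also do not all have the same variance. The variance decreases as the corresponding $X$-value gets farther from the average $X$-value. This is not a feature of the data itself. It arises because the regression fits values at the ends of the domain more closely.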
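The two least-squares constraints, $\sum_i \hat\varepsilon_i = 0$ and $\sum_i \hat\varepsilon_i X_i = 0$, can be checked numerically. This Python sketch fits the line with the closed-form least-squares slope and intercept and then evaluates both sums. The data values are made up for illustration, and the helper names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the model y = a0 + a1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = s_xy / s_xx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def residuals(x, y):
    """Residuals y_i - (a0_hat + a1_hat * x_i) of the least-squares line."""
    a0, a1 = fit_simple_ols(x, y)
    return [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]


# Illustrative data, made up for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 20.5]
e = residuals(x, y)

sum_e = sum(e)                                  # first constraint
sum_ex = sum(ei * xi for ei, xi in zip(e, x))   # second constraint
print(sum_e, sum_ex)  # both zero up to floating-point rounding
```

Both sums vanish for any data set, not only this one, because they are the normal equations of the least-squares fit.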

It is not simply a matter of the population parameters (mean and standard deviation) being unknown. Rather, regressions yield different residual distributions at different data points. Point estimators of univariate distributions, by contrast, share a common distribution for residuals.

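The variance of the $i$th residual is
$$\operatorname{var}(\hat\varepsilon_i) = \sigma^2 (1 - h_{ii}),$$
where $h_{ii}$ is the leverage, the $i$th diagonal element of the hat matrix $H = X(X^\mathsf{T}X)^{-1}X^\mathsf{T}$. If the design matrix $X$ has only two columns (as in the example above), this is equal to
$$\operatorname{var}(\hat\varepsilon_i) = \sigma^2\left(1 - \frac{1}{n} - \frac{(X_i - \bar X)^2}{\sum_{j=1}^{n}(X_j - \bar X)^2}\right).$$
For an arithmetic mean, the design matrix $X$ has only one column (a vector of ones), and this is simply
$$\operatorname{var}(\hat\varepsilon_i) = \sigma^2\left(1 - \frac{1}{n}\right).$$
Given the definitions above, the studentized residual is then
$$t_i = \frac{\hat\varepsilon_i}{\hat\sigma\sqrt{1 - h_{ii}}},$$
where $\hat\sigma$ is an appropriate estimate of $\sigma$.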
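The two-column leverage formula can be checked against the hat matrix directly. This Python sketch inverts the 2×2 matrix $X^\mathsf{T}X$ for the design matrix with rows $(1, X_i)$, reads off the diagonal of $X(X^\mathsf{T}X)^{-1}X^\mathsf{T}$, and compares it with the closed form $1/n + (X_i - \bar X)^2/\sum_j (X_j - \bar X)^2$. The $X$-values and the choice $\sigma^2 = 1$ are illustrative assumptions. The isolated point at $X = 10$ gets the largest leverage and therefore the smallest residual variance.

```python
def hat_diag_general(x):
    """Diagonal of H = X (X^T X)^{-1} X^T for the design matrix with rows (1, x_i)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx                        # determinant of X^T X = [[n, sx], [sx, sxx]]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # h_ii = (1, x_i) (X^T X)^{-1} (1, x_i)^T
    return [inv[0][0] + 2 * inv[0][1] * xi + inv[1][1] * xi * xi for xi in x]


def hat_diag_closed_form(x):
    """Two-column leverage formula 1/n + (x_i - x_bar)^2 / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]
h_general = hat_diag_general(x)
h_closed = hat_diag_closed_form(x)
sigma2 = 1.0                                           # assumed error variance
var_resid = [sigma2 * (1 - hi) for hi in h_general]   # var(e_i) = sigma^2 (1 - h_ii)
print([round(hi, 4) for hi in h_general])  # [0.38, 0.28, 0.22, 0.2, 0.92]
print([round(v, 4) for v in var_resid])    # [0.62, 0.72, 0.78, 0.8, 0.08]
```

The leverages sum to 2, the number of columns of $X$, because the trace of the hat matrix equals the number of parameters.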

For a mean, this is equal to
$$t_i = \frac{\hat\varepsilon_i}{\hat\sigma\sqrt{(n-1)/n}}.$$

The usual estimate of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{n - m}\sum_{j=1}^{n}\hat\varepsilon_j^2,$$
where $m$ is the number of parameters in the model (2 in our example).
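Here is a minimal Python sketch of studentizing with this usual estimate for the straight-line model ($m = 2$). It computes $\hat\sigma^2 = \sum_j \hat\varepsilon_j^2/(n - m)$ from all $n$ cases and divides each residual by $\hat\sigma\sqrt{1 - h_{ii}}$, using the two-column leverage formula. The function name and data are illustrative.

```python
import math


def internal_studentized(x, y):
    """Studentized residuals t_i for the least-squares line (m = 2), sigma estimated from all cases."""
    n, m = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a0 = y_bar - a1 * x_bar
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]            # residuals
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]           # leverages h_ii
    sigma_hat = math.sqrt(sum(ei * ei for ei in e) / (n - m))    # usual estimate, all n cases
    return [ei / (sigma_hat * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 20.5]
t = internal_studentized(x, y)
print(t)
```

Since $\sum_j \hat\varepsilon_j^2 = (n - m)\hat\sigma^2$, these values always satisfy $\sum_i t_i^2 (1 - h_{ii}) = n - m$, which makes a handy sanity check.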

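But if the $i$th case is suspected of being improbably large, then it would also not be normally distributed. So when one is considering whether the $i$th case may be an outlier, it is prudent to exclude it from the process of estimating the variance. One then uses instead
$$\hat\sigma_{(i)}^2 = \frac{1}{n - m - 1}\sum_{j \ne i}\hat\varepsilon_{j(i)}^2,$$
where $\hat\varepsilon_{j(i)}$ denotes the $j$th residual of the model refitted with the $i$th case excluded.

If the estimate $\hat\sigma^2$ includes the $i$th case, then $t_i$ is called the internally studentized residual. If the estimate $\hat\sigma_{(i)}^2$ is used instead, excluding the $i$th case, then it is called the externally studentized residual, $t_{(i)}$.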
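This Python sketch computes both versions for the straight-line model ($m = 2$). The external version literally refits the line without case $i$ and divides that fit's residual sum of squares by $n - m - 1$. The final lines compare it with the algebraic shortcut $t_{(i)} = t_i\sqrt{(\nu - 1)/(\nu - t_i^2)}$, where $\nu = n - m$. The shortcut follows from the standard leave-one-out identity $(n - m - 1)\hat\sigma_{(i)}^2 = (n - m)\hat\sigma^2 - \hat\varepsilon_i^2/(1 - h_{ii})$. Function names and data are illustrative.

```python
import math


def fit_line(x, y):
    """Least-squares intercept, slope, x-mean and S_xx for y = a0 + a1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - a1 * x_bar, a1, x_bar, s_xx


def studentized(x, y):
    """Return (internal, external) studentized residuals for the line fit (m = 2)."""
    n, m = len(x), 2
    a0, a1, x_bar, s_xx = fit_line(x, y)
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    sigma_hat = math.sqrt(sum(ei * ei for ei in e) / (n - m))    # includes case i
    internal, external = [], []
    for i in range(n):
        # Refit the line with case i deleted and use its residual sum of squares.
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0, b1, _, _ = fit_line(xs, ys)
        sse_i = sum((yj - (b0 + b1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        sigma_hat_i = math.sqrt(sse_i / (n - m - 1))             # excludes case i
        internal.append(e[i] / (sigma_hat * math.sqrt(1 - h[i])))
        external.append(e[i] / (sigma_hat_i * math.sqrt(1 - h[i])))
    return internal, external


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 9.0, 6.1]   # the fifth point sits well above the others' trend
t_int, t_ext = studentized(x, y)

# Shortcut check: t_(i) = t_i * sqrt((nu - 1) / (nu - t_i^2)) with nu = n - m.
nu = len(x) - 2
shortcut = [ti * math.sqrt((nu - 1) / (nu - ti ** 2)) for ti in t_int]
print(t_int[4], t_ext[4])   # the external value is far larger for the suspect case
```

For the suspect fifth case, removing it leaves nearly collinear points. Its external value is therefore much larger than its internal one, which is the reason for excluding the case under scrutiny.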

Suppose the errors are independent and normally distributed with expected value 0 and variance $\sigma^2$. Then the $i$th externally studentized residual $t_{(i)}$ follows a Student's t-distribution with $n - m - 1$ degrees of freedom, and $t_{(i)}$ can range from $-\infty$ to $+\infty$.
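The t-distribution claim can be sanity-checked by simulation. This Python sketch makes several assumptions: $n = 10$ equally spaced $X$-values, a true line $1 + 0.5X$, Gaussian errors with $\sigma = 2$, and a fixed seed. It computes externally studentized residuals through the leave-one-out shortcut $t_{(i)} = t_i\sqrt{(\nu - 1)/(\nu - t_i^2)}$ with $\nu = n - m$. It then compares their mean square with $k/(k - 2)$, the variance of a Student's t-distribution with $k = n - m - 1 = 7$ degrees of freedom. Agreement is only approximate, since this is a Monte Carlo estimate.

```python
import math
import random


def external_studentized(x, y):
    """Externally studentized residuals t_(i) for the least-squares line (m = 2)."""
    n, m = len(x), 2
    nu = n - m
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a0 = y_bar - a1 * x_bar
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    s2 = sum(ei * ei for ei in e) / nu
    out = []
    for ei, hi in zip(e, h):
        t_int = ei / math.sqrt(s2 * (1 - hi))
        # Leave-one-out shortcut instead of refitting without case i.
        out.append(t_int * math.sqrt((nu - 1) / (nu - t_int ** 2)))
    return out


rng = random.Random(12345)             # fixed seed for reproducibility
n = 10
x = [float(i) for i in range(n)]
values = []
for _ in range(10000):
    y = [1.0 + 0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]
    values.extend(external_studentized(x, y))

k = n - 2 - 1                           # degrees of freedom n - m - 1
mean_sq = sum(v * v for v in values) / len(values)
print(mean_sq, k / (k - 2))             # simulated mean square vs. theoretical variance 1.4
```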

On the other hand, the internally studentized residuals lie in the range $-\sqrt{\nu} \le t_i \le \sqrt{\nu}$, where $\nu = n - m$ is the number of residual degrees of freedom.
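The bound $|t_i| \le \sqrt{\nu}$ holds because $\hat\varepsilon_i^2/(1 - h_{ii})$ can never exceed the total residual sum of squares $(n - m)\hat\sigma^2$. Equality holds exactly when the fit with case $i$ deleted passes through all remaining points. This Python sketch illustrates the equality case with a made-up data set: the first three points lie on the line $y = x$ and the fourth does not. The fourth internally studentized residual therefore equals $\sqrt{2}$, with $\nu = 4 - 2 = 2$. The function name is illustrative.

```python
import math


def internal_studentized(x, y):
    """Internally studentized residuals t_i for the least-squares line (m = 2)."""
    n, m = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a0 = y_bar - a1 * x_bar
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    sigma_hat = math.sqrt(sum(ei * ei for ei in e) / (n - m))
    return [ei / (sigma_hat * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 10.0]     # first three points lie exactly on y = x
t = internal_studentized(x, y)
nu = len(x) - 2
print([round(ti, 4) for ti in t])   # [0.9428, -0.3086, -1.2344, 1.4142]
print(math.sqrt(nu))                # 1.4142135623730951; no |t_i| exceeds this
```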

Let $t_i$ be the internally studentized residual, and again assume that the errors are independent identically distributed Gaussian variables. Then:[2]
$$t_i \sim \sqrt{\nu}\,\frac{t}{\sqrt{t^2 + \nu - 1}},$$
where $t$ is a random variable distributed as Student's t-distribution with $\nu - 1$ degrees of freedom.

In fact, this implies that $t_i^2/\nu$ follows the beta distribution $B\left(\tfrac{1}{2}, \tfrac{\nu - 1}{2}\right)$.[3] When $\nu = 3$, the internally studentized residuals are uniformly distributed between $-\sqrt{3}$ and $+\sqrt{3}$.
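The $\nu = 3$ case can be checked with a quick Monte Carlo sketch. The settings are assumptions: $n = 5$ equally spaced $X$-values, a straight-line fit ($m = 2$), standard Gaussian errors, and a fixed seed. If the internally studentized residuals are uniform on $[-\sqrt{3}, \sqrt{3}]$, about half of them should satisfy $|t_i| < \sqrt{3}/2$. Their mean square should also be close to 1, the variance of that uniform distribution.

```python
import math
import random


def internal_studentized(x, y):
    """Internally studentized residuals t_i for the least-squares line (m = 2)."""
    n, m = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a0 = y_bar - a1 * x_bar
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    sigma_hat = math.sqrt(sum(ei * ei for ei in e) / (n - m))
    return [ei / (sigma_hat * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


rng = random.Random(2024)                  # fixed seed for reproducibility
x = [0.0, 1.0, 2.0, 3.0, 4.0]              # n = 5, m = 2, so nu = 3
values = []
for _ in range(20000):
    # Pure noise: adding any linear trend would leave the residuals unchanged.
    y = [rng.gauss(0.0, 1.0) for _ in x]
    values.extend(internal_studentized(x, y))

bound = math.sqrt(3)
frac_inner = sum(abs(v) < bound / 2 for v in values) / len(values)
mean_sq = sum(v * v for v in values) / len(values)
print(frac_inner, mean_sq)   # close to 0.5 and 1, respectively
```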

The standard deviation of the distribution of internally studentized residuals is always 1, but this does not imply that the standard deviation of all the ti of a particular experiment is 1.

For instance, fit a straight line through $(0, 0)$ to the points $(1, 4)$, $(2, -1)$, $(2, -1)$. The internally studentized residuals are $\sqrt{2}$, $-\sqrt{5}/5$, $-\sqrt{5}/5$, and the standard deviation of these is not 1.
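This Python sketch reproduces the example values with the one-parameter fit $y = \beta x$, so $m = 1$ and $\nu = 2$. For this fit the leverage is $h_{ii} = x_i^2/\sum_j x_j^2$. Here the fitted slope is exactly 0, the residuals are $4, -1, -1$, and $\hat\sigma = 3$. The function name is illustrative.

```python
import math
import statistics


def internal_studentized_through_origin(x, y):
    """Internally studentized residuals for the one-parameter fit y = b * x (m = 1)."""
    n, m = len(x), 1
    s_xx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / s_xx          # least-squares slope
    e = [yi - b * xi for xi, yi in zip(x, y)]                # residuals
    h = [xi * xi / s_xx for xi in x]                         # leverages for X = [x]
    sigma_hat = math.sqrt(sum(ei * ei for ei in e) / (n - m))
    return [ei / (sigma_hat * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


t = internal_studentized_through_origin([1.0, 2.0, 2.0], [4.0, -1.0, -1.0])
print([round(ti, 5) for ti in t])     # [1.41421, -0.44721, -0.44721]
print(round(statistics.stdev(t), 3))  # 1.075, not 1
```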

Note that any pair of studentized residuals $t_i$ and $t_j$ (where $i \ne j$) are not i.i.d. They have the same distribution, but they are not independent. The residuals are constrained to sum to 0 (for a model with an intercept) and to be orthogonal to the columns of the design matrix.

Many programs and statistics packages include implementations of studentized residuals. Examples are R (the functions rstandard and rstudent) and Python (the statsmodels package).