Degrees of freedom (statistics)

The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom.
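As an illustrative sketch (the data below are made up, not part of the original text): once the sample mean has been estimated from n observations, the n residuals about that mean must sum to zero, so only n − 1 of them can vary freely; this is why the unbiased sample variance divides by n − 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10)   # n = 10 observations

residuals = x - x.mean()
print(residuals.sum())             # ~0: one linear constraint on the residuals

# Only n - 1 residuals are free, so the unbiased variance divides by n - 1
print(x.var(ddof=1))
print(residuals @ residuals / (len(x) - 1))   # same value
```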

Although the basic concept of degrees of freedom was recognized as early as 1821 in the work of German astronomer and mathematician Carl Friedrich Gauss,[3] its modern definition and usage were first elaborated by English statistician William Sealy Gosset in his 1908 Biometrika article "The Probable Error of a Mean", published under the pen name "Student".[4]

While Gosset did not actually use the term 'degrees of freedom', he explained the concept in the course of developing what became known as Student's t-distribution.

The term itself was popularized by English statistician and biologist Ronald Fisher, beginning with his 1922 work on chi squares.[5]

In equations, the typical symbol for degrees of freedom is ν (lowercase Greek letter nu).

R. A. Fisher used n to symbolize degrees of freedom, but modern usage typically reserves n for sample size.

The second residual vector is the least-squares projection onto the (n − 1)-dimensional orthogonal complement of this subspace (the one-dimensional subspace spanned by the all-ones vector, which carries the sample-mean component), and has n − 1 degrees of freedom.

In statistical testing applications, often one is not directly interested in the component vectors, but rather in their squared lengths.
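A short numerical sketch of this decomposition and of the squared lengths (the data are illustrative): the data vector splits into a mean component with 1 degree of freedom and a residual component with n − 1 degrees of freedom, and because the two components are orthogonal their squared lengths add up to the total sum of squares.

```python
import numpy as np

y = np.array([3.0, 7.0, 5.0, 9.0])         # n = 4 observations
n = len(y)

mean_component = np.full(n, y.mean())       # projection onto the all-ones direction (1 df)
residual_component = y - mean_component     # projection onto its orthogonal complement (n - 1 df)

# Orthogonality makes the squared lengths additive (Pythagoras):
print(mean_component @ residual_component)  # ~0
print(y @ y)
print(mean_component @ mean_component + residual_component @ residual_component)
```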

Likewise, the one-sample t-test statistic, √n(X̄ − μ0)/s, follows a Student's t distribution with n − 1 degrees of freedom when the hypothesized mean μ0 is correct.
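As a hedged sketch of that statistic (the data and the hypothesized mean μ0 = 5 below are made up for illustration), the hand computation √n(X̄ − μ0)/s can be checked against a library routine:

```python
import numpy as np
from scipy import stats

y = np.array([4.1, 5.7, 4.9, 6.2, 5.3, 4.4])    # illustrative sample
mu0 = 5.0                                        # hypothesized mean
n = len(y)

t = np.sqrt(n) * (y.mean() - mu0) / y.std(ddof=1)   # s uses n - 1 df
res = stats.ttest_1samp(y, mu0)
print(t, res.statistic)                              # identical
print(2 * stats.t.sf(abs(t), df=n - 1), res.pvalue)  # two-sided p-value with n - 1 df
```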

An example which is only slightly less simple is that of least squares estimation of a and b in the model Yi = a + bxi + ei, where xi is given, but ei and hence Yi are random.

Then the residuals êi = yi − â − b̂xi are constrained to lie within the space defined by the two equations Σ êi = 0 and Σ xi êi = 0. One says that there are n − 2 degrees of freedom for error.
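A minimal numerical check of those two constraints (illustrative data; np.polyfit is simply one convenient way to obtain the least-squares fit):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])           # illustrative data
n = len(x)

b_hat, a_hat = np.polyfit(x, y, deg=1)             # least-squares slope and intercept
residuals = y - (a_hat + b_hat * x)

print(residuals.sum())          # ~0  (first constraint)
print((x * residuals).sum())    # ~0  (second constraint)
# Two constraints on n residuals leave n - 2 degrees of freedom for error.
print(n - 2)
```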

Notationally, the capital letter Y is used in specifying the model, while the lower-case y appears in the definition of the residuals; that is because the former are hypothesized random variables and the latter are actual data.

We can generalise this to multiple regression involving p parameters and covariates (e.g. p − 1 predictors and one mean (= intercept in the regression)), in which case the cost in degrees of freedom of the fit is p, leaving n − p degrees of freedom for errors.

The demonstration of the t and chi-squared distributions for one-sample problems above is the simplest example where degrees-of-freedom arise.
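A sketch of the general case, assuming illustrative simulated data with p = 3 fitted parameters (an intercept and two predictors); the unbiased estimate of the error variance divides the residual sum of squares by n − p:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
sigma = 0.7
y = X @ beta_true + rng.normal(scale=sigma, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

rss = residuals @ residuals
sigma2_hat = rss / (n - p)      # n - p error degrees of freedom
print(sigma2_hat, sigma**2)
```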

An explicit example based on comparison of three means is presented here; the geometry of linear models is discussed in more complete detail by Christensen (2002).

The restriction to three groups and equal sample sizes simplifies notation, but the ideas are easily generalized.

Under the null hypothesis of no difference between population means (and assuming that standard ANOVA regularity assumptions are satisfied) the sums of squares have scaled chi-squared distributions, with the corresponding degrees of freedom.
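A hedged sketch of the three-group, equal-sample-size case (data simulated under the null for illustration): the between-group sum of squares has 3 − 1 = 2 degrees of freedom, the within-group sum of squares has 3(n − 1), and the ratio of the corresponding mean squares is the usual F statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, sigma = 3, 10, 1.0                                   # 3 groups of n observations
groups = rng.normal(loc=0.0, scale=sigma, size=(k, n))     # null: equal population means

grand_mean = groups.mean()
group_means = groups.mean(axis=1)

ss_between = n * ((group_means - grand_mean) ** 2).sum()   # df = k - 1 = 2
ss_within = ((groups - group_means[:, None]) ** 2).sum()   # df = k * (n - 1)

f_stat = (ss_between / (k - 1)) / (ss_within / (k * (n - 1)))
print(ss_between, ss_within, f_stat)
```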

In some complicated settings, such as unbalanced split-plot designs, the sums-of-squares no longer have scaled chi-squared distributions.

Comparison of sum-of-squares with degrees-of-freedom is no longer meaningful, and software may report certain fractional 'degrees of freedom' in these cases.

Such numbers have no genuine degrees-of-freedom interpretation, but are simply providing an approximate chi-squared distribution for the corresponding sum-of-squares.

This terminology simply reflects that in many applications where these distributions occur, the parameter corresponds to the degrees of freedom of an underlying random vector, as in the preceding ANOVA example.

In the application of these distributions to linear models, the degrees of freedom parameters can take only integer values.

The underlying families of distributions allow fractional values for the degrees-of-freedom parameters, which can arise in more sophisticated uses.

One set of examples is problems where chi-squared approximations based on effective degrees of freedom are used.
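One well-known instance (stated here as an illustration rather than taken from the text above) is the Welch–Satterthwaite approximation, which assigns an effective, generally fractional, degrees-of-freedom value to a combination of two sample variances:

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Effective degrees of freedom for s1^2/n1 + s2^2/n2 (Welch-Satterthwaite)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative values: the result is typically a non-integer between
# min(n1, n2) - 1 and n1 + n2 - 2.
print(welch_satterthwaite_df(s1=2.0, n1=8, s2=5.0, n2=12))
```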

However, because the smoother matrix H does not correspond to an ordinary least-squares fit (i.e. is not an orthogonal projection), these sums-of-squares no longer have (scaled, non-central) chi-squared distributions, and dimensionally defined degrees-of-freedom are not useful.

The effective degrees of freedom of the fit can be defined in various ways to implement goodness-of-fit tests, cross-validation, and other statistical inference procedures.[10]

In the case of linear regression, the hat matrix H is X(X′X)−1X′, and all these definitions reduce to the usual degrees of freedom.
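A short numpy check of that reduction (illustrative design matrix with p = 3 columns; the trace of the hat matrix is one common definition of the effective degrees of freedom of the fit):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # intercept + p - 1 predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix X (X'X)^{-1} X'
print(np.trace(H))                        # = p (up to rounding): the usual df of the fit
print(np.allclose(H @ H, H))              # H is an orthogonal projection (idempotent)
```

The trace p for the fit matches the n − p residual degrees of freedom counted dimensionally above.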

One way to help conceptualize this is to consider a simple smoothing matrix like a Gaussian blur, used to mitigate data noise.

In contrast to a simple linear or polynomial fit, computing the effective degrees of freedom of the smoothing function is not straightforward.

Naive application of the classical formula, n − p, would lead to over-estimation of the residuals' degrees of freedom, as if each observation were independent.

The more general formulation of effective degrees of freedom would result in a more realistic estimate for, e.g., the error variance σ2, which in turn scales the unknown parameters' a posteriori standard deviation; the degrees of freedom will also affect the expansion factor necessary to produce an error ellipse for a given confidence level.
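A sketch of that idea for a Gaussian smoothing matrix (the bandwidth and data size below are arbitrary choices for illustration): taking the effective degrees of freedom of the fit as the trace of the smoother matrix, and the effective residual degrees of freedom as n minus that trace, is one common choice; other definitions, such as the trace of 2H − HH′, are also used.

```python
import numpy as np

n, bandwidth = 50, 3.0
t = np.arange(n)

# Row-normalized Gaussian kernel smoother: y_hat = H @ y
W = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
H = W / W.sum(axis=1, keepdims=True)

edf_fit = np.trace(H)       # effective df used by the smoother (non-integer)
edf_resid = n - edf_fit     # effective residual df for estimating sigma^2
print(edf_fit, edf_resid)
```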