Explained variation

In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation (dispersion) of a given data set.

The complementary part of the total variation is called unexplained or residual variation; likewise, when discussing variance as such, this is referred to as unexplained or residual variance.

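The Fraser information is defined as

$$F(\theta) = \int g(r)\,\ln f(r;\theta)\,\mathrm{d}r,$$

where $g(r)$ is the probability density of a random variable $R$, and $f(r;\theta)$ with $\theta \in \Theta_i$ ($i = 0, 1$) are two families of parametric models. Model family 0 is the simpler one, with a restricted parameter space $\Theta_0 \subset \Theta_1$.

Parameters are determined by maximum likelihood estimation,

$$\theta_i = \operatorname{arg\,max}_{\theta \in \Theta_i} F(\theta).$$

The information gain of model 1 over model 0 is written as

$$\Gamma(\theta_1 : \theta_0) = 2\,[F(\theta_1) - F(\theta_0)],$$

where a factor of 2 is included for convenience.

Now let $R = (X, Y)$, where $X$ is an explanatory variable and $Y$ a dependent variable. Models of family 1 "explain" $Y$ in terms of $X$ through a conditional density $f(y \mid x;\theta)$, whereas in family 0, $X$ and $Y$ are assumed to be independent. Define the randomness of $Y$ by $D(Y) = \exp[-2F(\theta_0)]$ and the randomness of $Y$ given $X$ by $D(Y \mid X) = \exp[-2F(\theta_1)]$. Then,

$$\rho_C^2 = 1 - \frac{D(Y \mid X)}{D(Y)}$$

can be interpreted as the proportion of the data dispersion which is "explained" by $X$.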
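As a rough numerical illustration, the information gain and the resulting explained proportion can be estimated from a sample. The sketch below assumes that the Fraser information of a fitted model is estimated by the average log-likelihood of the data, that the gain is $\Gamma = 2[F(\theta_1) - F(\theta_0)]$, and that the explained proportion is taken as $1 - \exp(-\Gamma)$. The Gaussian model families, the synthetic data, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_mean_loglik(residuals):
    """Average Gaussian log-density of zero-mean residuals at the ML variance.

    This serves as a sample estimate of the Fraser information of the fit.
    """
    sigma2 = np.mean(residuals ** 2)          # ML estimate of the variance
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5

rng = np.random.default_rng(0)               # hypothetical synthetic data
x = rng.normal(size=2000)
y = 1.5 * x + rng.normal(size=2000)

# Family 0: Y is independent of X, so only a constant mean is fitted.
f0 = gaussian_mean_loglik(y - y.mean())

# Family 1: the centre of Y is linear in X (least squares equals ML here).
slope, intercept = np.polyfit(x, y, 1)
f1 = gaussian_mean_loglik(y - (slope * x + intercept))

gamma = 2.0 * (f1 - f0)                      # information gain of family 1
rho2 = 1.0 - np.exp(-gamma)                  # proportion "explained" by X
print(f"gamma = {gamma:.3f}, rho2 = {rho2:.3f}")
```

Because family 0 is nested in family 1, the estimated gain cannot be negative, so the resulting proportion lies between 0 and 1.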

The fraction of variance unexplained is an established concept in the context of linear regression.

The usual definition of the coefficient of determination is based on the fundamental concept of explained variance.
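The two quantities can be computed side by side from the residual and total sums of squares. The following is a minimal sketch; the function name `explained_fraction` and the five data points are hypothetical, and the straight-line fit uses ordinary least squares via `numpy.polyfit`.

```python
import numpy as np

def explained_fraction(y, y_hat):
    """Return (R^2, fraction of variance unexplained) for fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    fvu = ss_res / ss_tot
    return 1.0 - fvu, fvu

# Hypothetical data, roughly following a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

slope, intercept = np.polyfit(x, y, 1)        # least-squares line
r2, fvu = explained_fraction(y, slope * x + intercept)
print(f"R^2 = {r2:.4f}, FVU = {fvu:.4f}")
```

By construction the two numbers sum to one, which is the sense in which the unexplained fraction is the complement of the explained one.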

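Let $X$ be a random vector and $Y$ a random variable modelled by a normal distribution whose centre is a linear function of $X$. In this case, the above-derived proportion of explained variation $\rho_C^2$ equals the squared correlation coefficient $R^2$.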

Note the strong model assumptions: the centre of the Y distribution must be a linear function of X, and for any given x, the Y distribution must be normal.
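Under exactly these assumptions the equality can be traced in a few lines. The sketch below assumes the Fraser information $F(\theta) = \int g(r) \ln f(r;\theta)\,\mathrm{d}r$ evaluated at the maximum likelihood fit, the dispersion measures $D(Y) = \exp[-2F(\theta_0)]$ and $D(Y \mid X) = \exp[-2F(\theta_1)]$, and writes $\sigma_Y^2$ for the marginal variance of $Y$ and $\sigma_{Y|X}^2$ for the residual variance around the linear centre. The two variance symbols are introduced here for illustration.

```latex
\begin{align}
F(\theta_0) &= -\tfrac{1}{2}\ln\!\left(2\pi e\,\sigma_Y^2\right), &
F(\theta_1) &= -\tfrac{1}{2}\ln\!\left(2\pi e\,\sigma_{Y|X}^2\right), \\
D(Y) &= 2\pi e\,\sigma_Y^2, &
D(Y \mid X) &= 2\pi e\,\sigma_{Y|X}^2, \\
\rho_C^2 &= 1 - \frac{D(Y \mid X)}{D(Y)}
          = 1 - \frac{\sigma_{Y|X}^2}{\sigma_Y^2}
          = R^2 .
\end{align}
```

The constant $2\pi e$ cancels in the ratio, so only the two variances remain. This cancellation relies on both families being normal, which is why the interpretation does not carry over automatically to other models.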

In other situations, it is generally not justified to interpret $R^2$ as the proportion of explained variance.

Explained variance is routinely used in principal component analysis.
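In that setting, the share of variance attributed to each principal component is commonly taken as its eigenvalue of the sample covariance matrix divided by the sum of all eigenvalues. The sketch below follows that convention; the function name `explained_variance_ratio` and the synthetic two-column data are assumptions made for illustration.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of the total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return eigvals / eigvals.sum()

rng = np.random.default_rng(1)                # hypothetical synthetic data
latent = rng.normal(size=(500, 1))            # one shared underlying factor
data = np.hstack([latent, 0.5 * latent]) + 0.1 * rng.normal(size=(500, 2))

ratios = explained_variance_ratio(data)
print("per component:", ratios)
print("cumulative:   ", ratios.cumsum())
```

Since both columns are driven by the same latent factor, nearly all of the variance falls on the first component.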

The relation to the Fraser–Kent information gain remains to be clarified.

As the fraction of "explained variance" equals the squared correlation coefficient $R^2$, it shares all the disadvantages of the latter: it reflects not only the quality of the regression, but also the distribution of the independent (conditioning) variables.
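The dependence on the distribution of the independent variable can be seen in a small simulation. In the sketch below, the regression law and the noise are identical in both samples and only the spread of the independent variable differs. The slope, the noise level, and the two spreads are hypothetical values chosen for illustration.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation coefficient between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

rng = np.random.default_rng(2)                # hypothetical synthetic data
n = 5000
noise = rng.normal(scale=1.0, size=n)         # same noise in both samples

x_narrow = rng.normal(scale=0.5, size=n)      # small spread of X
x_wide = rng.normal(scale=3.0, size=n)        # large spread of X

# Identical regression law y = 2 x + noise in both cases.
r2_narrow = r_squared(x_narrow, 2.0 * x_narrow + noise)
r2_wide = r_squared(x_wide, 2.0 * x_wide + noise)
print(f"narrow X: {r2_narrow:.2f}, wide X: {r2_wide:.2f}")
```

The relationship between the variables is equally well described in both samples, yet the sample with the wider spread reports a much larger squared correlation.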

$R^2$ gives the 'percentage of variance explained' by the regression, an expression that, for most social scientists, is of doubtful meaning but great rhetorical value.

If this number is large, the regression gives a good fit, and there is little point in searching for additional variables.

Other regression equations on different data sets are said to be less satisfactory or less powerful if their $R^2$ is lower.

Moreover, $R^2$ can be enhanced just by jointly considering data from two different populations: "'Explained variance' explains nothing."
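The effect of pooling two populations is easy to reproduce. The sketch below is a hypothetical construction for illustration, not the example referred to in the text: within each population the two variables are generated independently, and the populations differ only in their means.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation coefficient between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

rng = np.random.default_rng(3)                # hypothetical synthetic data
n = 2000

# Population A is centred at (0, 0); x and y are drawn independently.
x_a, y_a = rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)
# Population B is centred at (5, 5); x and y are again drawn independently.
x_b, y_b = rng.normal(5.0, 1.0, n), rng.normal(5.0, 1.0, n)

r2_a = r_squared(x_a, y_a)
r2_b = r_squared(x_b, y_b)
r2_pooled = r_squared(np.concatenate([x_a, x_b]),
                      np.concatenate([y_a, y_b]))
print(f"A: {r2_a:.3f}, B: {r2_b:.3f}, pooled: {r2_pooled:.3f}")
```

Neither population shows any appreciable relationship on its own, but the pooled sample reports a large squared correlation that arises purely from the difference between the group means.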