Variance function

The variance function is a smooth function that describes the variance of a random quantity as a function of its mean. It is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling.

When it is likely that the response follows a distribution from the exponential family, a generalized linear model may be more appropriate to use; when we do not wish to force a parametric model onto the data, a non-parametric regression approach can be useful.

The importance of being able to model the variance as a function of the mean lies in improved inference in a parametric setting, and in improved estimation of the regression function more generally, in any setting.

Variance functions play a very important role in parameter estimation and inference. In general, maximum likelihood estimation requires that a likelihood function be defined. This requirement then implies that one must first specify the distribution of the response variables observed.

A very important use of this function is in the framework of generalized linear models and non-parametric regression.

When a member of the exponential family has been specified, the variance function can easily be derived.

A GLM consists of three main ingredients: (1) a random component, that is, a response distribution from the exponential family; (2) a linear predictor, \eta = X\beta; and (3) a link function g such that g(\mu) = \eta, where \mu = E[y \mid X]. First it is important to derive a couple of key properties of the exponential family.

Any random variable y in the exponential family has a probability density function of the form

f(y; \theta, \phi) = \exp\!\left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right),

with log-likelihood

\ell(\theta, y, \phi) = \log f(y; \theta, \phi) = \frac{y\theta - b(\theta)}{\phi} + c(y, \phi).

Here, \theta is the canonical parameter and the parameter of interest, and \phi is a nuisance (dispersion) parameter that plays a role in the variance.
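As a worked illustration that is not one of the article's own examples, the exponential distribution with rate \lambda can be put into this form by simple rearrangement of the density:

```latex
% Illustrative identification (assumption: exponential distribution, not an example from the article).
\begin{aligned}
f(y;\lambda) &= \lambda e^{-\lambda y}
             = \exp\bigl( y(-\lambda) - (-\log \lambda) \bigr), \qquad y \ge 0, \\[4pt]
\theta &= -\lambda, \qquad b(\theta) = -\log(-\theta), \qquad \phi = 1, \qquad c(y,\phi) = 0 .
\end{aligned}
```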

We use Bartlett's identities to derive a general expression for the variance function.

The first and second Bartlett identities ensure that, under suitable conditions (see Leibniz integral rule), for a density function f_\theta(y) depending on \theta with \ell(\theta, y) = \log f_\theta(y),

E_\theta\!\left[ \frac{\partial}{\partial \theta} \ell(\theta, y) \right] = 0,
\qquad
E_\theta\!\left[ \frac{\partial^2}{\partial \theta^2} \ell(\theta, y) \right] + E_\theta\!\left[ \left( \frac{\partial}{\partial \theta} \ell(\theta, y) \right)^{\!2} \right] = 0.

These identities lead to simple calculations of the expected value and variance of any random variable y in the exponential family form described above.

Expected value of Y: Taking the first derivative with respect to \theta of the log of the density in the exponential family form described above, we have

\frac{\partial}{\partial \theta} \ell(\theta, y, \phi) = \frac{\partial}{\partial \theta} \left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right) = \frac{y - b'(\theta)}{\phi}.

Then taking the expected value and setting it equal to zero (the first Bartlett identity) leads to

E\!\left[ \frac{y - b'(\theta)}{\phi} \right] = 0 \quad\Longrightarrow\quad E[y] = b'(\theta).

Variance of Y: To compute the variance we use the second Bartlett identity,

E\!\left[ \frac{\partial^2}{\partial \theta^2} \ell(\theta, y, \phi) \right] + E\!\left[ \left( \frac{\partial}{\partial \theta} \ell(\theta, y, \phi) \right)^{\!2} \right] = 0
\quad\Longrightarrow\quad
-\frac{b''(\theta)}{\phi} + E\!\left[ \left( \frac{y - b'(\theta)}{\phi} \right)^{\!2} \right] = 0
\quad\Longrightarrow\quad
\operatorname{Var}(y) = \phi\, b''(\theta).

We now have a relationship between \mu and \theta, namely \mu = b'(\theta) and \theta = b'^{-1}(\mu), which gives a relationship between \mu and the variance,

V(\mu) = b''\!\left( b'^{-1}(\mu) \right), \qquad \operatorname{Var}(y) = \phi\, V(\mu),

so the variance is the product of the dispersion parameter and the variance function evaluated at the mean.
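The relationship V(\mu) = b''(b'^{-1}(\mu)) can be checked mechanically. The following minimal sketch (an illustration, not part of the article) uses sympy with the cumulant function b(\theta) = -\log(-\theta) from the exponential-distribution illustration above; it recovers V(\mu) = \mu^2.

```python
# Minimal sketch: recover V(mu) = b''(b'^{-1}(mu)) symbolically.
# The cumulant function below is an assumed illustration (Gamma/exponential family).
import sympy as sp

theta, mu = sp.symbols('theta mu')

b = -sp.log(-theta)                      # assumed cumulant function b(theta)
b1 = sp.diff(b, theta)                   # b'(theta): the mean as a function of theta
b2 = sp.diff(b, theta, 2)                # b''(theta)

theta_of_mu = sp.solve(sp.Eq(mu, b1), theta)[0]   # invert mu = b'(theta)
V = sp.simplify(b2.subs(theta, theta_of_mu))      # V(mu) = b''(b'^{-1}(mu))
print(V)                                          # prints mu**2
```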

The normal distribution is a special case where the variance function is a constant. Let y \sim N(\mu, \sigma^2); writing the density in the exponential family form above gives \theta = \mu, b(\theta) = \theta^2/2 and \phi = \sigma^2, so that V(\mu) = b''(\theta) = 1.

Let y \sim \mathrm{Bernoulli}(p), then we express the density of the Bernoulli distribution in exponential family form,

f(y; p) = \exp\!\left( y \log\frac{p}{1-p} + \log(1-p) \right), \qquad \theta = \operatorname{logit}(p), \quad b(\theta) = \log(1 + e^{\theta}), \quad \phi = 1.

This gives us

\mu = b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = p, \qquad V(\mu) = b''(\theta) = \frac{e^{\theta}}{(1 + e^{\theta})^2} = \mu(1 - \mu).

Let y \sim \mathrm{Poisson}(\lambda), then we express the density of the Poisson distribution in exponential family form,

f(y; \lambda) = \exp\!\left( y \log\lambda - \lambda - \log y! \right), \qquad \theta = \log\lambda, \quad b(\theta) = e^{\theta}, \quad \phi = 1.

This gives us

\mu = b'(\theta) = e^{\theta} = \lambda, \qquad V(\mu) = b''(\theta) = e^{\theta} = \mu.

Here we see the central property of Poisson data, that the variance is equal to the mean.
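These two variance functions can also be seen in simulation. The following minimal sketch (illustrative, not from the article) draws Bernoulli and Poisson samples and compares the empirical variance with V(\mu) = \mu(1-\mu) and V(\mu) = \mu respectively.

```python
# Minimal simulation check of the Bernoulli and Poisson variance functions.
import numpy as np

rng = np.random.default_rng(0)

p = 0.3                                    # Bernoulli mean
bern = rng.binomial(1, p, size=100_000)
print(bern.var(), p * (1 - p))             # both approximately 0.21

lam = 4.0                                  # Poisson mean
pois = rng.poisson(lam, size=100_000)
print(pois.var(), lam)                     # both approximately 4.0
```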

A very important application of the variance function is its use in parameter estimation and inference when the response variable is of the required exponential family form as well as in some cases when it is not (which we will discuss in quasi-likelihood).

In weighted least squares (WLS), each term in the criterion includes a weight that determines the influence each observation has on the final parameter estimates.

As in regular least squares, the goal is to estimate the unknown parameters in the regression function by finding values for parameter estimates that minimize the sum of the squared deviations between the observed responses and the functional portion of the model.

The Gauss–Markov theorem and Aitken's generalization of it demonstrate that the best linear unbiased estimator (BLUE), the linear unbiased estimator with minimum variance, has each weight equal to the reciprocal of the variance of the measurement.
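As a minimal sketch of this idea (the simulated data and variance form below are assumptions for illustration, not from the article), weighted least squares with weights equal to the reciprocal variances can be computed directly from the weighted normal equations.

```python
# Weighted least squares with w_i = 1 / Var(y_i); the data-generating model
# is an assumed illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
sigma2 = 0.5 + 0.3 * x**2                     # known, non-constant variances
y = X @ np.array([2.0, 1.5]) + rng.normal(scale=np.sqrt(sigma2))

W = 1.0 / sigma2                              # weights = reciprocal variances
XtW = X.T * W                                 # X^T W (W diagonal)
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)  # solve X^T W X beta = X^T W y
print(beta_wls)                               # close to the true [2.0, 1.5]
```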

In the GLM framework, the goal is to estimate the parameters \beta, where Z = g(E[y \mid X]) = X\beta. Therefore we would like to minimize (Z - X\beta)^{\mathrm T} W (Z - X\beta), and if the weight matrix W is defined as the diagonal matrix with entries

W_{ii} = \frac{1}{\phi\, V(\mu_i)\, [g'(\mu_i)]^2},

where \phi, V(\mu) and the link g are defined in the previous sections, it allows for iteratively reweighted least squares (IRLS) estimation of the parameters.

See the section on iteratively reweighted least squares for more derivation and information.
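For concreteness, here is a minimal IRLS sketch for a Poisson GLM with log link (an assumed illustration; the function and variable names are not the article's). With \phi = 1, V(\mu) = \mu and g(\mu) = \log\mu, the working weights reduce to W_{ii} = \mu_i and the working response is Z_i = \eta_i + (y_i - \mu_i) g'(\mu_i).

```python
# Minimal IRLS sketch for a Poisson GLM with log link (illustrative assumption).
import numpy as np

def irls_poisson(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                       # linear predictor
        mu = np.exp(eta)                     # inverse link
        W = mu                               # 1 / (phi * V(mu) * g'(mu)^2) with phi = 1
        z = eta + (y - mu) / mu              # working response Z
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(2)
x = rng.normal(size=500)
X = np.column_stack([np.ones(500), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))
print(irls_poisson(X, y))                    # approximately [0.5, 0.8]
```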

Also, important to note is that when the weight matrix is of the form described here, minimizing the expression (Z - X\beta)^{\mathrm T} W (Z - X\beta) also minimizes the Pearson distance.

The matrix W falls right out of the estimating equations for estimation of \beta. Maximum likelihood estimation for each parameter \beta_r, 1 \le r \le p, requires

\sum_{i=1}^{N} \frac{(y_i - \mu_i)\, x_{ir}}{\phi\, V(\mu_i)\, g'(\mu_i)} = 0.

Looking at a single observation we have, by the chain rule,

\frac{\partial \ell}{\partial \beta_r} = \frac{\partial \ell}{\partial \theta}\,\frac{\partial \theta}{\partial \mu}\,\frac{\partial \mu}{\partial \eta}\,\frac{\partial \eta}{\partial \beta_r}.

This gives us

\frac{\partial \ell}{\partial \beta_r} = \frac{y - \mu}{\phi\, V(\mu)}\,\frac{1}{g'(\mu)}\, x_r .

The Hessian matrix is determined in a similar manner, and its expectation can be shown to be

E[H] = -X^{\mathrm T} W X .

Noticing that the Fisher information (FI) is

\mathrm{FI} = -E[H] = X^{\mathrm T} W X

allows for asymptotic approximation of the distribution of \hat\beta, and therefore inference can be performed.
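Since the Fisher information is X^{\mathrm T} W X, approximate standard errors follow from its inverse. A minimal sketch under that assumption (an illustrative helper, continuing the Poisson/log-link example above where W_{ii} = \mu_i):

```python
# Approximate standard errors from the Fisher information X^T W X
# (illustrative helper; Poisson / log-link case, so the diagonal of W is mu_i / phi).
import numpy as np

def glm_standard_errors(X, mu_hat, phi=1.0):
    W = mu_hat / phi                               # diagonal of W
    fisher_info = X.T @ (X * W[:, None])           # FI = X^T W X
    return np.sqrt(np.diag(np.linalg.inv(fisher_info)))
```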

Because most features of GLMs only depend on the first two moments of the distribution, rather than the entire distribution, the quasi-likelihood can be developed by just specifying a link function and a variance function. That is, in place of a full likelihood we need only specify the link function, E[y] = \mu = g^{-1}(X\beta), and the variance function, \operatorname{Var}(y) = \phi\, V(\mu). For a single observation the quasi-likelihood (QL) is

Q(\mu, y) = \int_{y}^{\mu} \frac{y - t}{\phi\, V(t)} \, dt ,

the quasi-score (QS) is

U(\mu, y) = \frac{\partial Q}{\partial \mu} = \frac{y - \mu}{\phi\, V(\mu)} ,

with E[U] = 0 and \operatorname{Var}(U) = 1 / (\phi\, V(\mu)), and the quasi-information (QI) is

i(\mu) = -E\!\left[ \frac{\partial U}{\partial \mu} \right] = \frac{1}{\phi\, V(\mu)} .

Ultimately the goal is to find information about the parameters of interest \beta. The QL, QS and QI all provide the building blocks for inference about the parameters of interest and therefore it is important to express the QL, QS and QI as functions of \beta. Recalling that \mu = g^{-1}(X\beta), we derive the expressions for QL, QS and QI parametrized under \beta:

Q(\beta, y) = \sum_{i=1}^{n} Q\bigl(\mu_i(\beta), y_i\bigr), \qquad
U(\beta) = \frac{1}{\phi}\, D^{\mathrm T} V^{-1} (y - \mu), \qquad
i_\beta = \frac{1}{\phi}\, D^{\mathrm T} V^{-1} D ,

where D = \partial \mu / \partial \beta and V = \operatorname{diag}\bigl(V(\mu_1), \ldots, V(\mu_n)\bigr). The quasi-information parametrized under \beta allows for parameter estimation and inference in a similar manner as described in Application – weighted least squares.
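In practice the dispersion \phi that multiplies V(\mu) is often estimated from the Pearson statistic once the quasi-score equations have been solved for \hat\beta. A minimal sketch under that assumption (the helper name and arguments are illustrative):

```python
# Moment estimate of the dispersion: phi_hat = sum_i (y_i - mu_i)^2 / V(mu_i) / (n - p).
import numpy as np

def pearson_dispersion(y, mu_hat, V, n_params):
    pearson = np.sum((y - mu_hat) ** 2 / V(mu_hat))
    return pearson / (len(y) - n_params)

# Example: quasi-Poisson variance function V(mu) = mu.
# phi_hat = pearson_dispersion(y, mu_hat, V=lambda m: m, n_params=X.shape[1])
```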

Non-parametric estimation of the variance function and its importance has been discussed widely in the literature.[5][6][7] In non-parametric regression analysis, the goal is to express the expected value of the response variable (y) as a function of the predictors (X), that is, to estimate the mean function g(x) = E[y \mid x] without assuming a parametric form; the conditional variance Var(y \mid x) can likewise be estimated non-parametrically with a smoother.
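A simple way to see this in practice (a minimal sketch; the Nadaraya–Watson smoother, bandwidths and simulated data below are assumptions, not the methods of the cited papers) is to smooth the response to obtain a mean estimate and then smooth the squared residuals to obtain a conditional-variance estimate.

```python
# Non-parametric mean and variance function estimates via kernel smoothing
# (illustrative sketch with simulated data).
import numpy as np

def nw_smooth(x_eval, x, y, bandwidth):
    # Gaussian-kernel Nadaraya-Watson estimate of E[y | x] at the points x_eval.
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 400)
y = np.sin(x) + rng.normal(scale=0.2 + 0.1 * x)            # noise level grows with x

grid = np.linspace(0, 10, 50)
mean_hat = nw_smooth(grid, x, y, bandwidth=0.5)             # estimate of E[y | x]
resid2 = (y - nw_smooth(x, x, y, bandwidth=0.5)) ** 2       # squared residuals
var_hat = nw_smooth(grid, x, resid2, bandwidth=0.8)         # estimate of Var(y | x)
```

Plotting var_hat against mean_hat is the kind of smoothed-conditional-variance versus smoothed-conditional-mean comparison shown in the example below.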

The example below comes from a data analysis project whose goal was to determine (among other things) whether or not the predictor, number of years in the major leagues (baseball), had an effect on the response, the salary a player made.

[Figure: A scatter plot of years in the major league against salary (×$1000). The line is the trend in the mean. The plot demonstrates that the variance is not constant.]
[Figure: The smoothed conditional variance against the smoothed conditional mean. The quadratic shape is indicative of the Gamma distribution. The variance function of a Gamma is V(\mu) = \mu^2.]