Generalized linear model

The generalized linear model (GLM) generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

Generalized linear models were formulated by John Nelder and Robert Wedderburn.[1] They proposed an iteratively reweighted least squares method for maximum likelihood estimation (MLE) of the model parameters.

MLE remains popular and is the default method in many statistical computing packages.
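For instance, here is a minimal sketch of maximum-likelihood GLM fitting with the Python statsmodels package; the data and coefficients are invented for illustration:

```python
# Minimal sketch: fitting a Poisson GLM by maximum likelihood with
# statsmodels, which maximizes the likelihood via IRLS by default.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)
X = sm.add_constant(x)                       # design matrix with intercept
y = rng.poisson(np.exp(0.1 + 0.08 * x))      # counts with a log-linear mean

result = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # MLE via IRLS
print(result.params)                         # estimates near (0.1, 0.08)
```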

Other approaches, including Bayesian regression and least squares fitting to variance stabilized responses, have been developed.

Ordinary linear regression predicts the expected value of a given unknown quantity (the response variable, a random variable) as a linear combination of a set of observed values (predictors).

This is appropriate when the response variable can vary, to a good approximation, indefinitely in either direction, or more generally for any quantity that only varies by a relatively small amount compared to the variation in the predictive variables, e.g. human heights.

As an example, suppose a linear prediction model learns from some data (perhaps primarily drawn from large beaches) that a 10 degree temperature decrease would lead to 1,000 fewer people visiting the beach. This model is unlikely to generalize well across beaches of different sizes; for a small beach whose expected attendance was, say, 50, it would predict an impossible attendance of −950 after such a drop.

A linear-response model is even less suitable when the response is a probability. Imagine, for example, a model that predicts the likelihood of a given person going to the beach as a function of temperature.

A reasonable model might predict, for example, that a change of 10 degrees makes a person two times more or less likely to go to the beach. But "twice as likely" cannot literally mean doubling the probability, which for values above 0.5 would exceed 1; rather, it is the odds, p/(1 − p), that double.

Generalized linear models cover all these situations by allowing for response variables that have arbitrary distributions (rather than simply normal distributions), and for an arbitrary function of the response variable (the link function) to vary linearly with the predictors (rather than assuming that the response itself must vary linearly).

For example, the case above of predicted number of beach attendees would typically be modeled with a Poisson distribution and a log link, while the case of predicted probability of beach attendance would typically be modeled with a Bernoulli distribution (or binomial distribution, depending on exactly how the problem is phrased) and a log-odds (or logit) link function.
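As an illustration, here is a sketch of both beach models on invented data, again using statsmodels:

```python
# Sketch of the two beach models from the text, on invented data:
# counts of attendees -> Poisson with log link,
# individual attendance (0/1) -> Bernoulli/binomial with logit link.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
temp = rng.uniform(10, 35, size=300)
X = sm.add_constant(temp)

# Count model: log E(attendees) is linear in temperature.
attendees = rng.poisson(np.exp(3.0 + 0.1 * temp))
counts_fit = sm.GLM(attendees, X, family=sm.families.Poisson()).fit()

# Probability model: the log-odds of attending is linear in temperature.
goes = rng.binomial(1, 1 / (1 + np.exp(-(-5.0 + 0.25 * temp))))
prob_fit = sm.GLM(goes, X, family=sm.families.Binomial()).fit()

# A 10 degree drop multiplies the expected count by exp(-10 * slope);
# on the logit scale it multiplies the odds, not the probability.
print(np.exp(-10 * counts_fit.params[1]), np.exp(-10 * prob_fit.params[1]))
```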

The conditional mean μ of the distribution depends on the independent variables X through E(Y | X) = μ = g⁻¹(Xβ), where E(Y | X) is the expected value of Y conditional on X, Xβ is the linear predictor (a linear combination of unknown parameters β), and g is the link function.

The linear predictor is the quantity which incorporates the information about the independent variables into the model.

The coefficients of the linear combination are represented as the matrix of independent variables X. η can thus be expressed as η = Xβ. The link function provides the relationship between the linear predictor and the mean of the distribution function.
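A short numeric sketch of how these pieces fit together, with hypothetical coefficients and a log link:

```python
# Numeric sketch of the GLM pieces: the linear predictor eta = X @ beta,
# a link g, and the implied mean mu = g^{-1}(eta). Values are invented.
import numpy as np

X = np.array([[1.0, 20.0],      # each row: intercept term, temperature
              [1.0, 30.0]])
beta = np.array([3.0, 0.1])     # hypothetical coefficients

eta = X @ beta                  # linear predictor
mu = np.exp(eta)                # inverse of the log link g(mu) = log(mu)

print(eta)                      # [5. 6.]
print(mu)                       # expected counts at 20 and 30 degrees
```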

In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean; in particular, the linear predictor may be positive, which would give an impossible negative mean.

The maximum likelihood estimates can be found using an iteratively reweighted least squares algorithm or Newton's method with updates of the form β(t+1) = β(t) + J⁻¹(β(t)) u(β(t)), where J(β(t)) is the observed information matrix (the negative of the Hessian matrix) and u(β(t)) is the score function, both evaluated at the current iterate β(t).
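The update can be written out directly. The following is a sketch of IRLS for a Poisson GLM with the canonical log link, in plain NumPy on invented data; with the canonical link, Newton's method and Fisher scoring coincide:

```python
# Sketch of iteratively reweighted least squares (IRLS) for a Poisson
# GLM with the canonical log link.
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-8):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                    # inverse log link
        W = mu                              # working weights: Var(Y) = mu
        z = eta + (y - mu) / mu             # working (adjusted) response
        # Weighted least squares step: solve (X^T W X) beta = X^T W z.
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Invented data to exercise the routine.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(1.0 + 0.5 * x))
print(irls_poisson(X, y))                   # approx. [1.0, 0.5]
```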

Results for the generalized linear model with non-identity link are asymptotic (tending to work well with large samples).

In linear regression, the use of the least-squares estimator is justified by the Gauss–Markov theorem, which does not assume that the distribution is normal.

From the perspective of generalized linear models, however, it is useful to suppose that the distribution function is the normal distribution with constant variance and the link function is the identity, which is the canonical link if the variance is known.

For the normal distribution, the generalized linear model has a closed-form expression for the maximum-likelihood estimates, which is convenient.
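A sketch of that closed form on invented data: with a normal distribution and identity link, the GLM estimate is the ordinary least squares solution of the normal equations:

```python
# Sketch: with a normal distribution and identity link, the GLM maximum
# likelihood estimate is beta_hat = (X^T X)^{-1} X^T y, i.e. OLS.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form MLE / OLS
print(beta_hat)                                # approx. [2.0, 3.0]
```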

When the response data, Y, are binary (taking on only values 0 and 1), the distribution function is generally chosen to be the Bernoulli distribution and the interpretation of μi is then the probability, p, of Yi taking on the value one.

One useful link for binary data arises when the number of events per observation is assumed to follow a Poisson distribution with mean μ, so that the probability of zero events is exp(−μ). If p represents the proportion of observations with at least one event, its complement is 1 − p = exp(−μ), and then −log(1 − p) = μ. A linear model requires the response variable to take values over the entire real line.

Since μ must be positive, we can enforce that by taking its logarithm and letting log(μ) be the linear model. This produces the "cloglog" transformation g(p) = log(−log(1 − p)). The identity link g(p) = p is also sometimes used for binomial data to yield a linear probability model.

However, the identity link can predict nonsensical "probabilities" less than zero or greater than one. This can be avoided by using a transformation like cloglog, probit or logit (or any inverse cumulative distribution function).
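A sketch comparing the three transformations on a grid of probabilities (using SciPy for the probit, i.e. the inverse normal CDF):

```python
# Sketch of three link functions that map probabilities in (0, 1) onto
# the whole real line, so that a linear model for g(p) is coherent.
import numpy as np
from scipy.stats import norm

p = np.array([0.01, 0.25, 0.5, 0.75, 0.99])

logit = np.log(p / (1 - p))            # log-odds
probit = norm.ppf(p)                   # inverse normal CDF
cloglog = np.log(-np.log(1 - p))       # complementary log-log

print(logit)    # symmetric about p = 0.5
print(probit)   # symmetric, lighter tails than logit
print(cloglog)  # asymmetric: not a mirror image around p = 0.5
```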

The variance function for "quasibinomial" data is Var(Yi) = τμi(1 − μi), where the dispersion parameter τ is exactly 1 for the binomial distribution.

The binomial case may be easily extended to allow for a multinomial distribution as the response (also, a generalized linear model for counts, with a constrained total).

There are two ways in which this is usually done, treating the response as either ordered or unordered. If the response variable is ordinal, then one may fit a model of the form g(μm) = ηm for m > 2, where μm = Pr(Y ≤ m); if it is nominal, one may instead fit a separate linear predictor for each category relative to a reference category, as in multinomial logistic regression.
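A numeric sketch of the ordered (cumulative logit) case, with invented cutpoints θm and linear predictor η:

```python
# Sketch of an ordered (cumulative logit) response model: with cutpoints
# theta_m and a shared linear predictor eta, Pr(Y <= m) is
# logistic(theta_m - eta), and category probabilities follow by
# differencing. Numbers are invented.
import numpy as np

def logistic(t):
    return 1 / (1 + np.exp(-t))

theta = np.array([-1.0, 0.5, 2.0])      # increasing cutpoints, 4 categories
eta = 0.8                               # linear predictor, one observation

cum = logistic(theta - eta)             # Pr(Y <= m) for m = 1, 2, 3
probs = np.diff(np.concatenate([[0.0], cum, [1.0]]))
print(probs, probs.sum())               # category probabilities, sum to 1
```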

For count data, the variance function is proportional to the mean: Var(Yi) = τμi, where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as Poisson with overdispersion, or quasi-Poisson.
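A sketch of estimating τ from the Pearson statistic, with invented counts and fitted means (in practice the fitted means would come from a fitted Poisson GLM):

```python
# Sketch: estimating the dispersion parameter tau for quasi-Poisson data
# from the Pearson statistic, tau_hat = sum((y - mu)^2 / mu) / (n - p).
import numpy as np

y = np.array([0, 3, 1, 7, 2, 9, 4, 12])                       # invented counts
mu_hat = np.array([1.1, 2.4, 1.6, 5.9, 2.2, 8.1, 3.7, 10.5])  # fitted means
p = 2                                   # number of fitted parameters

pearson = np.sum((y - mu_hat) ** 2 / mu_hat)
tau_hat = pearson / (len(y) - p)
print(tau_hat)   # values well above 1 indicate overdispersion
```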

Extensions have been developed to allow for correlation between observations, as occurs for example in longitudinal studies and clustered designs; these include generalized estimating equations and generalized linear mixed models. Generalized additive models (GAMs) are another extension to GLMs in which the linear predictor η is not restricted to be linear in the covariates X but is the sum of smoothing functions applied to the xi: η = β0 + f1(x1) + f2(x2) + ⋯. The smoothing functions fi are estimated from the data.

In general this requires a large number of data points and is computationally intensive.
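A crude sketch of the GAM idea, using an unpenalized polynomial basis for each smooth term (real GAM software uses penalized splines; this only illustrates the additive structure):

```python
# Sketch of the GAM structure: the predictor is a sum of smooth functions
# f_i(x_i), here fitted by least squares with an identity link.
import numpy as np

rng = np.random.default_rng(4)
n = 400
x1, x2 = rng.uniform(-1, 1, size=(2, n))
y = np.sin(3 * x1) + x2 ** 2 + rng.normal(scale=0.1, size=n)

def basis(x, degree=5):
    # simple polynomial basis for one smooth term (no intercept column)
    return np.column_stack([x ** d for d in range(1, degree + 1)])

X = np.column_stack([np.ones(n), basis(x1), basis(x2)])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

f1 = basis(x1) @ coef[1:6]       # estimated smooth in x1
f2 = basis(x2) @ coef[6:]        # estimated smooth in x2
print(np.corrcoef(f1, np.sin(3 * x1))[0, 1])   # close to 1
```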