Binomial regression

In statistics, binomial regression is a regression analysis technique in which the response (often referred to as Y) has a binomial distribution: it is the number of successes in a series of n independent Bernoulli trials, where each trial has probability of success p.[1]

In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

In one application of binomial regression, the observed outcome variable was whether or not a fault occurred in an industrial process.

There were two explanatory variables: the first was a simple two-case factor representing whether or not a modified version of the process was used and the second was an ordinary quantitative variable measuring the purity of the material being supplied for the process.

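The response variable Y is assumed to be binomially distributed conditional on the explanatory variables X, with a known number of trials n and a success probability for each trial given by a function θ(X). This implies that the conditional expectation and conditional variance of the observed fraction of successes, Y/n, are

$$\operatorname{E}(Y/n \mid X) = \theta(X), \qquad \operatorname{Var}(Y/n \mid X) = \frac{\theta(X)\,\bigl(1 - \theta(X)\bigr)}{n}.$$

The goal of binomial regression is to estimate the function θ(X).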
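As a quick numerical check of the mean θ and the variance θ(1 − θ)/n of the fraction Y/n under a binomial response, the following sketch sums over the binomial probability mass function directly. The function name `fraction_moments` and the values of `n` and `theta` are hypothetical choices made for illustration only.

```python
from math import comb


def fraction_moments(n, theta):
    """Exact mean and variance of Y/n when Y ~ Binomial(n, theta)."""
    # Binomial probability mass function P(Y = k) for k = 0..n.
    pmf = [comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * p for k, p in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


n, theta = 20, 0.3  # hypothetical values, not taken from any data set
mean, var = fraction_moments(n, theta)
print(mean, theta)                   # the two numbers agree up to rounding
print(var, theta * (1 - theta) / n)  # likewise
```

The variance shrinks in proportion to 1/n, so fractions based on more trials are more informative about θ.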

[1] The data are often fitted as a generalised linear model where the predicted values μ are the probabilities that any individual event will result in a success.

The likelihood of the predictions is then given by

$$L(\boldsymbol{\mu} \mid Y) = \prod_{i=1}^{n} \Bigl( 1_{y_i = 1}\,\mu_i + 1_{y_i = 0}\,(1 - \mu_i) \Bigr),$$

where 1_A is the indicator function, which takes the value one when the event A occurs and zero otherwise. In this formulation, for any given observation y_i, only one of the two terms inside the product contributes, according to whether y_i = 0 or 1.
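The likelihood for binary observations can be sketched in a few lines: each observation contributes μ_i when y_i = 1 and 1 − μ_i when y_i = 0. The function names and the small data set below are hypothetical; in practice the logarithm of the likelihood is usually computed instead, because a product of many probabilities underflows.

```python
import math


def likelihood(y, mu):
    """Product over observations of mu_i if y_i == 1, else 1 - mu_i."""
    return math.prod(m if yi == 1 else 1.0 - m for yi, m in zip(y, mu))


def log_likelihood(y, mu):
    """Sum of log contributions; numerically safer than the raw product."""
    return sum(math.log(m if yi == 1 else 1.0 - m) for yi, m in zip(y, mu))


y = [1, 0, 1, 1]            # hypothetical binary outcomes
mu = [0.8, 0.3, 0.6, 0.9]   # hypothetical predicted success probabilities
lik = likelihood(y, mu)     # equals 0.8 * 0.7 * 0.6 * 0.9
loglik = log_likelihood(y, mu)
print(lik, math.exp(loglik))
```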

The likelihood function is more fully specified by defining the formal parameters μ_i as parameterised functions of the explanatory variables: this defines the likelihood in terms of a much reduced number of parameters.

Fitting of the model is usually achieved by employing the method of maximum likelihood to determine these parameters.

In practice, formulating the problem as a generalised linear model makes it possible to use algorithms that apply across the whole class of such models but that do not apply to all maximum likelihood problems.
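A minimal sketch of maximum likelihood fitting, assuming a logistic model with an intercept and a single explanatory variable, is given below. It uses Newton's method, which for the logit link coincides with the iteratively reweighted least squares scheme commonly used for generalised linear models. The function names and the data are hypothetical, and no safeguards (step halving, convergence tests, handling of perfectly separated data) are included.

```python
import math


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, iterations=25):
    """Fit mu_i = sigmoid(b0 + b1 * x_i) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian) with weights mu_i * (1 - mu_i).
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: the outcomes overlap, so the estimate is finite.
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(x_data, y_data)
print(b0_hat, b1_hat)
```

At the fitted values the score vector is zero, which is the defining condition of the maximum likelihood estimate for this model.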

Models used in binomial regression can often be extended to multinomial data.

There are many methods of generating the values of μ in systematic ways that allow for interpretation of the model; they are discussed below.

The model linking the probabilities μ to the explanatory variables must be of a form that produces only values in the range 0 to 1.

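Many models can be written in the form μ = g(η), where η is an intermediate variable formed as a linear combination of the explanatory variables, containing the regression parameters, and g is the cumulative distribution function (cdf) of some probability distribution. Usually this probability distribution has support from minus infinity to plus infinity, so that any finite value of η is transformed by the function g to a value inside the range 0 to 1.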

In the case of probit, the function g is the cdf of the standard normal distribution; the probit link function in the strict sense is its inverse.
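The way a cdf maps any finite value of η into the range 0 to 1 can be illustrated with the logistic and standard normal cdfs. This is a small sketch using only the standard library; the function names and the grid of η values are arbitrary choices for illustration.

```python
import math


def logistic_cdf(eta):
    """cdf of the standard logistic distribution (used in logistic regression)."""
    return 1.0 / (1.0 + math.exp(-eta))


def normal_cdf(eta):
    """cdf of the standard normal distribution (used in probit regression)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


etas = [-5.0, -1.0, 0.0, 1.0, 5.0]  # arbitrary values of the linear predictor
logit_probs = [logistic_cdf(e) for e in etas]
probit_probs = [normal_cdf(e) for e in etas]
for e, p_logit, p_probit in zip(etas, logit_probs, probit_probs):
    print(e, p_logit, p_probit)
```

Both functions are increasing in η and equal 0.5 at η = 0; they differ mainly in how quickly the tails approach 0 and 1.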

The linear probability model is not a proper binomial regression specification because its predictions need not lie in the range zero to one. It is nonetheless sometimes used for this type of data when results are to be interpreted directly on the probability scale, or when the analyst is not in a position to fit a nonlinear model or to calculate approximate linearizations of probabilities for interpretation.
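The out-of-range problem is easy to reproduce. The sketch below fits a straight line to a binary response by ordinary least squares, which is the linear probability model in its simplest form, and evaluates it slightly outside the observed range of the explanatory variable. The data and the function name `ols_fit` are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical binary outcomes observed at eight values of x.
x_obs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_obs = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = ols_fit(x_obs, y_obs)

pred_low = intercept + slope * 0.0   # "probability" predicted at x = 0
pred_high = intercept + slope * 5.0  # "probability" predicted at x = 5
print(pred_low, pred_high)           # one falls below 0, the other above 1
```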

A binary choice model assumes a latent variable U_n, the utility (or net benefit) that person n obtains from taking an action (as opposed to not taking the action).

The unobserved term, ε_n, is assumed to have a logistic distribution.

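Writing s_n for the explanatory variables describing person n, β for the regression coefficients, and Y_n for the observed indicator of whether person n takes the action, the specification is written succinctly as:

$$U_n = \boldsymbol{\beta} \cdot \mathbf{s}_n + \varepsilon_n, \qquad Y_n = \begin{cases} 1 & \text{if } U_n > 0, \\ 0 & \text{if } U_n \le 0, \end{cases} \qquad \varepsilon_n \sim \text{logistic}.$$

Let us write it slightly differently:

$$U_n = \boldsymbol{\beta} \cdot \mathbf{s}_n - e_n, \qquad Y_n = \begin{cases} 1 & \text{if } U_n > 0, \\ 0 & \text{if } U_n \le 0, \end{cases} \qquad e_n \sim \text{logistic}.$$

Here we have made the substitution e_n = −ε_n.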

This changes a random variable into a slightly different one, defined over a negated domain.
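The latent-variable formulation can be checked by simulation. The sketch below assumes, for illustration only, a single scalar explanatory variable `s` with coefficient `beta` and a standard logistic unobserved term. It draws the latent utility repeatedly and compares the fraction of positive utilities with the logistic cdf evaluated at `beta * s`. All names and numerical values are hypothetical.

```python
import math
import random


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def draw_logistic(rng):
    """Draw from the standard logistic distribution by inverting its cdf."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); skip 0 to avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_choice_rate(beta, s, draws, rng):
    """Fraction of draws in which the utility beta * s + eps is positive."""
    taken = sum(1 for _ in range(draws) if beta * s + draw_logistic(rng) > 0)
    return taken / draws


rng = random.Random(0)   # fixed seed so the run is reproducible
beta, s = 0.8, 1.5       # hypothetical coefficient and covariate value
rate = simulate_choice_rate(beta, s, 100_000, rng)
expected = logistic_cdf(beta * s)
print(rate, expected)    # the simulated rate should be close to the cdf value
```

Because the logistic distribution is symmetric about zero, replacing the unobserved term by its negative leaves these probabilities unchanged.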

The variance of ε cannot be identified and, when it is not of interest, is often assumed to be equal to one.

If ε is uniformly distributed, then a linear probability model is appropriate.
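The connection between a uniform unobserved term and the linear probability model can be seen in a short derivation. As an assumption made here for illustration, write the latent utility as U_n = η_n + ε_n, where η_n denotes the systematic (linear) part, and take ε_n to be uniform on (−a, a) for some a > 0.

```latex
% Assumption: U_n = \eta_n + \varepsilon_n with \varepsilon_n \sim \mathrm{Uniform}(-a, a), a > 0.
\begin{aligned}
\Pr(Y_n = 1) &= \Pr(U_n > 0) = \Pr(\varepsilon_n > -\eta_n) \\
             &= \frac{a + \eta_n}{2a} = \frac{1}{2} + \frac{\eta_n}{2a},
             \qquad -a \le \eta_n \le a .
\end{aligned}
```

The success probability is therefore linear in η_n as long as η_n stays within (−a, a). The scale a only rescales the coefficients, which is another way of seeing that the variance of the unobserved term is not identified.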