Probit model

[1] The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the two categories; moreover, classifying observations according to their predicted probabilities yields a binary classifier.

As such, it treats the same set of problems as logistic regression, using similar techniques.

Suppose a response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0.

Specifically, we assume that the model takes the form

P(Y = 1 | X) = Φ(X'β),

where P denotes probability and Φ is the cumulative distribution function (CDF) of the standard normal distribution.

The parameters β are typically estimated by maximum likelihood.
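As a concrete illustration (not part of the original article), the model can be fit by maximum likelihood with standard statistical software; the sketch below uses R's built-in glm with a probit link on simulated data, and all names and numeric values (x, y, beta_true) are illustrative.

```r
# Minimal sketch: simulate data from a probit model and estimate beta by maximum likelihood.
set.seed(1)
n <- 5000
x <- rnorm(n)                                  # a single regressor
beta_true <- c(0.5, 1.2)                       # intercept and slope (illustrative values)
p <- pnorm(beta_true[1] + beta_true[2] * x)    # P(Y = 1 | X) = Phi(X'beta)
y <- rbinom(n, size = 1, prob = p)

# glm with a probit link maximizes the probit likelihood (via iteratively reweighted least squares).
fit <- glm(y ~ x, family = binomial(link = "probit"))
coef(fit)   # should be close to beta_true for large n
```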

Suppose there exists an auxiliary random variable

Y* = X'β + ε,

where ε ~ N(0, 1).

Then Y can be viewed as an indicator for whether this latent variable is positive:

Y = 1 if Y* > 0, and Y = 0 otherwise.

The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount.
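The latent-variable view can be checked numerically; the short R sketch below (not from the original article; the names and values are illustrative) generates Y through the latent variable and compares the empirical frequency of Y = 1 with Φ(X'β).

```r
# Sketch: generate Y via the latent variable Y* = X'beta + eps and check P(Y = 1) = Phi(X'beta).
set.seed(2)
n <- 100000
beta <- c(0.3, 0.8)                          # illustrative intercept and slope
x <- 0.5                                     # a fixed value of the regressor
y_star <- beta[1] + beta[2] * x + rnorm(n)   # latent variable with standard normal error
y <- as.integer(y_star > 0)                  # observed binary outcome

mean(y)                          # empirical P(Y = 1 | X = x)
pnorm(beta[1] + beta[2] * x)     # model-implied Phi(X'beta); the two agree up to Monte Carlo error
```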

To see that the two models are equivalent, note that

P(Y = 1 | X) = P(Y* > 0) = P(X'β + ε > 0) = P(ε > −X'β) = P(ε < X'β) = Φ(X'β),

using the symmetry of the standard normal distribution.

Suppose the data set {yᵢ, xᵢ}, i = 1, …, n, contains n independent statistical units corresponding to the model above. For a single observation, conditional on its vector of inputs, we have

P(yᵢ = 1 | xᵢ) = Φ(xᵢ'β),  P(yᵢ = 0 | xᵢ) = 1 − Φ(xᵢ'β),

so the likelihood of a single observation (yᵢ, xᵢ) is

L(β; yᵢ, xᵢ) = Φ(xᵢ'β)^(yᵢ) · [1 − Φ(xᵢ'β)]^(1 − yᵢ).

Since the observations are independent, the log-likelihood of the whole sample is

ln L(β; Y, X) = Σᵢ { yᵢ ln Φ(xᵢ'β) + (1 − yᵢ) ln[1 − Φ(xᵢ'β)] }.

The estimator β̂ which maximizes this function will be consistent, asymptotically normal and efficient provided that E[XX'] exists and is not singular. It can be shown that this log-likelihood function is globally concave in β, and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.

The asymptotic distribution of β̂ is given by

√n (β̂ − β) →d N(0, Ω⁻¹),  where Ω = E[ φ(X'β)² X X' / ( Φ(X'β) (1 − Φ(X'β)) ) ]

and φ = Φ′ is the probability density function (PDF) of the standard normal distribution.
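To make the estimation step concrete, a hand-rolled version of this maximization can be written in a few lines of R using the general-purpose optimizer optim; because the log-likelihood is globally concave, the optimizer reliably finds the unique maximum. This is an illustrative sketch (not from the original article), and all names and values are assumptions.

```r
# Sketch: maximize the probit log-likelihood directly with optim().
# X is an n x K design matrix (including an intercept column), y an n-vector of 0/1 outcomes.
probit_loglik <- function(beta, X, y) {
  eta <- as.vector(X %*% beta)
  # log Phi(eta) for y = 1 and log(1 - Phi(eta)) for y = 0, computed on the log scale for stability
  sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(eta, lower.tail = FALSE, log.p = TRUE))
}

set.seed(3)
n <- 2000
X <- cbind(1, rnorm(n))
beta_true <- c(-0.4, 0.9)
y <- rbinom(n, 1, pnorm(X %*% beta_true))

fit <- optim(par = rep(0, ncol(X)), fn = probit_loglik, X = X, y = y,
             control = list(fnscale = -1),   # maximize rather than minimize
             method = "BFGS")
fit$par   # maximum likelihood estimate of beta
```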

Semi-parametric and non-parametric maximum likelihood methods for probit-type and other related models are also available.

An alternative is Berkson's minimum chi-square method, which can be applied when there are many observations of the response for each distinct value of the regressors ("many observations per cell"). It can be shown that this estimator is consistent (as n → ∞ with the number T of distinct cells fixed), asymptotically normal and efficient.

[citation needed] Its advantage is the presence of a closed-form formula for the estimator.

However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts: for each distinct value of the regressors, the number of observations nₜ and the number of positive responses rₜ.
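In this grouped-data setting, the minimum chi-square estimate amounts to a weighted least-squares fit of the probit-transformed cell proportions. The following R sketch illustrates that construction under this reading; it is not taken from the original article, and the names x_t, n_t, r_t and the simulated values are placeholders.

```r
# Sketch: Berkson-style minimum chi-square (weighted least squares) for grouped probit data.
# For each of T cells: x_t (regressor value), n_t (observations in the cell), r_t (count of y = 1).
set.seed(4)
T_cells <- 30
x_t <- seq(-2, 2, length.out = T_cells)
n_t <- rep(500, T_cells)
beta_true <- c(0.2, 0.7)
r_t <- rbinom(T_cells, size = n_t, prob = pnorm(beta_true[1] + beta_true[2] * x_t))

p_hat <- r_t / n_t          # cell proportions (assumed strictly between 0 and 1)
z_hat <- qnorm(p_hat)       # probit transform of each cell proportion

# Delta-method variance of z_hat is p(1 - p) / (n * phi(qnorm(p))^2); weights are its inverse.
w_t <- n_t * dnorm(z_hat)^2 / (p_hat * (1 - p_hat))

fit <- lm(z_hat ~ x_t, weights = w_t)   # closed-form weighted least-squares estimate of beta
coef(fit)
```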

Gibbs sampling of a probit model is possible with the introduction of normally distributed latent variables z; the outcome is recorded as 1 when the corresponding latent variable is positive and 0 otherwise.

This approach was introduced in Albert and Chib (1993),[5] which demonstrated how Gibbs sampling could be applied to binary and polychotomous response models within a Bayesian framework.

Under a multivariate normal prior distribution over the weights, the model can be described as

β ~ N(b₀, B₀)
zᵢ = xᵢ'β + εᵢ,  εᵢ ~ N(0, 1)
yᵢ = 1 if zᵢ > 0, and yᵢ = 0 otherwise.

From this, Albert and Chib (1993)[5] derive the following full conditional distributions in the Gibbs sampling algorithm:

B = (B₀⁻¹ + X'X)⁻¹
β | z ~ N( B (B₀⁻¹ b₀ + X'z), B )
zᵢ | β, yᵢ = 1 ~ N(xᵢ'β, 1) truncated to zᵢ > 0
zᵢ | β, yᵢ = 0 ~ N(xᵢ'β, 1) truncated to zᵢ ≤ 0.

The result for β | z is given in the article on Bayesian linear regression, although specified with different notation, while the conditional posterior distributions of the latent variables follow a truncated normal distribution within the given ranges.

Thus, knowledge of the observed outcomes serves to restrict the support of the latent variables.

For sampling the latent variables from the truncated normal posterior distributions, one can take advantage of the inverse-CDF method, which is straightforward to implement as a vectorized function in R.
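Below is a minimal sketch of such a vectorized inverse-CDF sampler, together with one sweep of the Albert–Chib Gibbs sampler that uses it and the β update given above. It is a re-creation from the description in this section, not the article's original code, and the function and argument names are illustrative.

```r
# Sketch: inverse-CDF sampling of the latent z_i from N(x_i'beta, 1), truncated by the observed y_i.
rtruncnorm_latent <- function(mu, y) {
  # Truncation region implied by the outcome: (0, Inf) if y = 1, (-Inf, 0] if y = 0
  lo <- ifelse(y == 1, 0, -Inf)
  hi <- ifelse(y == 1, Inf, 0)
  # CDF values at the bounds, then invert a uniform draw restricted to that interval
  u <- runif(length(mu), min = pnorm(lo, mean = mu), max = pnorm(hi, mean = mu))
  qnorm(u, mean = mu)
}

# One sweep of the Gibbs sampler (X: n x K design matrix, y: 0/1 outcomes,
# b0/B0: prior mean and covariance of beta, beta: current draw).
gibbs_sweep <- function(beta, X, y, b0, B0) {
  z <- rtruncnorm_latent(as.vector(X %*% beta), y)          # draw latent variables given beta
  B <- solve(solve(B0) + crossprod(X))                      # posterior covariance of beta given z
  m <- B %*% (solve(B0) %*% b0 + crossprod(X, z))           # posterior mean of beta given z
  as.vector(m + t(chol(B)) %*% rnorm(ncol(X)))              # draw beta | z from N(m, B)
}
```

Iterating gibbs_sweep and storing the successive β draws (after a burn-in period) yields samples from the posterior distribution of the weights.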

The suitability of an estimated binary model can be evaluated by counting how many of the observations with true value 1, and how many with true value 0, receive the correct predicted classification when any estimated probability above 1/2 is treated as a prediction of 1 and any estimated probability below 1/2 as a prediction of 0.
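A small illustration of this "correctly predicted" check is sketched below in R; it is not from the original article, and the simulated data and the fitted model mirror the earlier illustrative sketch.

```r
# Sketch: evaluate a fitted probit model by the fraction of correct classifications
# at the 1/2 probability threshold.
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "probit"))

p_hat <- predict(fit, type = "response")   # estimated P(Y = 1 | X)
y_pred <- as.integer(p_hat > 0.5)          # predict 1 when the probability exceeds 1/2

table(observed = y, predicted = y_pred)    # correct and incorrect counts for each outcome
mean(y_pred == y)                          # overall fraction correctly predicted
```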

When the variance of the error term ε conditional on x is not constant but depends on x, a heteroskedasticity issue arises; to deal with this problem, the original model needs to be transformed to be homoskedastic.

If the assumption that ε is normally distributed fails to hold, then a functional form misspecification issue arises: if the model is still estimated as a probit model, the estimators of the coefficients β are inconsistent. For instance, if ε follows a logistic distribution in the true model, but the model is estimated by probit, the estimates will generally be smaller than the true values.
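This shrinkage can be seen in a quick simulation (illustrative, not from the original article): data generated from a logit model are fit with a probit link, and the estimated slope comes out noticeably smaller than the true logit coefficient, roughly by the ratio of the two error scales.

```r
# Sketch: true model is logit (logistic errors), but a probit model is estimated.
set.seed(5)
n <- 50000
x <- rnorm(n)
beta_true <- c(0.5, 1.0)                                  # true logit coefficients
y <- rbinom(n, 1, plogis(beta_true[1] + beta_true[2] * x))

coef(glm(y ~ x, family = binomial(link = "probit")))      # slope well below 1.0
coef(glm(y ~ x, family = binomial(link = "logit")))       # slope close to 1.0
```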

One remedy is to assume a more flexible distributional form for the error term; the cost is heavier computation and lower accuracy owing to the larger number of parameters.

[7] In most practical cases where the distributional form is misspecified, the estimators of the coefficients are inconsistent, but estimators of the conditional probability and the partial effects remain quite good.

[citation needed] One can also take semi-parametric or non-parametric approaches, e.g., via local-likelihood or nonparametric quasi-likelihood methods, which avoid assumptions on a parametric form for the index function and are robust to the choice of the link function (e.g., probit or logit).

[9] However, the basic model dates to the Weber–Fechner law by Gustav Fechner, published in Fechner (1860), and was repeatedly rediscovered until the 1930s; see Finney (1971, Chapter 3.6) and Aitchison & Brown (1957, Chapter 1.2).

[9] A fast method for computing maximum likelihood estimates for the probit model was proposed by Ronald Fisher as an appendix to Bliss' work in 1935.