Linear predictor function

In statistics and in machine learning, a linear predictor function is a linear function (linear combination) of a set of coefficients and explanatory variables (independent variables), whose value is used to predict the outcome of a dependent variable.

The basic form of a linear predictor function $f(i)$ for data point $i$ (with $p$ explanatory variables), for $i = 1, \dots, n$, is $f(i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, where $x_{ik}$, for $k = 1, \dots, p$, is the value of the k-th explanatory variable for data point $i$, and $\beta_0, \beta_1, \dots, \beta_p$ are the coefficients (regression coefficients, weights, etc.) indicating the relative effect of a particular explanatory variable on the outcome.

It is common to write the predictor function in a more compact form as follows: the coefficients are grouped into a single vector $\boldsymbol\beta = (\beta_0, \beta_1, \dots, \beta_p)$, and each data point is given an extra explanatory variable with a fixed value of 1, so that $\mathbf{x}_i = (1, x_{i1}, \dots, x_{ip})$. This makes it possible to write the linear predictor function as $f(i) = \boldsymbol\beta \cdot \mathbf{x}_i$, using the notation for a dot product between two vectors.
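As a minimal sketch of this dot-product form (using NumPy; the particular numbers and names here are invented for illustration):

```python
import numpy as np

# Hypothetical coefficient vector: intercept beta_0 followed by beta_1, beta_2.
beta = np.array([0.5, 2.0, -1.3])

# One data point's explanatory variables, prefixed with a constant 1 so that
# the intercept beta_0 is absorbed into the dot product.
x_i = np.array([1.0, 3.2, 0.7])

f_i = np.dot(beta, x_i)   # f(i) = beta . x_i
print(f_i)                # 0.5 + 2.0*3.2 - 1.3*0.7 = 5.99
```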

In some models (standard linear regression, in particular), the equations for each of the data points i = 1, ..., n are stacked together and written in vector form as $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$, where $\mathbf{y} = (y_1, \dots, y_n)^{\mathsf T}$ is the vector of outcomes, $\boldsymbol\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^{\mathsf T}$ is the vector of errors, and $\mathbf{X}$ is the $n \times (p+1)$ matrix whose $i$-th row is $\mathbf{x}_i^{\mathsf T} = (1, x_{i1}, \dots, x_{ip})$. The matrix X is known as the design matrix and encodes all known information about the independent variables.

The errors $\varepsilon_1, \dots, \varepsilon_n$ are random variables, which in standard linear regression are assumed to be independently distributed according to a normal distribution with mean zero and a common variance; they express the influence of any unknown factors on the outcome.

This makes it possible to find the optimal coefficients through the method of least squares using simple matrix operations. In particular, the least-squares estimate of the coefficient vector is $\hat{\boldsymbol\beta} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{y}$.
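A sketch of this estimate in NumPy, on synthetic data invented for illustration; the explicit inverse mirrors the formula above, while `np.linalg.lstsq` computes the same estimate in a numerically safer way:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: n = 100 data points, p = 2 explanatory variables.
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix
true_beta = np.array([1.0, 2.0, -0.5])                      # assumed "true" coefficients
y = X @ true_beta + 0.1 * rng.normal(size=n)                # noisy outcomes

# Least-squares estimate written exactly as in the formula above.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Equivalent, but numerically preferable in practice.
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)        # both close to [1.0, 2.0, -0.5]
print(beta_hat_lstsq)
```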

The use of the matrix inverse in this formula requires that X is of full rank, i.e. there is not perfect multicollinearity among different explanatory variables (i.e. no explanatory variable can be perfectly predicted from the others).

In such cases, the Moore–Penrose pseudoinverse, which can be computed using the singular value decomposition, is used in place of the ordinary matrix inverse.
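For instance (a toy sketch continuing the NumPy notation above), a design matrix with a duplicated column has perfect multicollinearity, so the explicit inverse fails, but the SVD-based pseudoinverse still yields a least-squares solution:

```python
import numpy as np

# Rank-deficient design matrix: the third column duplicates the second,
# i.e. one explanatory variable is perfectly predictable from another.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 5.0, 5.0],
              [1.0, 7.0, 7.0]])
y = np.array([1.0, 2.0, 4.0, 5.0])

# np.linalg.inv(X.T @ X) would fail here because X.T @ X is singular.
# np.linalg.pinv computes the Moore-Penrose pseudoinverse via the SVD.
beta_hat = np.linalg.pinv(X) @ y
print(beta_hat)      # minimum-norm least-squares coefficients
print(X @ beta_hat)  # fitted values
```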

An example is polynomial regression, which uses a linear predictor function to fit an arbitrary degree polynomial relationship (up to a given order) between two sets of data points (i.e. a single real-valued explanatory variable and a related real-valued dependent variable), by adding multiple explanatory variables corresponding to various powers of the existing explanatory variable.

Mathematically, the form looks like this: $\hat{y}_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p$. In this case, for each data point i, a set of explanatory variables is created as follows: $x_{i1} = x_i,\; x_{i2} = x_i^2,\; \dots,\; x_{ip} = x_i^p$, and then standard linear regression is run.
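A short sketch of this in NumPy (synthetic data, degree chosen arbitrarily): each power of the original variable simply becomes another column of the design matrix, after which ordinary least squares is applied unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)

# One real-valued explanatory variable and a noisy cubic relationship.
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.2 * rng.normal(size=x.size)

# Explanatory variables 1, x, x^2, x^3: the design matrix of a
# degree-3 polynomial regression.
degree = 3
X = np.vander(x, degree + 1, increasing=True)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # approximately [1.0, -2.0, 0.0, 0.5]
```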

This works because the predictor function only needs to be linear in the coefficients, not in the explanatory variables themselves: all sorts of non-linear functions of the explanatory variables can be fit by the model.

Such transformed variables are known as basis functions. There is no particular need for the inputs to basis functions to be univariate or single-dimensional (nor their outputs, for that matter, although in such a case a K-dimensional output value is likely to be treated as K separate scalar-output basis functions).

An example of this is radial basis functions (RBFs), which compute some transformed version of the distance to some fixed point $\mathbf{c}$: $\phi(\mathbf{x}; \mathbf{c}) = f(\lVert \mathbf{x} - \mathbf{c} \rVert)$. An example is the Gaussian RBF, which has the same functional form as the normal distribution, $\phi(\mathbf{x}; \mathbf{c}) = e^{-b \lVert \mathbf{x} - \mathbf{c} \rVert^2}$, which drops off rapidly as the distance from $\mathbf{c}$ increases.
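As a small illustration (the helper name `gaussian_rbf` and the width parameter `b` are just for this sketch):

```python
import numpy as np

def gaussian_rbf(x, c, b=1.0):
    """Gaussian radial basis function exp(-b * ||x - c||^2)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    return np.exp(-b * np.sum((x - c) ** 2))

print(gaussian_rbf(0.0, 0.0))   # 1.0 at the center c
print(gaussian_rbf(2.0, 0.0))   # ~0.018: drops off rapidly with distance
```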

A possible usage of RBFs is to create one for every observed data point.

This means that, for a new data point, all of the radial basis functions centered far from it will return values near zero; the application of the radial basis functions will therefore pick out the nearest observed point, and its regression coefficient will dominate.

The result will be a form of nearest neighbor interpolation, where predictions are made by simply using the prediction of the nearest observed data point, possibly interpolating between multiple nearby data points when they are all similar distances away.
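A sketch of this construction on synthetic 1-D data (NumPy; the bandwidth value is arbitrary): one Gaussian RBF is centered at each observed point and the coefficients are fit by least squares, so a query point is effectively answered by its nearest center:

```python
import numpy as np

# Observed data points (the RBF centers) and their outcomes.
x_obs = np.array([0.0, 1.0, 2.5, 4.0])
y_obs = np.array([1.0, 3.0, 2.0, 5.0])
b = 10.0  # narrow bandwidth: each RBF is nearly zero away from its own center

def rbf_features(x, centers, b):
    # One column per center: exp(-b * (x - c)^2).
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.exp(-b * (x - centers.reshape(1, -1)) ** 2)

# Design matrix with one RBF per observed data point, then ordinary least squares.
Phi = rbf_features(x_obs, x_obs, b)
w, *_ = np.linalg.lstsq(Phi, y_obs, rcond=None)

# Predictions near an observed point are dominated by that point's RBF,
# so they land close to that point's observed outcome.
x_new = np.array([0.05, 1.1, 3.9])
print(rbf_features(x_new, x_obs, b) @ w)   # roughly [0.98, 2.71, 4.52]
```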

This type of nearest neighbor method for prediction is often considered diametrically opposed to the type of prediction used in standard linear regression; but in fact, the transformations that can be applied to the explanatory variables in a linear predictor function are so powerful that even the nearest neighbor method can be implemented as a type of linear regression.

In this case, however, some of the assumptions underlying standard linear regression (such as independent, normally distributed errors) will typically be violated. Linear regression and similar techniques could be applied and will often still find the optimal coefficients, but their error estimates and such will be wrong.

The explanatory variables may be of any type: real-valued, binary, categorical, etc.

The main distinction is between continuous variables (e.g. income, age, blood pressure, etc.) and discrete variables (e.g. sex, race, political party, etc.).

Discrete variables are typically handled by converting them into dummy variables: for each possible value of the discrete variable, a separate explanatory variable is created that takes the value 1 when the original variable has that value and 0 otherwise. This allows for separate regression coefficients to be matched for each possible value of the discrete variable.
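As a rough sketch (NumPy; the category labels are invented), dummy coding turns one categorical variable into one 0/1 column per possible value, each of which then gets its own coefficient:

```python
import numpy as np

# A hypothetical categorical explanatory variable for five data points.
party = np.array(["red", "blue", "green", "blue", "red"])
values = ["red", "blue", "green"]

# One 0/1 dummy variable (column) per possible value of the categorical variable.
dummies = np.column_stack([(party == v).astype(float) for v in values])
print(dummies)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```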

However, including a dummy variable for every possible value alongside an intercept term induces perfect multicollinearity, since the dummy variables for a given discrete variable always sum to 1. This causes problems for a number of methods, such as the simple closed-form solution used in linear regression.

The solution is either to avoid such cases by eliminating one of the dummy variables (so that the omitted value serves as a reference category), or to introduce a regularization constraint (which necessitates a more powerful, typically iterative, method for finding the optimal coefficients).
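As one hedged illustration of the regularization route (an L2/ridge penalty with an arbitrarily chosen strength; other regularizers generally do require iterative optimization), adding a small multiple of the identity to $\mathbf{X}^{\mathsf T}\mathbf{X}$ makes the normal equations solvable even with a full set of dummy variables plus an intercept:

```python
import numpy as np

# Intercept column plus a full set of three dummy variables: the dummies sum
# to the intercept column, so X is rank-deficient (perfect multicollinearity).
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
y = np.array([2.0, 3.5, 1.0, 3.0, 2.5])

lam = 0.1  # arbitrary ridge penalty strength, chosen only for illustration
# X^T X is singular here, but X^T X + lam * I is invertible, so the
# penalized coefficients have a closed-form solution.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta_ridge)
```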