Generative model

Terminology is inconsistent,[a] but three major types can be distinguished, following Jebara (2004):

1. a generative model is a statistical model of the joint probability distribution P(X, Y) on a given observable variable X and a target variable Y;
2. a discriminative model is a model of the conditional probability P(Y | X = x) of the target Y, given an observation x; and
3. classifiers computed without using a probability model are also referred to loosely as "discriminative".

The distinction between these last two classes is not consistently made;[4] Jebara (2004) refers to these three classes as generative learning, conditional learning, and discriminative learning, but Ng & Jordan (2002) distinguish only two classes, calling them generative classifiers (joint distribution) and discriminative classifiers (conditional distribution or no distribution), without distinguishing between the latter two classes.

Standard examples of each, all of which are linear classifiers, are:

- generative classifiers: the naive Bayes classifier and linear discriminant analysis;
- discriminative models: logistic regression;
- classifiers without a probability model: the perceptron and the support vector machine.

In application to classification, one wishes to go from an observation x to a label y (or probability distribution on labels). One can compute this directly, without using a probability distribution (distribution-free classifier); one can estimate the conditional probability P(Y | X = x) of a label given an observation (discriminative model), and base classification on that; or one can estimate the joint distribution P(X, Y) (generative model), from that compute the conditional probability P(Y | X = x), and then base classification on that.

These are increasingly indirect, but increasingly probabilistic, allowing more domain knowledge and probability theory to be applied.
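As a minimal sketch of the two probabilistic routes (assuming scikit-learn is available; the synthetic data and all variable names are illustrative, not from the source), the following fits naive Bayes, which models P(X | Y) and P(Y) and hence the joint distribution, and logistic regression, which models P(Y | X = x) directly, and queries each for label probabilities:

```python
# Illustrative sketch: generative vs. discriminative route to P(Y | X = x).
# Assumes scikit-learn and NumPy are installed; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Generative route: model P(X | Y) and P(Y), then obtain P(Y | X) via Bayes' rule.
generative = GaussianNB().fit(X, y)

# Discriminative route: model P(Y | X = x) directly.
discriminative = LogisticRegression(max_iter=1000).fit(X, y)

x_new = X[:1]                                # a single observation
print(generative.predict_proba(x_new))       # P(Y | X = x) via the joint model
print(discriminative.predict_proba(x_new))   # P(Y | X = x) modelled directly
```

Both classifiers return a distribution over labels for a new observation; the generative one reaches it through a model of how observations are distributed under each label, while the discriminative one parameterizes the conditional distribution directly.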

In practice different approaches are used, depending on the particular problem, and hybrids can combine strengths of multiple approaches.

An alternative division defines these symmetrically as:

- a generative model is a model of the conditional probability of the observable X given a target y, symbolically P(X | Y = y);
- a discriminative model is a model of the conditional probability of the target Y given an observation x, symbolically P(Y | X = x).

Regardless of precise definition, the terminology is apt because a generative model can be used to "generate" random instances (outcomes), either of an observation and target (x, y) or of an observation x given a target value y, while a discriminative model or discriminative classifier (without a model) can be used to "evaluate" a given, unseen instance.

The term "generative model" is also used to describe models that generate instances of output variables in a way that has no clear relationship to probability distributions over potential samples of input variables.

Generative adversarial networks are examples of this class of generative models, and are judged primarily by the similarity of particular outputs to potential inputs.

In application to classification, the observable X is frequently a continuous variable, the target Y is generally a discrete variable consisting of a finite set of labels, and the conditional probability P(Y | X) can also be interpreted as a (non-deterministic) target function f: X → Y, considering X as the input and Y as the output.

Given a finite set of labels, the two definitions of "generative model" are closely related.

A model of the conditional distribution P(X | Y = y) is a model of the distribution of observations for each label, and a model of the joint distribution is equivalent to a model of the label distribution P(Y) together with the distribution of observations given a label, P(X | Y), since P(X, Y) = P(X | Y) P(Y). Thus, while a model of the joint probability distribution is more informative than a model of the distribution of each label (without their relative frequencies), it is a relatively small step, hence these are not always distinguished.

Given a model of the joint distribution P(X, Y), the distributions of the individual variables can be computed as the marginal distributions (considering X as continuous, hence integrating over it, and Y as discrete, hence summing over it), and either conditional distribution can be computed from the definition of conditional probability, as written out below.
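Written out in the notation of this article (a standard restatement, treating X as continuous and Y as discrete):

```latex
\begin{align*}
  % Marginals of the joint distribution (X continuous, Y discrete):
  P(X = x) &= \sum_{y} P(X = x,\, Y = y), &
  P(Y = y) &= \int P(X = x,\, Y = y)\,\mathrm{d}x, \\
  % Conditionals, from the definition of conditional probability:
  P(X \mid Y) &= \frac{P(X,\, Y)}{P(Y)}, &
  P(Y \mid X) &= \frac{P(X,\, Y)}{P(X)}.
\end{align*}
```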

Given a model of one conditional probability, together with estimated distributions P(X) and P(Y) for the individual variables, one can estimate the opposite conditional probability using Bayes' rule: P(X | Y) P(Y) = P(Y | X) P(X). For example, given a generative model for P(X | Y), one can estimate P(Y | X) = P(X | Y) P(Y) / P(X).

Note that Bayes' rule (computing one conditional probability in terms of the other) and the definition of conditional probability (computing conditional probability in terms of the joint distribution) are frequently conflated as well.
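A minimal numeric sketch of the Bayes'-rule conversion above (the distributions and numbers are invented for illustration, not taken from the source):

```python
# Illustrative Bayes'-rule conversion for discrete X and Y (made-up numbers).
# p_x_given_y[y][x] = P(X = x | Y = y); p_y[y] = P(Y = y).
p_x_given_y = {0: {1: 0.9, 2: 0.1},
               1: {1: 0.3, 2: 0.7}}
p_y = {0: 0.6, 1: 0.4}

def posterior(x):
    """Return P(Y = y | X = x) for every label y, via Bayes' rule."""
    # P(X = x) = sum_y P(X = x | Y = y) P(Y = y)  (marginalization)
    p_x = sum(p_x_given_y[y][x] * p_y[y] for y in p_y)
    return {y: p_x_given_y[y][x] * p_y[y] / p_x for y in p_y}

print(posterior(1))  # e.g. {0: 0.818..., 1: 0.181...}
```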

A generative algorithm models how the data was generated in order to categorize a signal, asking which category is most likely to have generated it; a discriminative algorithm does not care about how the data was generated, it simply categorizes a given signal.

On the other hand, it has been shown that some discriminative algorithms give better performance than some generative algorithms in classification tasks.[6]

Although discriminative models do not need to model the distribution of the observed variables, they cannot generally express complex relationships between the observed and target variables.

In general, however, they do not necessarily perform better than generative models at classification and regression tasks.

The two classes are seen as complementary or as different views of the same procedure.[7]

With the rise of deep learning, a new family of methods, called deep generative models (DGMs),[8][9] has been formed through the combination of generative models and deep neural networks.

An increase in the scale of the neural networks is typically accompanied by an increase in the scale of the training data, both of which are required for good performance.[10]

Popular DGMs include variational autoencoders (VAEs), generative adversarial networks (GANs), and auto-regressive models.

Recently, there has been a trend to build very large deep generative models.[8]

For example, GPT-3, and its precursor GPT-2,[11] are auto-regressive neural language models that contain billions of parameters; BigGAN[12] and VQ-VAE,[13] which are used for image generation, can have hundreds of millions of parameters; and Jukebox is a very large generative model for musical audio that contains billions of parameters.[14]

Types of generative models include Gaussian mixture models, hidden Markov models, naive Bayes classifiers, latent Dirichlet allocation, Boltzmann machines, variational autoencoders, generative adversarial networks, and auto-regressive models. If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method.

However, since most statistical models are only approximations to the true distribution, if the model is used to infer values of a subset of variables conditional on known values of others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand.

In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model (as discussed above), although application-specific details will ultimately dictate which approach is most suitable in any particular case.
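As a concrete sketch of fitting a generative model by maximizing the data likelihood (the one-dimensional Gaussian class model, the synthetic data, and all numbers here are illustrative assumptions, not from the source), the class priors and per-class means and variances below are the closed-form maximum-likelihood estimates, and classification then proceeds via Bayes' rule:

```python
# Illustrative maximum-likelihood fit of a per-class Gaussian generative model.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 3.0.
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 100)])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(100, dtype=int)])

# Maximum-likelihood estimates: class priors P(Y) and per-class mean/variance for P(X | Y).
priors = np.array([np.mean(y == k) for k in (0, 1)])
means = np.array([x[y == k].mean() for k in (0, 1)])
variances = np.array([x[y == k].var() for k in (0, 1)])

def gaussian_pdf(v, mean, var):
    return np.exp(-(v - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def posterior(v):
    """P(Y = k | X = v) from the fitted joint model, via Bayes' rule."""
    joint = priors * gaussian_pdf(v, means, variances)  # P(X = v, Y = k)
    return joint / joint.sum()

print(posterior(1.0))  # probability of each class for a new observation
```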

For a small discrete data set, the joint probability distribution p(x, y) can be estimated directly from the empirical frequencies of the observed (x, y) pairs, and the conditional distribution p(y | x) is then obtained by normalizing over y for each value of x, as in the sketch below.
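A minimal sketch of this empirical estimation (the four observations are invented for concreteness, not taken from the source):

```python
# Estimate the empirical joint distribution p(x, y) and the conditional p(y | x)
# from a small, made-up set of (x, y) observations.
from collections import Counter

data = [(1, 0), (1, 0), (2, 0), (2, 1)]   # illustrative observations

joint = {pair: count / len(data) for pair, count in Counter(data).items()}
# joint == {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}

def conditional(x):
    """p(y | x): renormalize the joint over the observed y values for this x."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {yi: p / p_x for (xi, yi), p in joint.items() if xi == x}

print(conditional(1))  # {0: 1.0}
print(conditional(2))  # {0: 0.5, 1: 0.5}
```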

Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with "representing and speedily is an good", which is not proper English but which increasingly approximates it as the table is moved from word pairs to word triplets and so on.
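A toy version of Shannon's construction (the miniature corpus and starting word are invented for illustration) samples each next word from the empirical distribution of words observed to follow the current word:

```python
# Toy word-pair (bigram) generator in the spirit of Shannon's example.
# The corpus and starting word are invented for illustration.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat ran".split()

# Table of frequencies of word pairs: successors[w] lists every word seen after w.
successors = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current].append(nxt)

random.seed(0)
word, sentence = "the", ["the"]
for _ in range(8):
    if word not in successors:               # dead end: no observed successor
        break
    word = random.choice(successors[word])   # sample from p(next word | word)
    sentence.append(word)

print(" ".join(sentence))  # locally plausible but not proper English
```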