They are typically used to solve binary classification problems, i.e. assign labels, such as pass/fail, win/lose, alive/dead or healthy/sick, to existing datapoints.
Types of discriminative models include logistic regression (LR), conditional random fields (CRFs) and decision trees, among many others.
Generative model approaches, which use a joint probability distribution instead, include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, generative adversarial networks and others.
Unlike generative modelling, which studies the joint probability P(x, y), discriminative modelling studies the conditional probability P(y|x): it predicts the unobserved target variable y (the class label) dependent on the observed variables x (the training samples).
Within a probabilistic framework, this is done by modeling the conditional probability distribution P(y|x), which can be used to predict y from x.
We intend to use a function f(x) to simulate the behavior of what we observed from the training data set by the linear classifier method. Using a joint feature vector \phi(x, y), the decision function is defined as

f(x; w) = \arg\max_{y} w^{T} \phi(x, y)

According to Memisevic's interpretation,[2] w^{T} \phi(x, y), which is also written c(x, y; w), computes a score that measures the compatibility of the input x with the potential output y, and the \arg\max selects the class with the highest score.
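As a concrete sketch of this decision rule, the Python fragment below scores every candidate class and returns the best one; the particular joint feature map used (the input features copied into the block of positions belonging to the candidate class) is an illustrative assumption rather than something fixed by the model.

```python
import numpy as np

def joint_feature(x, y, num_classes):
    """Joint feature vector phi(x, y): place the input features x in the
    block of positions that corresponds to class y (illustrative choice)."""
    phi = np.zeros(num_classes * x.shape[0])
    phi[y * x.shape[0]:(y + 1) * x.shape[0]] = x
    return phi

def decision_function(x, w, num_classes):
    """f(x; w) = argmax_y w^T phi(x, y): score each candidate class and
    return the one with the highest compatibility score."""
    scores = [w @ joint_feature(x, y, num_classes) for y in range(num_classes)]
    return int(np.argmax(scores))
```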
Since the 0-1 loss function is a commonly used one in decision theory, the conditional probability distribution P(y|x; w), where w is a parameter vector optimized on the training data, can be rewritten as follows for the logistic regression model:

P(y \mid x; w) = \frac{1}{Z(x; w)} \exp\left(w^{T} \phi(x, y)\right), \qquad Z(x; w) = \sum_{y} \exp\left(w^{T} \phi(x, y)\right)

The equation above represents logistic regression.
Notice that a major distinction between models is their way of introducing the posterior probability; here, the posterior probability is inferred directly from the parametric model.
We can then maximize over the parameter w using the following log-likelihood:

L(w) = \sum_{i} \log p\left(y^{i} \mid x^{i}; w\right)

It can also be replaced by the log-loss below:

\ell^{\log}\left(x^{i}, y^{i}, w\right) = -\log p\left(y^{i} \mid x^{i}; w\right) = \log Z\left(x^{i}; w\right) - w^{T} \phi\left(x^{i}, y^{i}\right)

Since the log-loss is differentiable, a gradient-based method can be used to optimize the model.
A global optimum is guaranteed because the objective function is convex.
The above method provides efficient computation when the number of classes is relatively small.
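To make the optimization concrete, below is a minimal sketch in Python of gradient ascent on the log-likelihood above; it uses the simple block feature map \phi(x, y) from the earlier sketch, and the learning rate and step count are illustrative assumptions.

```python
import numpy as np

def log_likelihood_and_grad(w, X, Y, num_classes):
    """L(w) = sum_i log p(y_i | x_i; w) for p(y | x; w) = exp(w^T phi(x, y)) / Z(x; w),
    together with its gradient sum_i [ phi(x_i, y_i) - E_{p(y|x_i;w)} phi(x_i, y) ]."""
    d = X.shape[1]
    W = w.reshape(num_classes, d)                  # one weight block per class
    scores = X @ W.T                               # w^T phi(x, y) for every class y
    scores -= scores.max(axis=1, keepdims=True)    # subtract max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)      # p(y | x; w), the softmax
    ll = np.log(probs[np.arange(len(Y)), Y]).sum()
    observed = np.zeros_like(W)
    np.add.at(observed, Y, X)                      # sum_i phi(x_i, y_i), in block form
    expected = probs.T @ X                         # sum_i E_{p(y|x_i;w)} phi(x_i, y)
    return ll, (observed - expected).ravel()

def fit(X, Y, num_classes, lr=0.1, steps=500):
    """Plain gradient ascent; because the log-loss is convex, this
    converges toward the global optimum mentioned above."""
    w = np.zeros(num_classes * X.shape[1])
    for _ in range(steps):
        _, grad = log_likelihood_and_grad(w, X, Y, num_classes)
        w += lr * grad / len(Y)
    return w
```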
A generative model instead takes the joint probability P(x, y), where x is the input and y is the label, and predicts the most probable known label \tilde{y} \in Y for an unknown input \tilde{x} using Bayes' theorem.[3] Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables.
However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute).
In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning.
Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.[6] To attain the least expected loss, the rate of misclassification should be minimized. In the discriminative model, the posterior probability P(y|x) is inferred from a parametric model whose parameters are estimated from the training data.
On the other hand, because generative models focus on the joint probability, the class posterior probability P(k|x) is obtained from Bayes' theorem:

P(k \mid x) = \frac{p(x \mid k)\, p(k)}{\sum_{k'} p(x \mid k')\, p(k')} = \frac{p(x \mid k)\, p(k)}{p(x)}

In repeated experiments in which logistic regression and naive Bayes were applied to the same binary classification task, discriminative learning reached a lower asymptotic error, while the generative model reached its (higher) asymptotic error faster.[3] However, in Ulusoy and Bishop's joint work, Comparison of Generative and Discriminative Techniques for Object Detection and Classification, they state that the above holds only when the model is appropriate for the data (i.e. the data distribution is correctly modeled by the generative model).
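For illustration, the sketch below shows how such a generative classifier turns estimates of the class priors p(k) and class-conditional densities p(x|k) into a posterior via Bayes' theorem; the Gaussian naive Bayes form of p(x|k) is an illustrative assumption, not the only possible choice.

```python
import numpy as np

def fit_gaussian_naive_bayes(X, Y, num_classes):
    """Estimate the pieces of the joint distribution: class priors p(k) and
    per-feature Gaussian class-conditional densities p(x | k)."""
    priors = np.array([(Y == k).mean() for k in range(num_classes)])
    means = np.array([X[Y == k].mean(axis=0) for k in range(num_classes)])
    stds = np.array([X[Y == k].std(axis=0) + 1e-9 for k in range(num_classes)])
    return priors, means, stds

def class_posterior(x, priors, means, stds):
    """Bayes' theorem: P(k | x) = p(x | k) p(k) / sum_k' p(x | k') p(k')."""
    log_lik = -0.5 * (((x - means) / stds) ** 2
                      + np.log(2 * np.pi * stds ** 2)).sum(axis=1)
    log_joint = log_lik + np.log(priors)       # log p(x | k) + log p(k)
    log_joint -= log_joint.max()               # shift for numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum()                 # normalize by p(x)
```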
Similarly, Kelm[8] also proposed combining the two modelling approaches for pixel classification in his article Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning.
When extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not a necessarily discriminative approach; linear discriminant analysis (LDA), by contrast, is,[9] and it provides an efficient way of eliminating this disadvantage. The discriminative model typically needs a combination of multiple subtasks to be solved before classification, and LDA addresses this problem by reducing the dimensionality of the input, as the sketch below illustrates.
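A brief illustration of LDA used as a dimensionality-reducing step ahead of a discriminative classifier is given below; the choice of scikit-learn, the iris dataset and logistic regression as the downstream classifier are illustrative assumptions, not part of the cited works.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LDA projects the data onto at most (n_classes - 1) discriminative directions;
# the reduced representation is then fed to a discriminative classifier.
model = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),  # supervised dimensionality reduction
    LogisticRegression(max_iter=1000),           # classifier on the reduced features
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```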