Classification rule

Given a population whose members each belong to one of a number of different sets or classes, a classification rule or classifier is a procedure by which the elements of the population set are each predicted to belong to one of the classes.

A perfect classification is one for which every element in the population is assigned to the class it really belongs to.

Given a data set consisting of pairs x and y, where x denotes an element of the population and y the class it belongs to, a classification rule h(x) is a function that assigns each element x to a predicted class ŷ = h(x).

The true labels yi can be known but will not necessarily match their approximations ŷi = h(xi).
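As an illustration, a classification rule can be written as an ordinary function. The following Python sketch uses a hypothetical threshold rule and made-up data purely to show predicted labels ŷi = h(xi) alongside true labels yi.

    # A hypothetical static classification rule: predict class 1 when x >= 0.5.
    def h(x):
        return 1 if x >= 0.5 else 0

    xs = [0.1, 0.4, 0.6, 0.9]      # elements of the population (made-up)
    ys = [0, 1, 1, 1]              # true labels y_i
    y_hat = [h(x) for x in xs]     # approximations y_hat_i = h(x_i)
    print(y_hat)                   # [0, 0, 1, 1] -- differs from ys at x = 0.4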

A computer classifier can either learn classification rules from data or implement static, predefined classification rules.

For a training data set, the true labels yj are known, but they will not necessarily match their approximations ŷj = h(xj). For new observations, the true labels are unknown, and it is a prime target of the classification procedure that the approximations ŷi = h(xi) match the true labels yi as well as possible, where the quality of this approximation needs to be judged on the basis of the statistical or probabilistic properties of the overall population from which future observations will be drawn.
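A minimal sketch of this idea, assuming a hypothetical learned threshold rule and made-up data: the rule is fitted to a training set with known labels and then applied to new observations whose labels are unknown at prediction time.

    # Learn a threshold from training data (labels known), then predict on new data.
    def fit_threshold(xs, ys):
        # Brute-force search for the threshold with the fewest training errors.
        best_t, best_err = None, len(xs) + 1
        for t in sorted(set(xs)):
            err = sum(1 for x, y in zip(xs, ys) if (1 if x >= t else 0) != y)
            if err < best_err:
                best_t, best_err = t, err
        return best_t

    train_x, train_y = [0.2, 0.3, 0.7, 0.8], [0, 0, 1, 1]   # hypothetical training set
    new_x = [0.25, 0.65]                                    # new observations, labels unknown

    t = fit_threshold(train_x, train_y)                     # t = 0.7 here
    print([1 if x >= t else 0 for x in new_x])              # [0, 0]
    # Whether these predictions are good must be judged against the population
    # from which future observations are drawn, not against the training set alone.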

In binary classification, a better understood task, only two classes are involved, whereas multiclass classification involves assigning an object to one of several classes.

An important point is that in many practical binary classification problems, the two groups are not symmetric – rather than overall accuracy, the relative proportion of different types of errors is of interest.

For example, in medical testing, a false positive (detecting a disease when it is not present) is considered differently from a false negative (not detecting a disease when it is present).

In multiclass classifications, the classes may be considered symmetrically (all errors are equivalent), or asymmetrically, which is considerably more complicated.

Using the outcomes illustrated in the figure below, we can build a confusion matrix to express the counts of each outcome.

The four possible outcomes are true positive (TP), false positive (FP), false negative (FN), and true negative (TN).

A false negative is commonly placed in the bottom-left cell (condition positive × test outcome negative) of a confusion matrix.

A true positive occurs when the patient has the disease and the test confirms its presence. A true negative occurs when the patient does not have the disease and the test also reports its absence.
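Given lists of true and predicted labels, the four counts can be tallied directly; the data below are hypothetical.

    # Count TP, FP, FN and TN, with class 1 meaning "has the condition".
    def confusion_counts(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        return tp, fp, fn, tn

    y_true = [1, 1, 0, 0, 1, 0]               # hypothetical true labels
    y_pred = [1, 0, 0, 1, 1, 0]               # hypothetical predictions
    print(confusion_counts(y_true, y_pred))   # (2, 1, 1, 2)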

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event; here it can be used to work out the probability that a patient who tests positive actually has the disease.

Suppose, for example, that a test for a rare disease has a 5% false positive rate, so that 5% of people without the disease nevertheless test positive. Naively, one might think that only 5% of positive test results are false, but that is quite wrong, as we shall see.

Despite the apparent high accuracy of the test, the incidence of the disease is so low that the vast majority of patients who test positive do not have the disease.
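A worked calculation makes this concrete. The 5% false positive rate echoes the figure above; the 99% sensitivity and the 0.1% prevalence are assumed values chosen only for illustration.

    # Bayes' theorem: P(disease | positive) for a rare disease.
    prevalence  = 0.001   # P(disease)               -- assumed
    sensitivity = 0.99    # P(positive | disease)    -- assumed
    fpr         = 0.05    # P(positive | no disease) -- the 5% false positive rate

    p_positive = sensitivity * prevalence + fpr * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    print(round(ppv, 3))  # about 0.019: roughly 98% of positive results are false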

Nonetheless, the test is not useless, and re-testing may improve the reliability of the result.

In order to reduce the problem of false positives, a test should be very accurate in reporting a negative result when the patient does not have the disease, that is, it should have a high specificity.

When a disease is rare, false negatives will not be a major problem with the test.

But if 60% of the population had the disease, then the probability that a negative result is wrong, and hence the number of false negatives, would be far greater.
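The following sketch, with assumed sensitivity and specificity values, shows how the probability that a negative result is wrong grows with the prevalence of the disease.

    # P(disease | negative) for an assumed 99% sensitivity and 95% specificity.
    def p_disease_given_negative(prevalence, sensitivity=0.99, specificity=0.95):
        false_neg = (1 - sensitivity) * prevalence
        true_neg = specificity * (1 - prevalence)
        return false_neg / (false_neg + true_neg)

    print(p_disease_given_negative(0.001))   # ~0.00001 for a rare disease
    print(p_disease_given_negative(0.60))    # ~0.016 when 60% of the population is diseased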

In training a classifier, one may wish to measure its performance using the well-accepted metrics of sensitivity and specificity.
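Both metrics follow directly from the confusion-matrix counts; the counts below reuse the hypothetical example above.

    # Sensitivity (true positive rate) and specificity (true negative rate).
    def sensitivity(tp, fn):
        return tp / (tp + fn)

    def specificity(tn, fp):
        return tn / (tn + fp)

    tp, fp, fn, tn = 2, 1, 1, 2                       # hypothetical counts
    print(sensitivity(tp, fn), specificity(tn, fp))   # 0.666... 0.666...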

Suppose then that we have a random classifier that guesses that the patient has the disease with a probability equal to the prevalence of the disease, and guesses the absence of the disease otherwise. Although such a classifier conveys no information about any individual patient, it can still achieve non-zero values of sensitivity and specificity.

An alternative measure of performance is the Matthews correlation coefficient, for which any random classifier will get an average score of 0.
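The sketch below, using a hypothetical prevalence of 30% and simulated labels, computes the Matthews correlation coefficient from the confusion-matrix counts and shows that a classifier guessing at random (positive with probability equal to the prevalence) scores close to 0.

    import math
    import random

    # Matthews correlation coefficient from the four counts.
    def mcc(tp, fp, fn, tn):
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    random.seed(0)
    prevalence, n = 0.3, 10_000                       # assumed prevalence and sample size
    y_true = [1 if random.random() < prevalence else 0 for _ in range(n)]
    y_pred = [1 if random.random() < prevalence else 0 for _ in range(n)]   # random guesses

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    print(round(mcc(tp, fp, fn, tn), 3))              # close to 0 for a random classifier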

The extension of this concept to non-binary classifications yields the confusion matrix.
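For example, with more than two classes the matrix simply gains a row and a column per class; the class names below are hypothetical.

    from collections import Counter

    # Multiclass confusion matrix: rows are true classes, columns predicted classes.
    y_true = ["cat", "cat", "dog", "bird", "dog"]
    y_pred = ["cat", "dog", "dog", "bird", "cat"]

    counts = Counter(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    for true_class in classes:
        print(true_class, [counts[(true_class, pred)] for pred in classes])
    # bird [1, 0, 0]
    # cat [0, 1, 1]
    # dog [0, 1, 1]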

Figure: the four outcomes of a binary classification. The left and right halves respectively contain instances that in fact have, and do not have, the condition; the oval contains instances classified (predicted) as positive (having the condition); green and red respectively mark instances that are correctly (true) and wrongly (false) classified. TP = true positive; TN = true negative; FP = false positive (type I error); FN = false negative (type II error); TPR = true positive rate; FPR = false positive rate; PPV = positive predictive value; NPV = negative predictive value.