Binary classification

Binary classification is the task of classifying the elements of a set into one of two groups (each called a class).

In practice, however, one of the two classes is often more important than the other, so the number of each type of error is of interest.

The outcomes can be arranged into a 2×2 contingency table, with rows corresponding to the actual value (condition positive or condition negative) and columns corresponding to the classification value (test outcome positive or test outcome negative).
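As a minimal sketch of this arrangement, the following Python function tallies paired actual and predicted labels into a 2×2 table, with rows for the actual condition and columns for the test outcome. The function name `contingency_table`, the boolean encoding of labels, and the sample data are assumptions made for illustration only.

```python
def contingency_table(actual, predicted):
    """Tally paired boolean labels into a 2x2 table.

    Rows are the actual condition (positive, then negative);
    columns are the test outcome (positive, then negative).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        row = 0 if a else 1  # row 0: condition positive
        col = 0 if p else 1  # column 0: test outcome positive
        table[row][col] += 1
    return table


# Illustrative data only: six instances, three of which have the condition.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
table = contingency_table(actual, predicted)
```

In this layout, `table[0][0]` holds the true positives, `table[0][1]` the false negatives, `table[1][0]` the false positives, and `table[1][1]` the true negatives.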

There is no general agreement on how the pair of indicators should be used to decide concrete questions, such as when to prefer one classifier over another.

Several methods are commonly used for binary classification. Each classifier is best in only a select domain, depending on the number of observations, the dimensionality of the feature vector, the noise in the data, and many other factors.

When a continuous value is converted to a binary outcome by a cutoff and the measured value lies close to that cutoff, designating the test as either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty.

For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value.
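The effect of the cutoff can be sketched in Python as follows. The 50 mIU/ml cutoff and the 52 mIU/ml reading come from the example above; the function name `classify_with_uncertainty` and the 5 mIU/ml half-width of the uncertainty interval are hypothetical choices made only to illustrate the idea, not properties of any real test.

```python
def classify_with_uncertainty(value, cutoff, margin):
    """Return 'positive', 'negative', or 'uncertain' for a continuous value.

    Values within `margin` of `cutoff` are reported as 'uncertain'
    instead of being forced into one of the two classes.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    if abs(value - cutoff) <= margin:
        return "uncertain"
    return "positive" if value >= cutoff else "negative"


CUTOFF = 50.0  # mIU/ml, from the example in the text
MARGIN = 5.0   # mIU/ml, hypothetical half-width of the uncertainty interval

# A plain binary reading of 52 mIU/ml against the cutoff is "positive" ...
binary_result = "positive" if 52.0 >= CUTOFF else "negative"
# ... while the three-way reading keeps the uncertainty visible.
result = classify_with_uncertainty(52.0, CUTOFF, MARGIN)
```

Reporting a third outcome is only one possible design; returning the original continuous value alongside the binary label would preserve the same information.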

In this set of tested instances, the instances left of the divider have the condition being tested; the right half do not. The oval bounds those instances that a test algorithm classifies as having the condition. The green areas highlight the instances that the test algorithm correctly classified. Labels refer to:
TP=true positive; TN=true negative; FP=false positive (type I error); FN=false negative (type II error); TPR=set of instances to determine true positive rate; FPR=set of instances to determine false positive rate; PPV=positive predictive value; NPV=negative predictive value.
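The rates named in these labels follow from the four counts by their standard definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), PPV = TP / (TP + FP), and NPV = TN / (TN + FN). A minimal Python sketch is given below; the function name `rates`, the choice to return NaN for an empty denominator, and the example counts are assumptions for illustration.

```python
def rates(tp, fp, fn, tn):
    """Compute TPR, FPR, PPV, and NPV from the four outcome counts."""

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),  # true positive rate
        "FPR": ratio(fp, fp + tn),  # false positive rate
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }


# Illustrative counts only, not taken from any real test.
r = rates(tp=20, fp=10, fn=5, tn=65)
```

TPR and FPR are each computed within one actual class (one side of the divider), whereas PPV and NPV are each computed within one test outcome (inside or outside the oval).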