Statistical classification

In statistics, classification is the problem of identifying which of a set of categories an observation belongs to. When classification is performed by a computer, statistical methods are normally used to develop the algorithm.

Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features.

The term "classifier" sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.
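
Viewed this way, a classifier is simply a function from a feature vector to a category. The following Python sketch, with a hypothetical feature and labels, illustrates the mapping:

```python
# A classifier viewed as a function from input data to a category.
# The feature, threshold, and labels here are hypothetical, chosen
# only to illustrate the mapping.

def classify(features: dict) -> str:
    """Map an observation (feature vector) to one of two categories."""
    return "spam" if features["exclamation_count"] > 3 else "not spam"

print(classify({"exclamation_count": 5}))  # -> "spam"
print(classify({"exclamation_count": 0}))  # -> "not spam"
```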

Other fields may use different terminology: e.g. in community ecology, the term "classification" normally refers to cluster analysis.

Early work on statistical classification was undertaken by Fisher,[1][2] in the context of two-group problems, leading to Fisher's linear discriminant function as the rule for assigning a group to a new observation.
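
A minimal sketch of this two-group rule, assuming equal group covariances and using hypothetical simulated data, might look like this:

```python
import numpy as np

# Fisher's linear discriminant for two groups (equal-covariance case).
# The data below are hypothetical, for illustration only.
rng = np.random.default_rng(0)
group1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
group2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))

mu1, mu2 = group1.mean(axis=0), group2.mean(axis=0)
# Pooled within-group covariance estimate.
cov = (np.cov(group1, rowvar=False) + np.cov(group2, rowvar=False)) / 2

# Discriminant direction w = Sigma^{-1} (mu1 - mu2).
w = np.linalg.solve(cov, mu1 - mu2)
threshold = w @ (mu1 + mu2) / 2  # midpoint rule, assuming equal priors

def assign(x):
    """Assign a new observation to group 1 or group 2."""
    return 1 if w @ x > threshold else 2

print(assign(np.array([0.5, 0.2])))  # near group 1's centre -> 1
print(assign(np.array([2.8, 3.1])))  # near group 2's centre -> 2
```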

The two-group problem was later extended to more than two groups, with the restriction that the classification rule should be linear.[3][4]

Later work for the multivariate normal distribution allowed the classifier to be nonlinear:[5] several classification rules can be derived based on different adjustments of the Mahalanobis distance, with a new observation being assigned to the group whose centre has the lowest adjusted distance from the observation.
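
A minimal sketch of this distance-based rule, with hypothetical group centres and a shared covariance matrix, is:

```python
import numpy as np

# Assign a new observation to the group whose centre has the lowest
# Mahalanobis distance. Centres and covariance here are hypothetical.
centres = {
    "A": np.array([0.0, 0.0]),
    "B": np.array([3.0, 3.0]),
    "C": np.array([0.0, 4.0]),
}
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # shared covariance matrix
cov_inv = np.linalg.inv(cov)

def mahalanobis(x, centre):
    """Mahalanobis distance from x to a group centre."""
    d = x - centre
    return float(np.sqrt(d @ cov_inv @ d))

x_new = np.array([1.0, 3.0])
distances = {g: mahalanobis(x_new, c) for g, c in centres.items()}
print(min(distances, key=distances.get))  # group with the smallest distance
```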

Some Bayesian procedures involve the calculation of group-membership probabilities: these provide a more informative outcome than a simple attribution of a single group-label to each new observation, and such probabilistic algorithms have numerous advantages over non-probabilistic classifiers.
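
As a sketch of how such probabilities might be computed, assuming one-dimensional normal likelihoods and hypothetical priors, means, and standard deviations, Bayes' rule gives a posterior probability for each group:

```python
import math

# Group-membership probabilities via Bayes' rule:
#   P(group | x)  is proportional to  P(group) * p(x | group)
# Priors, means, and standard deviations below are hypothetical.
groups = {
    # group: (prior, mean, standard deviation)
    "A": (0.5, 0.0, 1.0),
    "B": (0.3, 3.0, 1.0),
    "C": (0.2, 5.0, 2.0),
}

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def membership_probabilities(x):
    """Return the posterior probability of each group for observation x."""
    unnormalised = {g: prior * normal_pdf(x, m, sd)
                    for g, (prior, m, sd) in groups.items()}
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}

print(membership_probabilities(2.0))
# -> a full probability distribution over groups, not just a single label
```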

Features may variously be binary (e.g. "on" or "off"); categorical (e.g. "A", "B", "AB" or "O", for blood type); ordinal (e.g. "large", "medium" or "small"); integer-valued (e.g. the number of occurrences of a particular word in an email); or real-valued (e.g. a measurement of blood pressure).
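
A minimal sketch of how such mixed feature types might be encoded numerically for a classifier (the observation and the encoding choices are hypothetical):

```python
# One plausible numeric encoding of the feature types listed above;
# the observation and encoding choices are hypothetical.
observation = {
    "smtp_auth": "on",          # binary
    "blood_type": "AB",         # categorical
    "size": "medium",           # ordinal
    "word_count": 7,            # integer-valued
    "blood_pressure": 118.5,    # real-valued
}

BINARY = {"off": 0, "on": 1}
BLOOD_TYPES = ["A", "B", "AB", "O"]              # one-hot encoded
ORDINAL = {"small": 0, "medium": 1, "large": 2}  # order is preserved

features = [
    BINARY[observation["smtp_auth"]],
    *[1 if observation["blood_type"] == t else 0 for t in BLOOD_TYPES],
    ORDINAL[observation["size"]],
    observation["word_count"],
    observation["blood_pressure"],
]
print(features)  # -> [1, 0, 0, 1, 0, 1, 7, 118.5]
```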

Many classification algorithms have been developed; the most commonly used include linear classifiers (such as Fisher's linear discriminant, logistic regression, and the naive Bayes classifier), support vector machines, k-nearest neighbours, decision trees, and neural networks.[9]

Choices between different possible algorithms are frequently made on the basis of quantitative evaluation of accuracy.
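
A minimal sketch of such an evaluation, using hypothetical held-out labels and predictions, compares the accuracy of competing classifiers:

```python
# Quantitative evaluation of accuracy on held-out data; the labels
# and predictions here are hypothetical.
true_labels = ["A", "A", "B", "B", "A", "B", "A", "B"]
predictions = {
    "classifier_1": ["A", "A", "B", "A", "A", "B", "B", "B"],
    "classifier_2": ["A", "A", "B", "B", "A", "B", "A", "A"],
}

def accuracy(truth, predicted):
    """Fraction of observations assigned to the correct category."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

for name, preds in predictions.items():
    print(f"{name}: {accuracy(true_labels, preds):.2f}")
# classifier_1: 0.75, classifier_2: 0.88 -- chosen on measured accuracy
```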