In machine learning, one-class classification (OCC), also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects, by primarily learning from a training set containing only the objects of that class,[1] although there exist variants of one-class classifiers where counter-examples are used to further refine the classification boundary.
Examples include the monitoring of helicopter gearboxes,[2][3][4] motor failure prediction,[5] and the operational status of a nuclear plant as 'normal':[6] in this scenario there are few, if any, examples of catastrophic system states; only the statistics of normal operation are known.
While many of the above approaches focus on the case of removing a small number of outliers or anomalies, one can also learn the other extreme, where the single class covers a small coherent subset of the data, using an information bottleneck approach.[9]
SVM-based one-class classification relies on identifying the smallest hypersphere (with radius r and center c) that encloses all the data points.[10]
This method is called Support Vector Data Description (SVDD).
Formally, the problem can be defined in the following constrained optimization form:

$$\min_{r,c}\; r^2 \quad \text{subject to} \quad \lVert x_i - c \rVert^2 \le r^2 \quad \forall i = 1, \dots, n$$
However, the above formulation is highly restrictive and sensitive to the presence of outliers.
The introduction of a kernel function provides additional flexibility to the One-class SVM (OSVM) algorithm.
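As a concrete illustration, scikit-learn's OneClassSVM implements the closely related ν-formulation of Schölkopf et al., which with an RBF kernel is equivalent to SVDD. A minimal sketch, trained on target-class data only (the gamma and nu values here are illustrative, not prescribed by the method):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # target-class data only

# nu upper-bounds the fraction of training points treated as outliers;
# gamma controls the width of the RBF kernel.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

X_test = np.array([[0.1, -0.2], [4.0, 4.0]])
print(clf.predict(X_test))  # +1 = inside the learned boundary, -1 = outlier
```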
In positive-unlabeled (PU) learning, a binary classifier is trained on a set of labeled positive examples together with a set of unlabeled examples. This contrasts with other forms of semisupervised learning, where it is assumed that a labeled set containing examples of both classes is available in addition to unlabeled samples.
A variety of techniques exist to adapt supervised classifiers to the PU learning setting, including variants of the EM algorithm.
PU learning has been successfully applied to text,[13][14][15] time series,[16] bioinformatics tasks,[17][18] and remote sensing data.[19]
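One widely used technique, distinct from the EM variants mentioned above, is the two-step estimator of Elkan and Noto (2008): train an ordinary classifier to separate labeled positives from unlabeled examples, then rescale its scores by a constant estimated on held-out labeled positives. A minimal sketch, with all function and variable names chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_fit(X_pos, X_unlabeled, X_pos_holdout):
    # Train g(x) ~ P(s=1 | x) to separate labeled positives (s=1)
    # from unlabeled examples (s=0).
    X = np.vstack([X_pos, X_unlabeled])
    s = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    g = LogisticRegression(max_iter=1000).fit(X, s)
    # Estimate the constant c = P(s=1 | y=1) as the mean score
    # on held-out labeled positives.
    c = g.predict_proba(X_pos_holdout)[:, 1].mean()
    return g, c

def pu_predict_proba(g, c, X):
    # Under the "selected completely at random" labeling assumption,
    # P(y=1 | x) = P(s=1 | x) / c.
    return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
```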
Several approaches have been proposed to solve one-class classification.
The Gaussian model[20] is one of the simplest methods for creating one-class classifiers.
By the Central Limit Theorem (CLT),[21] these methods work best when a large number of samples is present and the samples are perturbed by small independent error values, since the data is then approximately normally distributed.
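A minimal sketch of such a Gaussian one-class model: fit the mean and covariance of the target class, then accept a new point when its squared Mahalanobis distance falls below a chi-squared quantile (the 0.99 level here is an illustrative choice):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # training data: target class only

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def is_member(x, alpha=0.99):
    # Squared Mahalanobis distance of x to the fitted Gaussian; under the
    # model it is chi-squared distributed with d degrees of freedom.
    d2 = (x - mu) @ cov_inv @ (x - mu)
    return d2 <= chi2.ppf(alpha, df=X.shape[1])

print(is_member(np.zeros(3)))      # True: near the mean
print(is_member(np.full(3, 5.0)))  # False: far outside the target class
```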
Boundary methods rely on distances and hence are not robust to scale variance.
The k-centers method, NN-d, and SVDD are key examples.
In the k-centers method, small balls with equal radius are placed so as to minimize the maximum of all minimum distances between training objects and the centers.
The algorithm uses a forward search with random initialization, where the radius is determined by the maximum distance from a center to the objects that the corresponding ball should capture.
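The precise forward-search procedure varies by implementation; the sketch below substitutes the common greedy farthest-first heuristic (Gonzalez's algorithm) for center placement, then classifies a point as a member if it falls within the ball radius:

```python
import numpy as np

def fit_k_centers(X, k, rng):
    # Greedy farthest-first placement: start from a random object, then
    # repeatedly add the object farthest from all current centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    # Radius = the largest distance from any training object to its
    # nearest center, so every training object is captured by some ball.
    radius = np.max(np.min(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1))
    return centers, radius

def is_member(x, centers, radius):
    return np.min(np.linalg.norm(centers - x, axis=1)) <= radius

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
centers, radius = fit_k_centers(X, k=5, rng=rng)
print(is_member(np.array([0.0, 0.0]), centers, radius))  # likely True
```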
Reconstruction methods fit a compact model of the target class and use the reconstruction error of a new object to decide whether it belongs to the class; some examples of reconstruction methods for OCC are k-means clustering, learning vector quantization, and self-organizing maps.
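As an illustration of the reconstruction idea, the sketch below fits k-means to the target class and flags a new point as an outlier when its distance to the nearest prototype (its reconstruction error) exceeds a threshold; the 95th-percentile threshold is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))          # target class only

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train)

# Reconstruction error = distance to the nearest prototype.
train_err = km.transform(X_train).min(axis=1)
threshold = np.quantile(train_err, 0.95)

def is_outlier(x):
    return km.transform(x.reshape(1, -1)).min() > threshold

print(is_outlier(np.array([0.0, 0.0])))   # likely False: near the prototypes
print(is_outlier(np.array([6.0, 6.0])))   # True: large reconstruction error
```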
The basic Support Vector Machine (SVM) paradigm is trained using both positive and negative examples; however, studies have shown there are many valid reasons for using only positive examples.
When the SVM algorithm is modified to use only positive examples, the process is considered one-class classification.
One situation where this type of classification might prove useful is identifying a web browser's sites of interest based only on the user's browsing history.
One-class classification can be particularly useful in biomedical studies, where labeled data from classes other than the target class are often difficult, expensive, or impossible to obtain, making the two-class classification that would otherwise be used infeasible.
A study from The Scientific World Journal found that the typicality approach is the most useful in analysing biomedical data, because it can be applied to any type of dataset (continuous, discrete, or nominal).[25]
To apply typicality to one-class classification for biomedical studies, each new observation is compared to the target class and identified as an outlier or a member of the target class.[24]
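The cited study defines typicality formally; the sketch below captures only the general idea under an assumed form: score a new observation by the fraction of target-class points that lie at least as far from the class centroid, and treat low scores as outliers:

```python
import numpy as np

def typicality(X_target, x_new):
    # Fraction of target-class observations at least as far from the
    # class centroid as the new observation (a rank-based score in [0, 1]).
    mu = X_target.mean(axis=0)
    d_train = np.linalg.norm(X_target - mu, axis=1)
    return np.mean(d_train >= np.linalg.norm(x_new - mu))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
print(typicality(X, np.array([0.1, 0.0])))  # high: typical of the class
print(typicality(X, np.array([5.0, 5.0])))  # ~0: flagged as an outlier
```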
One-class classification has similarities to unsupervised concept drift detection; both aim to identify whether unseen data share characteristics with the initial data.
A concept refers to the fixed probability distribution from which the data is drawn.
Unseen data is classified as typical or as an outlier depending on whether it originates from the initial concept.
Unsupervised concept drift detection can thus be viewed as the continuous form of one-class classification.