The model learned by the standard perceptron algorithm is a linear binary classifier: a vector of weights w (and optionally an intercept term b, omitted here for simplicity) that is used to classify a sample vector x as class "one" or class "minus one" according to

\hat{y} = \operatorname{sgn}(w^\top x),

where a zero is arbitrarily mapped to one or minus one.
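For concreteness, the decision rule can be written in a few lines of NumPy; this sketch and its names are illustrative, not taken from the source:

```python
import numpy as np

def predict(w, x):
    """Linear perceptron decision rule: y_hat = sgn(w^T x).

    A zero activation has no natural class, so we arbitrarily
    map it to +1, matching the convention noted in the text.
    """
    return 1 if np.dot(w, x) >= 0 else -1
```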
In pseudocode, the perceptron algorithm is given by:

Initialize w to an all-zeros vector of length p, the number of features.
For some fixed number of iterations, or until some stopping criterion is met:
    For each training example xi with ground-truth label yi ∈ {−1, 1}:
        Let ŷ = sgn(wᵀxi).
        If ŷ ≠ yi, update w ← w + yi xi.

By contrast with the linear models learned by the perceptron, a kernel method[3] is a classifier that stores a subset of its training examples xi, associates with each a weight αi, and makes decisions for new samples x' by evaluating

\hat{y} = \operatorname{sgn}\left(\sum_i \alpha_i y_i K(x_i, x')\right).

Here, K is some kernel function.
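A minimal NumPy sketch of the primal training loop above, assuming a fixed number of epochs as the stopping criterion (the function and parameter names are ours, chosen for illustration):

```python
import numpy as np

def train_perceptron(X, y, n_epochs=10):
    """Primal perceptron training loop.

    X: (n, p) array of training samples.
    y: length-n array of labels in {-1, +1}.
    """
    n, p = X.shape
    w = np.zeros(p)                          # initialize w to all zeros
    for _ in range(n_epochs):                # fixed number of iterations
        for i in range(n):
            y_hat = 1 if np.dot(w, X[i]) >= 0 else -1
            if y_hat != y[i]:                # on a mistake, update w
                w += y[i] * X[i]
    return w
```

The returned w can then be used with the sgn(wᵀx) decision rule shown earlier.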
To derive a kernelized version of the perceptron algorithm, we must first formulate it in dual form, starting from the observation that the weight vector w can be expressed as a linear combination of the n training samples:

w = \sum_{i=1}^{n} \alpha_i y_i x_i,

where αi is the number of times xi was misclassified, forcing an update w ← w + yi xi.
We must also rewrite the prediction formula to get rid of w:

\hat{y} = \operatorname{sgn}(w^\top x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i (x_i^\top x)\right).

Plugging these two equations into the training loop turns it into the dual perceptron algorithm.
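Putting the two dual equations together gives the kernel perceptron: a mistake counter αi is kept per training sample, and predictions use the kernel expansion instead of an explicit w. A hedged sketch, using a Gaussian (RBF) kernel purely as an example, since the text only says K is "some kernel function" (all names here are illustrative):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Example kernel K(x, z) = exp(-gamma * ||x - z||^2);
    any valid kernel function could be substituted."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def train_kernel_perceptron(X, y, kernel=rbf_kernel, n_epochs=10):
    """Dual perceptron: alpha[i] counts how many times sample i
    was misclassified, replacing the explicit weight vector w."""
    n = X.shape[0]
    alpha = np.zeros(n)
    for _ in range(n_epochs):
        for i in range(n):
            # Dual prediction: sgn(sum_j alpha_j y_j K(x_j, x_i))
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i])
                    for j in range(n) if alpha[j] != 0)
            y_hat = 1 if s >= 0 else -1
            if y_hat != y[i]:
                alpha[i] += 1            # record the mistake
    return alpha

def predict_kernel(X, y, alpha, x_new, kernel=rbf_kernel):
    """Classify a new sample x' from the stored examples and weights."""
    s = sum(alpha[j] * y[j] * kernel(X[j], x_new)
            for j in range(len(alpha)) if alpha[j] != 0)
    return 1 if s >= 0 else -1
```

Note that only samples with αi > 0 (those ever misclassified) contribute to predictions, which is why the classifier need only store a subset of its training examples.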
The sequential minimal optimization (SMO) algorithm used to learn support vector machines can also be regarded as a generalization of the kernel perceptron.[6]