Sequential minimal optimization

Sequential minimal optimization (SMO) is an algorithm for solving the quadratic programming (QP) problem that arises during the training of support-vector machines (SVM).

It was invented by John Platt in 1998 at Microsoft Research.[1] SMO is widely used for training support vector machines and is implemented by the popular LIBSVM tool.

Consider a binary classification problem with a dataset (x_1, y_1), ..., (x_n, y_n), where x_i is an input vector and y_i \in \{-1, +1\} is the binary label corresponding to it. A soft-margin support vector machine is trained by solving a quadratic programming problem, which is expressed in the dual form as follows:

\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j K(x_i, x_j)\, \alpha_i \alpha_j,

subject to:

0 \le \alpha_i \le C, \quad \text{for } i = 1, 2, \ldots, n,

\sum_{i=1}^{n} y_i \alpha_i = 0,

where C is an SVM hyperparameter and K(x_i, x_j) is the kernel function, both supplied by the user; and the variables \alpha_i are Lagrange multipliers.
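To make the notation concrete, the following Python sketch evaluates this dual objective and checks feasibility of a candidate \alpha. The RBF kernel, the function names, and the brute-force construction of the kernel matrix are illustrative assumptions, not part of the formulation itself.

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """An illustrative kernel choice: K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def dual_objective(alpha, X, y, kernel=rbf_kernel):
    """Dual objective: sum_i alpha_i - 1/2 * sum_{i,j} y_i y_j K(x_i, x_j) alpha_i alpha_j."""
    n = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    v = alpha * y
    return alpha.sum() - 0.5 * v @ K @ v

def is_feasible(alpha, y, C, tol=1e-8):
    """Box constraints 0 <= alpha_i <= C and equality constraint sum_i y_i alpha_i = 0."""
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol) and abs(alpha @ y) <= tol)
```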

SMO breaks this problem into a series of smallest possible sub-problems, which are then solved analytically. Because of the linear equality constraint involving the Lagrange multipliers \alpha_i, the smallest possible sub-problem involves two such multipliers. For any two multipliers \alpha_1 and \alpha_2, the constraints reduce to

0 \le \alpha_1, \alpha_2 \le C, \qquad y_1 \alpha_1 + y_2 \alpha_2 = k,

where k, the negative of the sum over the remaining terms in the equality constraint, is fixed in each iteration; this reduced problem can be solved analytically by finding the extremum of a one-dimensional quadratic function.

The algorithm proceeds as follows:

1. Find a Lagrange multiplier \alpha_1 that violates the Karush–Kuhn–Tucker (KKT) conditions for the optimization problem.
2. Pick a second multiplier \alpha_2 and optimize the pair (\alpha_1, \alpha_2).
3. Repeat steps 1 and 2 until convergence.

When all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance), the problem has been solved. Although this procedure is guaranteed to converge, heuristics are used to choose the pair of multipliers so as to accelerate the rate of convergence.
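The analytic update of a single pair of multipliers can be sketched as follows in Python with NumPy. This is a simplified illustration, not Platt's exact pseudocode: the function name smo_step, the assumption of a precomputed kernel matrix K, and the averaged update of the threshold b are simplifications, and the working-set heuristics are omitted.

```python
import numpy as np

def smo_step(alpha, b, y, C, K, i, j, tol=1e-5):
    """One analytic SMO update of the pair (alpha_i, alpha_j). Simplified sketch."""
    if i == j:
        return alpha, b, False
    # Prediction errors E_k = f(x_k) - y_k under the current multipliers and threshold.
    f = (alpha * y) @ K + b
    E_i, E_j = f[i] - y[i], f[j] - y[j]
    # Bounds L, H keep (alpha_i, alpha_j) on the constraint line inside the box [0, C]^2.
    if y[i] == y[j]:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    if H - L < tol:
        return alpha, b, False
    # Curvature of the one-dimensional quadratic objective along the constraint line.
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:
        return alpha, b, False  # this simplified sketch skips the degenerate case
    alpha = alpha.copy()
    a_i_old, a_j_old = alpha[i], alpha[j]
    # Unconstrained optimum for alpha_j, clipped back to [L, H].
    alpha[j] = np.clip(a_j_old + y[j] * (E_i - E_j) / eta, L, H)
    # Adjust alpha_i so that the equality constraint stays satisfied.
    alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
    # Threshold update: average of the two candidate values (a common simplification).
    b_i = b - E_i - y[i] * (alpha[i] - a_i_old) * K[i, i] - y[j] * (alpha[j] - a_j_old) * K[i, j]
    b_j = b - E_j - y[i] * (alpha[i] - a_i_old) * K[i, j] - y[j] * (alpha[j] - a_j_old) * K[j, j]
    return alpha, 0.5 * (b_i + b_j), True
```

An outer loop would repeatedly select pairs (i, j), typically using the heuristics mentioned above, call this update, and stop once no multiplier violates the KKT conditions within the chosen tolerance.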

The first approach to splitting large SVM learning problems into a series of smaller optimization tasks was proposed by Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik; it is known as the chunking algorithm.

The algorithm starts with a random subset of the data, solves the dual problem restricted to that subset, and iteratively adds examples that violate the optimality conditions.

On real-world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm.[1]

In 1997, E. Osuna, R. Freund, and F. Girosi proved a theorem that suggests a whole new set of QP algorithms for SVMs.

By virtue of this theorem, a sequence of QP sub-problems that always add at least one violator of the Karush–Kuhn–Tucker (KKT) conditions is guaranteed to converge.
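A KKT-violation check of the kind referred to above, for the soft-margin dual with decision function f(x) = \sum_j \alpha_j y_j K(x_j, x) + b, might look like the following sketch; the tolerance value and the vectorized form are illustrative choices.

```python
import numpy as np

def kkt_violators(alpha, b, y, C, K, tol=1e-3):
    """Return indices i whose multipliers violate the KKT conditions within `tol`.

    Conditions for the soft-margin dual:
      alpha_i == 0      =>  y_i * f(x_i) >= 1
      0 < alpha_i < C   =>  y_i * f(x_i) == 1
      alpha_i == C      =>  y_i * f(x_i) <= 1
    """
    f = (alpha * y) @ K + b                                # f(x_i) for every training point
    margins = y * f
    viol_low = (alpha < C - tol) & (margins < 1 - tol)     # margin too small for a non-bound multiplier
    viol_high = (alpha > tol) & (margins > 1 + tol)        # margin too large for a positive multiplier
    return np.where(viol_low | viol_high)[0]
```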

The SMO algorithm is closely related to a family of optimization algorithms called Bregman methods or row-action methods; these are iterative methods in which each step projects the current primal point onto each constraint.
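As a generic illustration of the row-action idea (not of SMO itself), the classical Kaczmarz method solves a consistent linear system by cyclically projecting the current point onto one constraint hyperplane at a time; the example system below is made up for demonstration.

```python
import numpy as np

def kaczmarz(A, b, sweeps=100):
    """Row-action sketch: cyclically project the iterate onto each hyperplane a_i . x = b_i."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for a_i, b_i in zip(A, b):
            # Orthogonal projection of x onto the hyperplane {x : a_i . x = b_i}.
            x = x + (b_i - a_i @ x) / (a_i @ a_i) * a_i
    return x

# Example: a consistent 3x3 system whose exact solution is [1, 1, 1].
A = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
b = np.array([3.0, 2.0, 2.0])
print(kaczmarz(A, b))  # approaches [1, 1, 1]
```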
