AdaBoost (short for Adaptive Boosting) is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work.
Usually, AdaBoost is presented for binary classification, although it can be generalized to multiple classes or bounded intervals of real values.
Although AdaBoost is typically used to combine weak base learners (such as decision stumps), it has been shown to also effectively combine strong base learners (such as deeper decision trees), producing an even more accurate model.[3]
Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to adjust before it achieves optimal performance on a dataset.
AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier.[4][5]
When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-growing algorithm, such that later trees tend to focus on harder-to-classify examples.
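As an illustration, the following is a minimal sketch of boosting decision stumps with scikit-learn; the synthetic dataset, the number of estimators, and the reliance on AdaBoostClassifier's default depth-1 tree base learner are illustrative assumptions rather than details fixed by the text.

```python
# Minimal sketch: AdaBoost over decision stumps with scikit-learn
# (assumes scikit-learn is installed; dataset and settings are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# By default AdaBoostClassifier boosts depth-1 decision trees (stumps).
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```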
AdaBoost trains a boosted classifier of the form F(x) = f_1(x) + f_2(x) + ... + f_T(x), where each f_t is a weak learner that takes an object x as input and returns a value indicating the class of the object.
For example, in the two-class problem, the sign of the weak learner's output identifies the predicted object class and the absolute value gives the confidence in that classification.
At each iteration of the training process, a weight is assigned to each sample in the training set equal to the current error of the boosted classifier on that sample. These weights can be used to inform the training of the weak learner.
For instance, decision trees can be grown which favor the splitting of sets of samples with large weights.
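To make the reweighting loop concrete, here is a compact sketch of discrete AdaBoost with decision stumps in Python; the use of scikit-learn's DecisionTreeClassifier for the stump and the textbook update formulas shown are assumptions of this sketch, not quotations from the article.

```python
# Sketch of discrete AdaBoost with decision stumps (labels in {-1, +1}).
# Uses scikit-learn only for the stump; the boosting loop itself is explicit.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)           # start with uniform sample weights
    ensemble = []                     # list of (alpha, stump) pairs
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # weights bias the split choice
        pred = stump.predict(X)
        err = np.sum(w[pred != y])                 # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)       # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # weak learner's coefficient
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
        w /= w.sum()                               # renormalise
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # Sign of the weighted vote; the magnitude reflects confidence.
    score = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.sign(score)
```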
The standard closed-form choice of the coefficient α_t applies strictly only when the weak learner's output is -1 or +1, though it can be a good starting guess in other cases, such as when the weak learner is biased (its two output values are not symmetric about zero), has multiple leaves, or returns an arbitrary real value.
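For reference, the usual closed-form coefficient and the corresponding multiplicative weight update are as follows; these are the standard AdaBoost formulas, stated here for completeness.

```latex
% Closed-form coefficient for a {-1,+1}-valued weak learner h_t with weighted
% error \epsilon_t, and the resulting multiplicative weight update.
\[
\alpha_t = \tfrac{1}{2}\ln\!\frac{1-\epsilon_t}{\epsilon_t},
\qquad
w_{i,t+1} \propto w_{i,t}\, e^{-\alpha_t\, y_i\, h_t(x_i)}
\]
```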
Whereas regression tries to fit F(x) to y(x) as precisely as possible without loss of generalization, typically using least square error, the AdaBoost error function exp(-y(x)F(x)) takes into account the fact that only the sign of the final result is used, so |F(x)| can grow far beyond 1 without increasing the error.
Thus it can be seen that the weight update in the AdaBoost algorithm is equivalent to recalculating the error on F_t(x) after each iteration.
As long as the loss function is monotonic and continuously differentiable, the classifier is always driven toward purer solutions.
A modified Huber loss, a piecewise variant of least squares, does not penalise 'overconfident' predictions (samples correctly classified with confidence greater than 1), unlike unmodified least squares, and only penalises samples misclassified with confidence greater than 1 linearly, as opposed to quadratically or exponentially, and is thus less susceptible to the effects of outliers.
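The following sketch writes these losses as functions of the margin m = y·F(x) so their tail behaviour can be compared directly; the particular formula used for the modified Huber loss is the common piecewise definition and is an assumption here, not a quotation from the text.

```python
# Common margin-based losses, written as functions of the margin m = y * F(x).
# Larger m means a more confidently correct prediction.
import numpy as np

def exponential_loss(m):
    # AdaBoost's loss: grows exponentially as the margin becomes negative.
    return np.exp(-m)

def logistic_loss(m):
    # LogitBoost's loss: grows only linearly for very negative margins.
    return np.log1p(np.exp(-m))

def modified_huber_loss(m):
    # Zero for confident correct predictions (m >= 1), quadratic near the
    # decision boundary, and linear for badly misclassified samples (m < -1).
    m = np.asarray(m, dtype=float)
    return np.where(m >= -1.0, np.maximum(0.0, 1.0 - m) ** 2, -4.0 * m)

margins = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for name, fn in [("exp", exponential_loss),
                 ("logistic", logistic_loss),
                 ("mod. Huber", modified_huber_loss)]:
    print(name, fn(margins))
```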
Boosting can also be viewed as gradient descent: the outputs of the classifier on the training set are treated as a single point in n-dimensional space, where each axis corresponds to a training sample, each weak learner corresponds to a vector of fixed orientation and length, and the goal is to reach the target point given by the true labels (or any region where the loss function is smaller) in the least number of steps. AdaBoost-style algorithms thus perform either Cauchy optimization (find the weak learner with the steepest gradient, then choose its coefficient to minimize test error) or Newton optimization (choose some target point, then find the scaled weak learner that brings the current classifier closest to that point) of the training error.
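As a concrete instance of this view, the exponential loss minimized by AdaBoost and its negative gradient with respect to the current scores are shown below; these are standard expressions, included here for orientation.

```latex
% Exponential loss over the training set and its negative gradient with
% respect to the current score F(x_i); the weak learner is chosen to point
% along this direction as closely as possible.
\[
E(F) = \sum_{i=1}^{n} e^{-y_i F(x_i)},
\qquad
-\frac{\partial E}{\partial F(x_i)} = y_i\, e^{-y_i F(x_i)}.
\]
```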
Real AdaBoost uses decision trees whose output is a class probability estimate p(x), the probability that x is in the positive class. Friedman, Hastie and Tibshirani derive an analytical minimizer of the exponential error for some fixed p(x) (typically chosen using weighted least squares error): f_t(x) = ½ ln(p(x) / (1 - p(x))). Thus, rather than multiplying the output of the entire tree by some fixed value, each leaf node is changed to output half the logit transform of its previous value.
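A sketch of that leaf transform in Python, with the probability clipped away from 0 and 1 to keep the logarithm finite; the clipping constant is an arbitrary illustrative choice.

```python
# Real AdaBoost leaf output: half the logit transform of the leaf's class
# probability estimate p(x), clipped to avoid infinities at p = 0 or p = 1.
import numpy as np

def real_adaboost_leaf_value(p, eps=1e-6):
    p = np.clip(p, eps, 1.0 - eps)
    return 0.5 * np.log(p / (1.0 - p))

print(real_adaboost_leaf_value(np.array([0.1, 0.5, 0.9, 1.0])))
```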
LogitBoost represents an application of established logistic regression techniques to the AdaBoost method.
When the estimated probability p(x) of a sample is close to 0 or 1, the weight p(x)(1 - p(x)) becomes very small and the z term, which is large for misclassified samples, can become numerically unstable, due to machine precision rounding errors. This can be overcome by enforcing some limit on the absolute value of z and the minimum value of the weight.
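A sketch of that safeguard, using the LogitBoost working response and weights in the form given by Friedman, Hastie and Tibshirani; the clamp thresholds here are arbitrary illustrative values.

```python
# LogitBoost working response z and weights w for labels y* in {0, 1},
# with clamping to avoid numerical blow-up when p is near 0 or 1.
import numpy as np

def logitboost_working_response(y_star, p, z_max=4.0, w_min=1e-5):
    w = np.maximum(p * (1.0 - p), w_min)   # weights shrink toward 0 at p ~ 0 or 1
    z = (y_star - p) / w                   # large for confidently misclassified samples
    z = np.clip(z, -z_max, z_max)          # cap the response to keep it stable
    return z, w

z, w = logitboost_working_response(np.array([1, 0, 1]),
                                   np.array([0.999999, 0.5, 0.01]))
print(z, w)
```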
Unlike earlier boosting algorithms, which choose f_t greedily so as to reduce the overall error as much as possible at each step, GentleBoost features a bounded step size: f_t is chosen to minimize the weighted least squares error between the weak learner's output and the labels, and no further coefficient is applied. Thus, in the case where a weak learner exhibits perfect classification performance, GentleBoost chooses f_t(x) exactly equal to y, while steepest descent algorithms would try to set the coefficient α_t to infinity.
Empirical observations about the good performance of GentleBoost appear to back up Schapire and Singer's remark that allowing excessively large values of α can lead to poor generalization performance.[9][10]
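One boosting round under this scheme might look as follows; using a shallow regression tree as the weak learner and exponential-loss sample weights is an illustrative choice rather than a detail fixed by the text.

```python
# One GentleBoost-style round: fit a small regression tree to the labels by
# weighted least squares and add its (bounded) output directly, with no
# separate coefficient alpha.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gentleboost_round(X, y, F, sample_weight):
    """X: features, y: labels in {-1, +1}, F: current ensemble scores."""
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weight)   # weighted least-squares fit
    F = F + stump.predict(X)                       # bounded step, no extra scaling
    sample_weight = np.exp(-y * F)                 # exponential-loss sample weights
    sample_weight /= sample_weight.sum()
    return stump, F, sample_weight
```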
A technique for speeding up processing of boosted classifiers, early termination refers to only testing each potential object with as many layers of the final classifier as are necessary to meet some confidence threshold, speeding up computation for cases where the class of the object can easily be determined.
One such scheme is the object detection framework introduced by Viola and Jones:[11] in an application with significantly more negative samples than positive, a cascade of separate boosting classifiers is trained, with the output of each stage biased such that some acceptably small fraction of positive samples is mislabeled as negative, and all samples marked as negative after each stage are discarded.
If 50% of negative samples are filtered out by each stage, only a very small number of objects pass through the entire classifier, reducing computation effort.
This method has since been generalized, with a formula provided for choosing optimal thresholds at each stage to achieve some desired false positive and false negative rate.[12]
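A sketch of cascade-style early rejection at prediction time; the per-stage scoring functions and thresholds below are hypothetical placeholders for whatever each trained stage would provide.

```python
# Cascade evaluation with early rejection: a candidate is passed to the next
# stage only if it clears that stage's threshold; most negatives are discarded
# after the first few (cheap) stages.
def cascade_predict(x, stages):
    """stages: list of (score_fn, threshold) pairs, ordered cheap to expensive."""
    for score_fn, threshold in stages:
        if score_fn(x) < threshold:
            return False            # rejected early: treated as a negative
    return True                     # survived every stage: treated as a positive

# Illustrative usage with toy scoring functions.
stages = [(lambda x: x * 0.5, 0.2), (lambda x: x * 0.9, 0.6)]
print(cascade_predict(0.9, stages), cascade_predict(0.3, stages))
```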
In the field of statistics, where AdaBoost is more commonly applied to problems of moderate dimensionality, early stopping is used as a strategy to reduce overfitting.
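One way to implement early stopping, sketched here with scikit-learn's staged_score to evaluate the ensemble after each boosting round on a held-out validation set; the dataset and split are illustrative assumptions.

```python
# Early stopping sketch: train a large ensemble, then keep only the number of
# boosting rounds that maximizes accuracy on a held-out validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
val_scores = list(clf.staged_score(X_val, y_val))   # accuracy after each round
best_rounds = int(np.argmax(val_scores)) + 1
print("keep", best_rounds, "of", len(val_scores), "rounds")
```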
The simplest methods, which can be particularly effective in conjunction with totally corrective training, are weight- or margin-trimming: when the coefficient, or the contribution to the total test error, of some weak classifier falls below a certain threshold, that classifier is dropped.
Margineantu & Dietterich[15] suggested an alternative criterion for trimming: weak classifiers should be selected such that the diversity of the ensemble is maximized.
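A sketch of the simplest coefficient-based trimming, applied to an ensemble stored as (alpha, weak learner) pairs like the one built in the earlier AdaBoost sketch; the threshold is an arbitrary illustrative value.

```python
# Weight-trimming sketch: drop weak classifiers whose coefficient alpha is
# too small to contribute meaningfully to the final weighted vote.
def trim_ensemble(ensemble, min_alpha=0.01):
    """ensemble: list of (alpha, weak_learner) pairs."""
    return [(alpha, h) for alpha, h in ensemble if abs(alpha) >= min_alpha]
```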