Stability, also known as algorithmic stability, is a notion in computational learning theory of how the output of a machine learning algorithm changes under small perturbations of its inputs.
A stable learning algorithm is one for which the prediction does not change much when the training data is modified slightly.
For instance, consider a machine learning algorithm that is being trained to recognize handwritten letters of the alphabet, using 1000 examples of handwritten letters and their labels ("A" to "Z") as a training set.
One way to modify this training set is to leave out an example, so that only 999 examples of handwritten letters and their labels are available.
A stable learning algorithm would produce a similar classifier with both the 1000-element and 999-element training sets.
The study of stability gained importance in computational learning theory in the 2000s when it was shown to have a connection with generalization.[1]
It was shown that for large classes of learning algorithms, notably empirical risk minimization algorithms, certain types of stability ensure good generalization.
A central goal in designing a machine learning system is to guarantee that the learning algorithm will generalize, or perform accurately on new examples after being trained on a finite number of them.
In the 1990s, milestones were reached in obtaining generalization bounds for supervised learning algorithms.
The technique historically used to prove generalization was to show that an algorithm was consistent, using the uniform convergence properties of empirical quantities to their means.
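In symbols (the notation $I[f]$ for the expected error and $I_S[f]$ for the empirical error on a training set $S$ of size $n$ is assumed here for illustration), uniform convergence over a hypothesis space $H$ means that
$$\sup_{f \in H} \big| I[f] - I_S[f] \big| \to 0 \quad \text{in probability as } n \to \infty,$$
so that the empirical error of every hypothesis, including the one the algorithm selects, is simultaneously close to its expected error.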
This technique was used to obtain generalization bounds for the large class of empirical risk minimization (ERM) algorithms.
An ERM algorithm is one that selects a solution from a hypothesis space $H$ in such a way as to minimize the empirical error on a training set $S$.
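As an illustration, the following is a minimal sketch of an ERM algorithm over a finite hypothesis space with the 0-1 loss; the threshold classifiers, toy data, and helper names are invented for this example rather than taken from any particular reference.

```python
import numpy as np

def empirical_risk(hypothesis, X, y):
    """Average 0-1 loss of a hypothesis on the training set (X, y)."""
    predictions = np.array([hypothesis(x) for x in X])
    return np.mean(predictions != y)

def erm(hypothesis_space, X, y):
    """Empirical risk minimization: pick the hypothesis in a finite
    hypothesis space with the smallest empirical error on S = (X, y)."""
    return min(hypothesis_space, key=lambda h: empirical_risk(h, X, y))

# Toy data and a small hypothesis space of threshold classifiers on [0, 1].
X = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0, 0, 1, 1])
hypothesis_space = [lambda x, t=t: int(x > t) for t in np.linspace(0.0, 1.0, 11)]

f_S = erm(hypothesis_space, X, y)
print(empirical_risk(f_S, X, y))  # 0.0: the toy data are separable by a threshold
```

In the language of stability, a stable ERM algorithm is one whose selected hypothesis, and hence its predictions, would change very little if one of the training points were removed.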
A general result, proved by Vladimir Vapnik for ERM binary classification algorithms, is that for any target function and input distribution, any hypothesis space $H$ with finite VC-dimension $d$, and $n$ training examples, the algorithm is consistent and will produce a training error that is at most $O\!\left(\sqrt{d/n}\right)$ (plus logarithmic factors) from the true error.
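In displayed form (a typical statement of the bound, with constants and logarithmic factors suppressed), this says that with high probability
$$ I[f_S] \;\leq\; I_S[f_S] + O\!\left(\sqrt{\frac{d}{n}}\right), $$
where $I[f_S]$ and $I_S[f_S]$ are the true and empirical errors of the ERM solution, $d$ is the VC-dimension of $H$, and $n$ is the number of training examples. The bound becomes vacuous when $d$ is infinite.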
The result was later extended to almost-ERM algorithms with function classes that do not have unique minimizers.
Vapnik's work, using what became known as VC theory, established a relationship between the generalization of a learning algorithm and properties of the hypothesis space $H$.
However, these results could not be applied to algorithms with hypothesis spaces of unbounded VC-dimension.
Put another way, these results could not be applied when the information being learned had a complexity that was too large to measure.
Some of the simplest machine learning algorithms—for instance, for regression—have hypothesis spaces with unbounded VC-dimension.
Another example is language learning algorithms that can produce sentences of arbitrary length.
Stability analysis was developed in the 2000s for computational learning theory and is an alternative method for obtaining generalization bounds.
Stability is a property of the learning process rather than a direct property of the hypothesis space $H$, and it can be assessed in algorithms that have hypothesis spaces with unbounded or undefined VC-dimension, such as nearest neighbor.
A measure of leave-one-out error is used in a Cross-Validation Leave-One-Out (CVloo) algorithm to evaluate a learning algorithm's stability with respect to the loss function.
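As a concrete sketch of this quantity (the 1-nearest-neighbour regressor, squared loss, and synthetic sine-wave data below are choices made purely for illustration), the leave-one-out error is computed by retraining with each example removed in turn and measuring the loss on the held-out example:

```python
import numpy as np

def squared_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def one_nn_predict(X_train, y_train, x):
    """1-nearest-neighbour regression: return the label of the closest training point."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

def leave_one_out_errors(X, y):
    """For each i, train on S with the i-th example removed (S^{|i})
    and measure the loss of the resulting predictor on that example."""
    m = len(X)
    errors = []
    for i in range(m):
        mask = np.arange(m) != i
        y_pred = one_nn_predict(X[mask], y[mask], X[i])
        errors.append(squared_loss(y_pred, y[i]))
    return np.array(errors)

# Synthetic regression problem: noisy samples of a sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=50)
print(leave_one_out_errors(X, y).mean())  # empirical leave-one-out error
```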
We first define several terms related to learning algorithms and their training sets, so that we can then define stability in multiple ways and present theorems from the field.
A training set $S = \{z_1 = (x_1, y_1), \ldots, z_m = (x_m, y_m)\}$ consists of $m$ examples drawn independently from an unknown distribution over a space $Z = X \times Y$. A learning algorithm $L$ maps a training set $S$ to a function $f_S$ in the hypothesis space $H$, $S^{|i}$ denotes the training set with the $i$-th example removed, and $V(f, z)$ denotes the loss of a function $f$ on an example $z = (x, y)$.
An algorithm $L$ has hypothesis stability $\beta$ with respect to the loss function $V$ if the following holds:
$$\forall i \in \{1, \ldots, m\}, \quad \mathbb{E}_{S,z}\big[\,|V(f_S, z) - V(f_{S^{|i}}, z)|\,\big] \leq \beta.$$
An algorithm $L$ has point-wise hypothesis stability $\beta$ with respect to the loss function $V$ if the following holds:
$$\forall i \in \{1, \ldots, m\}, \quad \mathbb{E}_{S}\big[\,|V(f_S, z_i) - V(f_{S^{|i}}, z_i)|\,\big] \leq \beta.$$
An algorithm $L$ has error stability $\beta$ with respect to the loss function $V$ if the following holds:
$$\forall S \in Z^m, \ \forall i \in \{1, \ldots, m\}, \quad \big|\mathbb{E}_{z}[V(f_S, z)] - \mathbb{E}_{z}[V(f_{S^{|i}}, z)]\big| \leq \beta.$$
An algorithm $L$ has uniform stability $\beta$ with respect to the loss function $V$ if the following holds:
$$\forall S \in Z^m, \ \forall i \in \{1, \ldots, m\}, \quad \sup_{z \in Z} \big|V(f_S, z) - V(f_{S^{|i}}, z)\big| \leq \beta.$$
An algorithm $L$ has CVloo stability $\beta$ with respect to the loss function $V$ if the following holds:
$$\forall i \in \{1, \ldots, m\}, \quad \mathbb{P}_S\big\{\,|V(f_{S^{|i}}, z_i) - V(f_S, z_i)| \leq \beta_{CV}\,\big\} \geq 1 - \delta_{CV},$$
with $\beta_{CV}$ and $\delta_{CV}$ going to zero as $m \to \infty$.
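These definitions can be probed empirically by Monte Carlo simulation. The sketch below (again with a 1-nearest-neighbour regressor, squared loss, and synthetic data chosen only for illustration) estimates the hypothesis-stability quantity $\mathbb{E}_{S,z}\big[\,|V(f_S, z) - V(f_{S^{|i}}, z)|\,\big]$ by averaging over independently drawn training sets and test points:

```python
import numpy as np

def squared_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def one_nn_predict(X_train, y_train, x):
    """1-nearest-neighbour regression."""
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

def estimate_hypothesis_stability(m, i=0, n_trials=500, seed=0):
    """Monte Carlo estimate of E_{S,z}[ |V(f_S, z) - V(f_{S^{|i}}, z)| ]
    for 1-NN regression on a synthetic sine-wave problem."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_trials):
        # Draw a training set S of size m and an independent test point z = (x, y).
        X = rng.uniform(0.0, 1.0, size=(m, 1))
        y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=m)
        x_test = rng.uniform(0.0, 1.0, size=(1,))
        y_test = np.sin(2 * np.pi * x_test[0]) + rng.normal(0.0, 0.1)
        # f_S is trained on all of S; f_{S^{|i}} on S with the i-th example removed.
        mask = np.arange(m) != i
        loss_full = squared_loss(one_nn_predict(X, y, x_test), y_test)
        loss_loo = squared_loss(one_nn_predict(X[mask], y[mask], x_test), y_test)
        diffs.append(abs(loss_full - loss_loo))
    return float(np.mean(diffs))

# The estimate shrinks as the training set grows, consistent with beta -> 0 as m -> infinity.
print(estimate_hypothesis_stability(m=20))
print(estimate_hypothesis_stability(m=200))
```

Comparing a small and a large training-set size gives a rough indication of whether the estimated $\beta$ shrinks as $m$ grows, which is the kind of behaviour the stability-based results above require.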
Mukherjee et al. (2006) showed that for ERM algorithms with certain loss functions, CVloo stability is necessary and sufficient for consistency. This is an important result for the foundations of learning theory, because it shows that two previously unrelated properties of an algorithm, stability and consistency, are equivalent for ERM (and certain loss functions).
This is a list of algorithms that have been shown to be stable, together with the articles in which the associated generalization bounds were provided.