Error tolerance (PAC learning)

In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way.

In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data.

Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.

In the following, let $X$ be our $n$-dimensional instance space. Let $\mathcal{H}$ be a class of functions that we wish to use in order to learn a $\{0,1\}$-valued target function $f$ defined over $X$. Let $\mathcal{D}$ be the distribution of the inputs over $X$. The goal of a learning algorithm $\mathcal{A}$ is to choose the best function $h \in \mathcal{H}$ such that it minimizes $error(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. Let us suppose we have a function $size(f)$ that measures the complexity of $f$. Let $Oracle(x)$ be an oracle that, whenever called, returns an example $x$ and its correct label $f(x)$.

When no noise corrupts the data, we can define learning in the Valiant setting:[1][2]

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the Valiant setting if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$ and $0 < \delta \leq 1$ it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{\varepsilon},\frac{1}{\delta},n,size(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $error(h) \leq \varepsilon$.
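As a rough illustration of the noise-free setting (not part of the formal definition), the following Python sketch plays the role of $Oracle(x)$ together with a toy consistent learner. The uniform distribution over $\{0,1\}^n$, the dictator-function hypothesis class, and names such as `make_oracle` and `learn` are all illustrative assumptions.

```python
import random

def make_oracle(f, n, rng):
    """Noise-free oracle Oracle(x): each call draws x from the input
    distribution D (here: uniform over {0,1}^n) and returns (x, f(x))."""
    def oracle():
        x = tuple(rng.randint(0, 1) for _ in range(n))
        return x, f(x)
    return oracle

def learn(oracle, hypotheses, m):
    """Toy consistent learner: draw m labeled examples and return any
    hypothesis in H that agrees with all of them."""
    sample = [oracle() for _ in range(m)]
    for h in hypotheses:
        if all(h(x) == y for x, y in sample):
            return h
    return None

# Toy instance: n = 5, the target is the dictator function x -> x[3],
# and H is the class of all n dictator functions.
rng = random.Random(0)
n = 5
target = lambda x: x[3]
H = [lambda x, i=i: x[i] for i in range(n)]
h = learn(make_oracle(target, n, rng), H, m=20)
print(h((0, 0, 0, 1, 0)))   # prints 1: with high probability the learner recovers x[3]
```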

In the classification noise model a noise rate $\eta < \frac{1}{2}$ is introduced. Then, instead of $Oracle(x)$, which always returns the correct label of example $x$, algorithm $\mathcal{A}$ can only call a faulty oracle $Oracle(x,\eta)$ that flips the label of $x$ with probability $\eta$. As in the Valiant case, the goal of a learning algorithm $\mathcal{A}$ is to choose the best function $h \in \mathcal{H}$ such that it minimizes $error(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. In applications it is difficult to have access to the real value of $\eta$, but we assume we have access to an upper bound $\eta_B$. Note that if we allow the noise rate to be $\frac{1}{2}$, then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function.

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the classification noise model if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x,\eta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot,\cdot)$ such that for any $0 \leq \eta < \frac{1}{2}$, $0 < \varepsilon \leq 1$ and $0 < \delta \leq 1$ it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{1-2\eta_B},\frac{1}{\varepsilon},\frac{1}{\delta},n,size(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $error(h) \leq \varepsilon$.
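A minimal sketch of how the faulty oracle $Oracle(x,\eta)$ differs from the noise-free one, reusing the toy setup above; the names and the minimum-disagreement learner are illustrative assumptions, not the article's algorithm.

```python
import random

def make_noisy_oracle(f, n, eta, rng):
    """Classification-noise oracle Oracle(x, eta): draws x as in the noise-free
    case, but flips the returned label f(x) independently with probability eta."""
    def oracle():
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = f(x)
        if rng.random() < eta:          # label corrupted with probability eta
            y = 1 - y
        return x, y
    return oracle

def learn_with_noise(oracle, hypotheses, m):
    """Under label noise a perfectly consistent hypothesis may not exist, so
    return the hypothesis with the fewest disagreements on the sample instead."""
    sample = [oracle() for _ in range(m)]
    return min(hypotheses, key=lambda h: sum(h(x) != y for x, y in sample))
```

The design choice here reflects the model: with $\eta$ bounded away from $\frac{1}{2}$, the target function still has the smallest expected disagreement rate against the noisy labels, so minimizing disagreements over a large enough sample can still single it out.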

Statistical query learning is a kind of active learning problem in which the learning algorithm $\mathcal{A}$ can decide whether to request information about the likelihood $P_f(x)$ that a function $f$ correctly labels example $x$, and receives an answer accurate within a tolerance $\alpha$. Formally, whenever the learning algorithm $\mathcal{A}$ calls the oracle $Oracle(x,\alpha)$, it receives as feedback a probability $Q_f(x)$ such that $Q_f(x) - \alpha \leq P_f(x) \leq Q_f(x) + \alpha$.

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the statistical query learning model if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x,\alpha)$ and polynomials $p(\cdot,\cdot,\cdot)$, $q(\cdot,\cdot,\cdot)$ and $r(\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$ the following hold:

1. $Oracle(x,\alpha)$ can evaluate $P_f(x)$ in time $q\left(\frac{1}{\varepsilon},n,size(f)\right)$;
2. $\frac{1}{\alpha}$ is bounded by $r\left(\frac{1}{\varepsilon},n,size(f)\right)$;
3. $\mathcal{A}$ outputs a model $h$ such that $error(h) < \varepsilon$, in a number of calls to the oracle bounded by $p\left(\frac{1}{\varepsilon},n,size(f)\right)$.

Note that the confidence parameter $\delta$ does not appear in this definition. Its main purpose in the previous models is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since now $Oracle(x,\alpha)$ always guarantees to meet the approximation criterion $Q_f(x) - \alpha \leq P_f(x) \leq Q_f(x) + \alpha$, the failure probability is no longer needed.
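The following sketch illustrates the idea in the standard statistical-query style (estimating the expectation of a queried predicate over labeled examples) rather than the article's exact notation; the finite `population` standing in for $\mathcal{D}$ and all names are assumptions.

```python
import random

def make_sq_oracle(f, population, alpha, rng):
    """Statistical-query-style oracle: for a queried predicate chi(x, f(x)) it
    returns an estimate of the expectation of chi over the population (a finite
    stand-in for D), guaranteed to be accurate within +-alpha."""
    def oracle(chi):
        true_value = sum(chi(x, f(x)) for x in population) / len(population)
        return true_value + rng.uniform(-alpha, alpha)   # answer within tolerance alpha
    return oracle

# The learner never sees individual labels, only tolerance-alpha estimates, so no
# single (possibly mislabeled) example can sway it; this is the intuition for why
# statistical query algorithms tolerate classification noise.
rng = random.Random(1)
n = 5
population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(1000)]
target = lambda x: x[3]
sq = make_sq_oracle(target, population, alpha=0.05, rng=rng)

H = [lambda x, i=i: x[i] for i in range(n)]
best = max(H, key=lambda h: sq(lambda x, y, h=h: h(x) == y))   # highest estimated agreement
```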

The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficiently PAC-learnable problems, such as parity, that are not efficiently SQ-learnable.[8]
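To make the parity example concrete: from noise-free examples, a parity function can be recovered by solving a linear system over GF(2), which is why it is efficiently PAC learnable, whereas a statistical-query learner only sees tolerance-$\alpha$ averages, which are essentially uninformative for parity. The sketch below is not from the article; the names (e.g. `learn_parity`) and the toy parameters are illustrative.

```python
import random

def parity(x, S):
    """Hidden parity target: XOR of the bits of x indexed by the set S."""
    return sum(x[i] for i in S) % 2

def learn_parity(examples):
    """Noise-free examples (x, y) define the linear system <a, x> = y over GF(2);
    Gaussian elimination returns a coefficient vector a consistent with all of them."""
    n = len(examples[0][0])
    rows = [list(x) + [y] for x, y in examples]          # augmented matrix over GF(2)
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue                                      # free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, c))
        r += 1
    a = [0] * n
    for i, c in pivots:
        a[c] = rows[i][n]                                 # free variables stay 0
    return a

rng = random.Random(2)
n, S = 20, {2, 5, 11}
sample = [(x, parity(x, S)) for x in
          ([rng.randint(0, 1) for _ in range(n)] for _ in range(200))]
a = learn_parity(sample)
print({i for i, v in enumerate(a) if v})   # recovers S with high probability
```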

In the malicious classification model[9] an adversary generates errors to foil the learning algorithm. This setting describes situations of error burst, which may occur when, for a limited time, transmission equipment malfunctions repeatedly. Formally, algorithm $\mathcal{A}$ calls an oracle $Oracle(x,\beta)$ that with probability $1-\beta$ returns a correctly labeled example $x$ drawn, as usual, from distribution $\mathcal{D}$ over the input space, but with probability $\beta$ returns an example drawn from a distribution that is not related to $\mathcal{D}$. Moreover, this maliciously chosen example may be strategically selected by an adversary who has knowledge of $f$, $\beta$, $\mathcal{D}$, or the current progress of the learning algorithm.

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the malicious classification model, if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x,\beta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$, $0 < \delta \leq 1$, and any $\beta$ bounded above by a known $\beta_B < \frac{1}{2}$, it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{\frac{1}{2}-\beta_B},\frac{1}{\varepsilon},\frac{1}{\delta},n,size(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $error(h) \leq \varepsilon$.
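A sketch of how $Oracle(x,\beta)$ might be simulated; the fixed `adversary` stub is a deliberately crude assumption, since a real adversary may adapt to $f$, $\beta$, $\mathcal{D}$, or the learner's current state.

```python
import random

def make_malicious_oracle(f, n, beta, adversary, rng):
    """Malicious classification oracle Oracle(x, beta): with probability 1 - beta it
    behaves like the noise-free oracle; with probability beta it returns whatever
    labeled pair the adversary chooses."""
    def oracle():
        if rng.random() < beta:
            return adversary()                      # adversarial example, unrelated to D
        x = tuple(rng.randint(0, 1) for _ in range(n))
        return x, f(x)                              # honest draw from D, correctly labeled
    return oracle

# A crude adversary stub that always reports the same mislabeled point.
adversary = lambda: ((1, 1, 1, 1, 1), 0)
```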

In the nonuniform random attribute noise[10][11] model the algorithm is learning a Boolean function, and a malicious oracle $Oracle(x,\nu)$ may flip each $i$-th bit of example $x = (x_1, x_2, \ldots, x_n)$ independently with probability $\nu_i \leq \nu$.

This type of error can irreparably foil the algorithm; in fact, the following theorem holds: in the nonuniform random attribute noise setting, an algorithm $\mathcal{A}$ can output a function $h \in \mathcal{H}$ such that $error(h) \leq \varepsilon$ only if $\nu < 2\varepsilon$.
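A sketch of the attribute-noise oracle: the label is computed on the clean input, while the learner only sees the corrupted attributes. The per-attribute rates `nu[i]` and the function names are illustrative assumptions.

```python
import random

def make_attribute_noise_oracle(f, n, nu, rng):
    """Nonuniform random attribute noise: the label is computed on the clean input x,
    but each attribute bit of the returned example is independently flipped with its
    own rate nu[i] (all assumed bounded by a common nu)."""
    def oracle():
        x = [rng.randint(0, 1) for _ in range(n)]
        y = f(tuple(x))                                       # label of the uncorrupted input
        noisy_x = tuple(b ^ (rng.random() < nu[i]) for i, b in enumerate(x))
        return noisy_x, y
    return oracle

# Example rates: each attribute gets its own flip probability.
oracle = make_attribute_noise_oracle(lambda x: x[3], 5,
                                     [0.0, 0.1, 0.05, 0.2, 0.0], random.Random(3))
```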