In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way.
In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data.
Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.
In the following, let $X$ be our $n$-dimensional instance space. Let $\mathcal{H}$ be a class of functions that we wish to use in order to learn a $\{0,1\}$-valued target function $f$ defined over $X$. Let $\mathcal{D}$ be the distribution of the inputs over $X$. The goal of a learning algorithm $\mathcal{A}$ is to choose the best function $h \in \mathcal{H}$ that minimizes $error(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. Let us suppose we have a function $size(f)$ that measures the complexity of $f$. Let $Oracle(x)$ be an oracle that, whenever called, returns an example $x$ and its correct label $f(x)$.
When no noise corrupts the data, we can define learning in the Valiant setting:[1][2]

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the Valiant setting if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \epsilon \le 1$ and $0 < \delta \le 1$ it outputs, in a number of calls to the oracle bounded by $p(1/\epsilon, 1/\delta, n, size(f))$, a function $h \in \mathcal{H}$ that with probability at least $1 - \delta$ satisfies $error(h) \le \epsilon$.
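To make the setup concrete, here is a minimal Python sketch of a noise-free example oracle and an empirical estimate of $error(h)$; the particular target function, the dimension $n$, and the uniform input distribution are illustrative assumptions, not part of the model:

```python
import random

# A minimal sketch of the noise-free example oracle Oracle(x).
# The target f, the dimension n, and the uniform distribution D
# over {0,1}^n are illustrative assumptions.

n = 5                                   # instance space X = {0,1}^n
def f(x):                               # example target: the conjunction x1 AND x2
    return int(x[0] == 1 and x[1] == 1)

def oracle():
    """Draw x from D (uniform over {0,1}^n) and return (x, f(x))."""
    x = tuple(random.randint(0, 1) for _ in range(n))
    return x, f(x)

def empirical_error(h, m=10_000):
    """Estimate error(h) = P_{x~D}(h(x) != f(x)) from m oracle calls."""
    return sum(h(x) != y for x, y in (oracle() for _ in range(m))) / m
```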
In the classification noise model, a noise rate $\eta < 1/2$ is introduced. Then, instead of $Oracle(x)$, which always returns the correct label of example $x$, algorithm $\mathcal{A}$ can only call a faulty oracle $Oracle(x, \eta)$ that will flip the label of $x$ with probability $\eta$. As in the Valiant case, the goal of a learning algorithm $\mathcal{A}$ is to choose the best function $h \in \mathcal{H}$ that minimizes $error(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. In applications it is difficult to have access to the real value of $\eta$, but we assume we have access to its upper bound $\eta_B$. Note that if we allow the noise rate to be $1/2$, then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function.
Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the classification noise model if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x, \eta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot)$ such that for any $0 \le \eta < 1/2$, $0 < \epsilon \le 1$ and $0 < \delta \le 1$ it outputs, in a number of calls to the oracle bounded by $p(1/(1 - 2\eta_B), 1/\epsilon, 1/\delta, n, size(f))$, a function $h \in \mathcal{H}$ that with probability at least $1 - \delta$ satisfies $error(h) \le \epsilon$.
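A faulty oracle of this kind is straightforward to simulate. The sketch below, which reuses oracle() from the previous sketch, flips the returned label independently with probability eta:

```python
import random

# A sketch of the faulty oracle Oracle(x, eta) of the classification
# noise model: a correctly labeled example whose label is flipped
# independently with probability eta. Requires eta < 1/2; at eta = 1/2
# the labels carry no information. oracle() is assumed from above.

def noisy_oracle(eta):
    x, y = oracle()                    # correctly labeled example
    if random.random() < eta:          # with probability eta ...
        y = 1 - y                      # ... the label is flipped
    return x, y
```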
Statistical query learning is a kind of active learning problem in which the learning algorithm $\mathcal{A}$ can decide whether to request information about the likelihood $P_f(x)$ that a function $f$ correctly classifies example $x$, and receives an answer accurate within a tolerance $\alpha$. Formally, whenever the learning algorithm $\mathcal{A}$ calls the oracle $Oracle(x, \alpha)$, it receives as feedback a probability $Q_f(x)$ such that $Q_f(x) - \alpha \le P_f(x) \le Q_f(x) + \alpha$.

Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the statistical query learning model if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x, \alpha)$ and polynomials $p(\cdot,\cdot,\cdot)$, $q(\cdot,\cdot,\cdot)$, and $r(\cdot,\cdot,\cdot)$ such that for any $0 < \epsilon \le 1$ the following hold:

1. $Oracle(x, \alpha)$ can evaluate $P_f(x)$ in time $q(1/\epsilon, n, size(f))$;
2. $1/\alpha$ is bounded by $r(1/\epsilon, n, size(f))$;
3. $\mathcal{A}$ outputs a model $h$ such that $error(h) < \epsilon$, in a number of calls to the oracle bounded by $p(1/\epsilon, n, size(f))$.

Note that the confidence parameter $\delta$ does not appear in this definition of learning. This is because its main purpose is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since now $Oracle(x, \alpha)$ always guarantees to meet the approximation criterion $Q_f(x) - \alpha \le P_f(x) \le Q_f(x) + \alpha$, the failure probability is no longer needed.
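A statistical query answer can be simulated by averaging over a sample whose size is chosen via a Hoeffding bound; note that this simulation only meets the tolerance with high probability, whereas the model's oracle guarantees it deterministically. oracle() is again reused from the first sketch:

```python
import math

# A sketch of the statistical query oracle Oracle(x, alpha). Given a
# hypothesis h, it returns an estimate Q of P_f = P_{x~D}(h(x) = f(x))
# with |Q - P_f| <= alpha. The guarantee is only simulated here via a
# Hoeffding-sized sample of noise-free examples, so it holds with
# probability 1 - delta rather than deterministically.

def sq_oracle(h, alpha, delta=1e-6):
    m = math.ceil(math.log(2 / delta) / (2 * alpha ** 2))  # Hoeffding bound
    return sum(h(x) == y for x, y in (oracle() for _ in range(m))) / m
```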
The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficiently PAC-learnable problems, such as parity, that are not efficiently SQ-learnable.[8]

In the malicious classification model[9] an adversary generates errors to foil the learning algorithm.
This setting describes situations of error bursts, which may occur when, for a limited time, transmission equipment malfunctions repeatedly.
Formally, algorithm $\mathcal{A}$ calls an oracle $Oracle(x, \beta)$ that returns a correctly labeled example $x$, drawn as usual from the distribution $\mathcal{D}$ over the input space, with probability $1 - \beta$, but with probability $\beta$ it returns an example drawn from a distribution that is not related to $\mathcal{D}$. Moreover, this maliciously chosen example may be strategically selected by an adversary who has knowledge of $f$, $\beta$, $\mathcal{D}$, or the current progress of the learning algorithm.
Definition: Given a bound $\beta_B < 1/2$, we say that $f$ is efficiently learnable using $\mathcal{H}$ in the malicious classification model if there exists a learning algorithm $\mathcal{A}$ that has access to $Oracle(x, \beta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \epsilon \le 1$ and $0 < \delta \le 1$ it outputs, in a number of calls to the oracle bounded by $p(1/(1/2 - \beta_B), 1/\epsilon, 1/\delta, n, size(f))$, a function $h \in \mathcal{H}$ that with probability at least $1 - \delta$ satisfies $error(h) \le \epsilon$.
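One possible adversary among many can be sketched as follows; the specific malicious strategy (deliberately mislabeling a random point) is an illustrative assumption, and oracle(), f, and n are reused from the first sketch:

```python
import random

# A sketch of the malicious oracle Oracle(x, beta): with probability
# 1 - beta it behaves like the noise-free oracle, and with probability
# beta it returns an adversarially chosen, arbitrarily labeled example.
# The adversary here is one illustrative choice among many.

def malicious_oracle(beta):
    if random.random() < beta:
        x = tuple(random.randint(0, 1) for _ in range(n))
        return x, 1 - f(x)             # adversarial example: label deliberately wrong
    return oracle()                    # otherwise: a correctly labeled example
```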
In the nonuniform random attribute noise[10][11] model the algorithm is learning a Boolean function; a malicious oracle $Oracle(x, \nu)$ may flip each $i$-th bit of the example $x = (x_1, x_2, \ldots, x_n)$ independently with probability $\nu_i \le \nu$.

This type of error can irreparably foil the algorithm; in fact, the following theorem holds: In the nonuniform random attribute noise setting, an algorithm $\mathcal{A}$ can output a function $h \in \mathcal{H}$ such that $error(h) < \epsilon$ only if $\nu < 2\epsilon$.
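For illustration, the attribute-noise oracle can be simulated as below; the per-attribute rates $\nu_i$ are assumed given, and oracle() is reused from the first sketch. Note that the label stays correct for the original, uncorrupted input:

```python
import random

# A sketch of the attribute-noise oracle Oracle(x, nu): each bit x_i of
# a correctly labeled example is flipped independently with probability
# nu_i <= nu, while the label remains that of the uncorrupted input.
# The list of per-attribute rates nus is an illustrative assumption.

def attribute_noise_oracle(nus):
    x, y = oracle()
    x = tuple(1 - xi if random.random() < nu_i else xi
              for xi, nu_i in zip(x, nus))
    return x, y                        # label y = f(original x) is unchanged
```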