Note that this arises because transductive inference on different test sets can produce mutually inconsistent predictions.[1]
An example of learning that is not inductive is binary classification, where the inputs tend to cluster into two groups.
A large set of test inputs may help in finding the clusters, thus providing useful information about the classification labels.
The same predictions would not be obtainable from a model that induces a function from the training cases alone.
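A minimal sketch of this idea, assuming scikit-learn; the two-blob data, the single labeled point per class, and KMeans as the clustering step are illustrative choices, not a prescribed method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two well-separated groups of inputs; only one labeled example per class.
X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)
labeled = [int(np.where(y_true == c)[0][0]) for c in (0, 1)]

# Transductive step: cluster ALL inputs, labeled and unlabeled alike.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Each cluster inherits the label of the labeled point it contains.
cluster_to_label = {clusters[i]: y_true[i] for i in labeled}
y_pred = np.array([cluster_to_label[c] for c in clusters])
print("agreement with true labels:", (y_pred == y_true).mean())
```

An inductive learner fit to the labeled points alone never sees the unlabeled inputs, so it cannot exploit this cluster structure.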
If exact inference is computationally prohibitive, one can at least try to ensure that the approximations are accurate at the test inputs.
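One concrete way to act on this, sketched below under the assumption of a low-rank Nystroem kernel approximation (scikit-learn): drawing the landmark points from the test inputs makes the approximation most accurate exactly where the predictions are needed. The data, kernel, and all parameters are illustrative.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(500, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(500)
X_test = rng.uniform(2.0, 3.0, size=(50, 1))  # region where predictions matter

# Fit the approximate feature map on the TEST inputs, so its landmarks
# (and hence its accuracy) concentrate where we will actually predict.
feat = Nystroem(kernel="rbf", gamma=1.0, n_components=30, random_state=0).fit(X_test)
model = Ridge(alpha=1e-2).fit(feat.transform(X_train), y_train)
y_pred = model.predict(feat.transform(X_test))
```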
Bruno de Finetti developed a purely subjective form of Bayesianism in which claims about objective chances could be translated, via exchangeability properties, into empirically respectable claims about subjective credences regarding observables.
In this problem, however, a supervised learning algorithm has only five labeled points on which to base a predictive model.
A transductive algorithm, by contrast, can label the unlabeled points according to the clusters to which they naturally belong (see the sketch below).
A supervised learning algorithm, on the other hand, can label new points instantly, with very little computational cost.
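A sketch of this trade-off, assuming scikit-learn's LabelSpreading as the transductive learner; the two-moons data and the particular five labeled points are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
y = np.full(len(X), -1)   # -1 marks unlabeled points
y[:5] = y_true[:5]        # only five points keep their labels

# Inductive: a model built from the five labeled points alone.
knn = KNeighborsClassifier(n_neighbors=1).fit(X[:5], y_true[:5])

# Transductive: the five labels are spread through the whole point cloud.
spread = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)

print("inductive agreement:  ", (knn.predict(X) == y_true).mean())
print("transductive agreement:", (spread.transduction_ == y_true).mean())
```

The inductive model can label a brand-new point immediately, whereas a purely transductive labelling must be recomputed whenever new points arrive.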
Max-flow min-cut partitioning schemes are very popular for this purpose.
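A minimal sketch of such a scheme, assuming networkx's minimum_cut; the Gaussian similarity, its width sigma, and the dense graph construction are illustrative choices:

```python
import networkx as nx
import numpy as np

def mincut_transduce(X, y, sigma=1.0):
    """Binary transduction by s-t minimum cut; y holds 0/1 for seeds, -1 otherwise."""
    n = len(X)
    G = nx.DiGraph()
    # Dense similarity graph (O(n^2) edges; fine for small problems).
    for i in range(n):
        for j in range(i + 1, n):
            w = float(np.exp(-np.sum((X[i] - X[j]) ** 2) / (2 * sigma**2)))
            G.add_edge(i, j, capacity=w)
            G.add_edge(j, i, capacity=w)
    # Clamp labeled seeds to the terminals with infinite capacity.
    for i, label in enumerate(y):
        if label == 1:
            G.add_edge("s", i, capacity=float("inf"))
        elif label == 0:
            G.add_edge(i, "t", capacity=float("inf"))
    # The minimum cut severs the weakest similarity edges, splitting the
    # graph into a source side (label 1) and a sink side (label 0).
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return np.array([1 if i in source_side else 0 for i in range(n)])
```

Each unlabeled point then ends up on the side of the cut to whose seeds it is most strongly connected.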