The ugly duckling theorem is an argument showing that classification is not really possible without some sort of bias.
More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties.
The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other.[1]: 376–377
Suppose there are n things in the universe, and one wants to put them into classes or categories.
One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not.
So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects.
There are $2^n$ such ways, the size of the power set of n objects.
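One might hope to use these classes to measure the similarity between two objects by counting how many classes they have in common.
To see why this fails, represent each class as an n-bit string, with a one for each object in the class and a zero for each object outside it; there are $2^n$ such strings.
As all possible choices of zeros and ones are there, any two bit-positions will agree exactly half the time.
One may pick two elements and reorder the bits so they are the first two, and imagine the strings sorted lexicographically.
The first $2^{n-1}$ strings have the first bit set to zero and the remaining $2^{n-1}$ have it set to one; within each of those blocks, the first $2^{n-2}$ strings have the second bit set to zero and the rest have it set to one.
The two bits therefore agree on two blocks of $2^{n-2}$ strings, or on half of all the cases, no matter which two elements one picks.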
The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs.
Thus, some kind of inductive bias is needed to make judgements that prefer certain categories over others.
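The counting argument can be checked by brute force for small n. The following is a minimal sketch, assuming each class is encoded as an n-bit membership tuple; the function name `shared_class_counts` is illustrative and not part of the source.

```python
from itertools import combinations, product

def shared_class_counts(n):
    """For each pair of distinct objects, count (a) the classes containing
    both objects and (b) the classes on which the two objects agree,
    i.e. both are members or both are non-members."""
    # Every subset of the n objects is one n-bit membership tuple.
    classes = list(product((0, 1), repeat=n))
    counts = {}
    for i, j in combinations(range(n), 2):
        both = sum(1 for c in classes if c[i] and c[j])
        agree = sum(1 for c in classes if c[i] == c[j])
        counts[(i, j)] = (both, agree)
    return counts

# With n = 4 there are 16 classes; every pair gives the same two counts.
for pair, (both, agree) in shared_class_counts(4).items():
    print(pair, both, agree)  # each line ends with: 4 8
```

Every pair shares $2^{n-2}$ classes and agrees on $2^{n-1}$ of them, so neither count can distinguish one pair of objects from another.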
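The choice of Boolean features used to describe the objects could have been somewhat arbitrary, so one may want to extend each feature vector with further features computed from the original ones.
The only canonical way to do this is to extend it with all possible Boolean functions of the original features, giving a completed vector.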
The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features.
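To see this, take two different vectors and a coordinate i on which they differ.
Viewing the Boolean functions as polynomials in the feature variables over GF(2), segregate the functions into pairs (f, g), where g is f plus the single variable $x_i$.
The two vectors agree on exactly one function of each pair, so they differ in exactly half of all the features.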
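The claim that two completed vectors are either equal or differ in exactly half of their features can be verified exhaustively for a small number of original features. This is a minimal sketch, assuming each object is a tuple of bits and each Boolean function is represented by its truth table; `completed_vector` and `hamming` are illustrative names, not from the source.

```python
from itertools import product

def completed_vector(x):
    """Evaluate every Boolean function of len(x) variables at the point x."""
    k = len(x)
    inputs = list(product((0, 1), repeat=k))  # the 2**k possible inputs
    row = inputs.index(tuple(x))              # where x sits in a truth table
    # A Boolean function is a truth table with one output bit per input,
    # so there are 2**(2**k) functions in total.
    return [table[row] for table in product((0, 1), repeat=len(inputs))]

def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

# Two original features give 16 Boolean functions.
u = completed_vector((0, 1))
v = completed_vector((1, 1))
print(len(u), hamming(u, v), hamming(u, u))  # 16 8 0
```

The enumeration grows doubly exponentially in the number of original features, so this check is only practical for two or three features.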
A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, for instance, between A and B.
However, Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem, since in what respects A is similar to B "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another".
"Of course, if these feature weights were fixed, then these similarity relations would be constrained".
Yet the property "striped" as a weight 'fix' or constraint is arbitrary itself, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous".
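The effect of fixing weights can be sketched numerically. The example below is a toy illustration under stated assumptions: the three object names, the weighting functions `uniform` and `biased`, and the choice of which class to favour are all hypothetical and chosen only to show that similarity rankings appear once some class is weighted more heavily than others.

```python
from itertools import combinations

def powerset(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def weighted_similarity(a, b, classes, weight):
    """Sum the weights of the classes on which a and b agree
    (both are members or both are non-members)."""
    return sum(weight(c) for c in classes if (a in c) == (b in c))

objects = ["swan_a", "swan_b", "duckling"]  # hypothetical objects
classes = powerset(objects)

def uniform(c):
    return 1  # no bias: every class counts equally

def biased(c):
    # Arbitrary bias: the class holding exactly the two swans counts triple.
    return 3 if c == frozenset({"swan_a", "swan_b"}) else 1

for a, b in combinations(objects, 2):
    print(a, b,
          weighted_similarity(a, b, classes, uniform),
          weighted_similarity(a, b, classes, biased))
```

With uniform weights every pair scores 4; with the biased weights the two swans score 6 while each swan-duckling pair still scores 4. The ranking comes entirely from the chosen weights, which is the arbitrariness the passage describes.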
Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense that they are useful: "Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic...
If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten.
In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success."[6]
Unless some properties are considered more salient, or 'weighted' as more important than others, everything will appear equally similar; hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar".[7]
In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers: "Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity.
It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on.
Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute."[8]
According to Woodward,[9] the ugly duckling theorem is related to Schaffer's Conservation Law for Generalization Performance, which states that all algorithms for learning Boolean functions from input/output examples have the same overall generalization performance as random guessing.[10]
The latter result is generalized by Woodward to functions on countably infinite domains.
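The averaged form of this statement can be illustrated on a tiny domain. The sketch below is a toy check, not Schaffer's proof: it assumes 8 inputs identified with the indices 0 to 7, a fixed training set consisting of the first 5 inputs, and a uniform average over all 256 Boolean target functions. The two learners and all function names are hypothetical.

```python
from itertools import product

def off_training_accuracy(learner, n_inputs=8, n_train=5):
    """Accuracy on unseen inputs, averaged over every Boolean target."""
    correct = total = 0
    for target in product((0, 1), repeat=n_inputs):  # all 2**n_inputs targets
        predict = learner(target[:n_train])          # sees training labels only
        for x in range(n_train, n_inputs):           # the unseen inputs
            correct += predict(x) == target[x]
            total += 1
    return correct / total

def majority_learner(train_labels):
    """Always predict the more common training label (0 on ties)."""
    guess = int(2 * sum(train_labels) > len(train_labels))
    return lambda x: guess

def parity_learner(train_labels):
    """An arbitrary alternative rule that depends on the input index."""
    bit = sum(train_labels) % 2
    return lambda x: (x + bit) % 2

print(off_training_accuracy(majority_learner))  # 0.5
print(off_training_accuracy(parity_learner))    # 0.5
```

For any fixed training labels, the labels of the unseen inputs range over every combination, so each prediction is right for exactly half of the targets. Any deterministic learner substituted here gives the same average.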