Category utility

Category utility is a measure of "category goodness" defined in Gluck & Corter (1985) and Corter & Gluck (1992). It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories have different attribute values.

It was intended to supersede more limited measures of category goodness such as "cue validity" (Reed 1972; Rosch & Mervis 1975) and "collocation index" (Jones 1983).

It provides a normative information-theoretic measure of the predictive advantage gained by the observer who possesses knowledge of the given category structure (i.e., the class labels of instances) over the observer who does not possess knowledge of the category structure.

In this sense the motivation for the category utility measure is similar to the information gain metric used in decision tree learning.

A review of category utility in its probabilistic incarnation, with applications to machine learning, is provided in Witten & Frank (2005, pp. 260–262).

The probability-theoretic definition of category utility given in Fisher (1987) and Witten & Frank (2005) is as follows:

$$CU(C,F) = \frac{1}{p} \sum_{c_j \in C} p(c_j) \left[ \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik} \mid c_j)^2 - \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik})^2 \right]$$

where $F = \{f_i\},\ i = 1 \ldots n$, is a size-$n$ set of $m$-ary features and $C = \{c_j\},\ j = 1 \ldots p$, is a set of $p$ categories. The term $p(f_{ik})$ designates the marginal probability that feature $f_i$ takes on value $k$, and the term $p(f_{ik} \mid c_j)$ designates the category-conditional probability that feature $f_i$ takes on value $k$ given that the object in question belongs to category $c_j$.

The motivation and development of this expression for category utility, and the role of the multiplicand $\frac{1}{p}$, are given in the above sources. Intuitively, the term $p(c_j) \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik} \mid c_j)^2$ is the expected number of attribute values that can be correctly guessed by an observer using a probability-matching strategy together with knowledge of the category labels, while $\sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik})^2$ is the expected number of attribute values that can be correctly guessed by an observer using the same strategy but without any knowledge of the category labels.

Their difference therefore reflects the relative advantage accruing to the observer by having knowledge of the category structure.
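As a concrete illustration of the formula above, the following Python sketch estimates the probability-theoretic category utility from a list of instances with nominal attribute values and their class labels. The function name and data layout are illustrative, not taken from the cited sources.

```python
from collections import Counter

def category_utility(instances, labels):
    """Probability-theoretic category utility (Fisher 1987 form).

    instances : list of tuples of nominal attribute values
    labels    : list of category labels, one per instance
    Returns (1/p) * sum_j P(c_j) * [ sum_ik P(f_ik|c_j)^2 - sum_ik P(f_ik)^2 ].
    """
    n = len(instances)
    categories = Counter(labels)          # counts per category c_j
    p_cats = len(categories)              # number of categories p
    n_attrs = len(instances[0])

    # Marginal value counts per attribute, for sum_i sum_k P(f_ik)^2
    marginal = [Counter(x[i] for x in instances) for i in range(n_attrs)]
    baseline = sum((cnt / n) ** 2 for attr in marginal for cnt in attr.values())

    cu = 0.0
    for cat, cat_count in categories.items():
        members = [x for x, y in zip(instances, labels) if y == cat]
        cond = [Counter(x[i] for x in members) for i in range(n_attrs)]
        # Expected correct guesses given category c_j: sum_i sum_k P(f_ik|c_j)^2
        within = sum((cnt / cat_count) ** 2 for attr in cond for cnt in attr.values())
        cu += (cat_count / n) * (within - baseline)

    return cu / p_cats
```

For example, `category_utility([('red','round'), ('red','round'), ('blue','square')], ['A', 'A', 'B'])` returns 4/9 ≈ 0.44, a positive value because knowing the class labels makes the attribute values easier to guess.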

The information-theoretic definition of category utility for a set of entities with a size-$n$ binary feature set $F = \{f_i\},\ i = 1 \ldots n$, and a binary category $C = \{c, \bar{c}\}$ is given as follows:

$$CU(C,F) = \left[ p(c) \sum_{i=1}^{n} p(f_i \mid c) \log p(f_i \mid c) + p(\bar{c}) \sum_{i=1}^{n} p(f_i \mid \bar{c}) \log p(f_i \mid \bar{c}) \right] - \sum_{i=1}^{n} p(f_i) \log p(f_i)$$

where $p(c)$ is the prior probability of an entity belonging to the positive category $c$ (in the absence of any feature information), $p(f_i \mid c)$ is the conditional probability of an entity having feature $f_i$ given that it belongs to category $c$, $p(f_i \mid \bar{c})$ is likewise the conditional probability of an entity having feature $f_i$ given that it belongs to category $\bar{c}$, and $p(f_i)$ is the prior probability of an entity possessing feature $f_i$ (in the absence of any category information).

The intuition behind this expression is as follows. The term $p(c) \sum_{i=1}^{n} p(f_i \mid c) \log p(f_i \mid c)$ represents the cost (in bits) of optimally encoding (or transmitting) feature information when it is known that the objects to be described belong to category $c$, and the term $p(\bar{c}) \sum_{i=1}^{n} p(f_i \mid \bar{c}) \log p(f_i \mid \bar{c})$ likewise represents the cost (in bits) of optimally encoding (or transmitting) feature information when it is known that the objects to be described belong to category $\bar{c}$. The final term, $\sum_{i=1}^{n} p(f_i) \log p(f_i)$, represents the corresponding cost when no category information is available. The category utility is thus the reduction in expected encoding cost obtained by knowing the category structure.
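A minimal sketch of this information-theoretic form, assuming the required probabilities are already available, might look as follows; the function and parameter names are hypothetical.

```python
import math

def info_theoretic_cu(p_c, p_f_given_c, p_f_given_not_c, p_f):
    """Information-theoretic category utility for a binary category {c, not-c}
    and binary features f_1..f_n, mirroring the expression above.

    p_c             : prior probability of category c
    p_f_given_c     : list of P(f_i | c)
    p_f_given_not_c : list of P(f_i | not-c)
    p_f             : list of marginal P(f_i)
    """
    def plogp(p):
        # p * log2(p); a negative quantity whose negation is a cost in bits
        return p * math.log2(p) if p > 0 else 0.0

    term_c = sum(plogp(p) for p in p_f_given_c)
    term_not_c = sum(plogp(p) for p in p_f_given_not_c)
    term_marginal = sum(plogp(p) for p in p_f)

    # Weighted category-conditional terms minus the marginal term
    return (p_c * term_c + (1 - p_c) * term_not_c) - term_marginal
```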

The category utility is closely related to the mutual information between the feature set and the category variable. To see this, assume a set of entities each described by the same $n$ features, with each feature variable having cardinality $m$; that is, each feature can adopt any of $m$ distinct values (which need not be ordered; all variables can be nominal); for the special case $m = 2$ these features are binary, but more generally, for any $m$, the features are simply $m$-ary. Without loss of generality, the feature set $F$ can be replaced by a single aggregate variable $F_a$ of cardinality $m^n$ that adopts a unique value $v_i,\ i = 1 \ldots m^n$, for each combination of feature values. The mutual information between $F_a$ and the category variable $C$ is

$$I(F_a;C) = \sum_{v_i \in F_a} \sum_{c_j \in C} p(v_i, c_j) \log \frac{p(v_i, c_j)}{p(v_i)\, p(c_j)}.$$

In terms of the conditional probabilities this can be re-written (or defined) as

$$I(F_a;C) = \sum_{v_i \in F_a} \sum_{c_j \in C} p(v_i \mid c_j)\, p(c_j) \log p(v_i \mid c_j) \;-\; \sum_{v_i \in F_a} p(v_i) \log p(v_i).$$

If the original definition of the category utility from above is rewritten with the category variable made explicit as $C = \{c_j\}$, it takes the same form as this expression, the difference being that the sum over features $\sum_{i=1}^{n}$ in the category utility equation runs over independent binary variables $f_i$, whereas the mutual information sums over the $m^n$ values of the aggregate variable $F_a$. In this sense the category utility can be viewed as a measure of the mutual information between the feature and category variables.
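To make the connection concrete, the following sketch (with assumed variable names) computes $I(F_a;C)$ from a joint probability table over the aggregate feature variable and the category variable.

```python
import math

def mutual_information(joint):
    """Mutual information I(F_a; C) in bits.

    joint : dict mapping (v_i, c_j) -> p(v_i, c_j), with probabilities summing to 1.
    """
    # Marginals p(v_i) and p(c_j)
    p_v, p_c = {}, {}
    for (v, c), p in joint.items():
        p_v[v] = p_v.get(v, 0.0) + p
        p_c[c] = p_c.get(c, 0.0) + p

    mi = 0.0
    for (v, c), p in joint.items():
        if p > 0:
            mi += p * math.log2(p / (p_v[v] * p_c[c]))
    return mi
```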

Like the mutual information, the category utility is not sensitive to any ordering in the feature or category values: the category set {small, medium, large, jumbo} is treated no differently from the category set {desk, fish, tree, mop}. Similarly, a feature variable adopting values {1,2,3,4,5} is not qualitatively different from a feature variable adopting values {fred,joe,bob,sue,elaine}.

One possible adjustment for this insensitivity to ordinality is given by the weighting scheme described in the article for mutual information.
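For instance, relabelling nominal values leaves the measure unchanged, as the following sketch illustrates; it reuses the illustrative category_utility function defined earlier.

```python
# Relabelling nominal values does not change the category utility:
# the values {1, 2, 3} behave exactly like {'fred', 'joe', 'bob'}.
numeric = [(1, 2), (1, 2), (3, 3)]
renamed = [('fred', 'joe'), ('fred', 'joe'), ('bob', 'bob')]
labels = ['A', 'A', 'B']

assert category_utility(numeric, labels) == category_utility(renamed, labels)
```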

At least since the time of Aristotle there has been a tremendous fascination in philosophy with the nature of concepts and universals.

The question of where such universals reside (their locus) was an important issue on which the classical schools of Plato and Aristotle famously differed, though both schools agreed that universals do have a mind-independent existence.

In the late Middle Ages (perhaps beginning with Occam, although Porphyry also makes a much earlier remark indicating a certain discomfort with the status quo), however, the certainty that existed on this issue began to erode, and it became acceptable among the so-called nominalists and empiricists to consider concepts and universals as strictly mental entities or conventions of language.

On this view of concepts—that they are purely representational constructs—a new question then comes to the fore: "Why do we possess one set of concepts rather than another?"

This is a question that modern philosophers, and subsequently machine learning theorists and cognitive scientists, have struggled with for many decades.

One approach to answering such questions is to investigate the "role" or "purpose" of concepts in cognition.

The answer given by Mill (1843, p. 425) and many others is that classification (conception) is a precursor to induction: by imposing a particular categorization on the universe, an organism gains the ability to deal with physically non-identical objects or situations in an identical fashion, thereby gaining substantial predictive leverage (Smith & Medin 1981; Harnad 2005).

As Mill puts it (1843, pp. 466–468),

"The general problem of classification... [is] to provide that things shall be thought of in such groups, and those groups in such an order, as will best conduce to the remembrance and to the ascertainment of their laws... [and] one of the uses of such a classification [is] that by drawing attention to the properties on which it is founded, and which, if the classification be good, are marks of many others, it facilitates the discovery of those others."

From this base, Mill reaches the following conclusion, which foreshadows much subsequent thinking about category goodness, including the notion of category utility:

"The ends of scientific classification are best answered when the objects are formed into groups respecting which a greater number of general propositions can be made, and those propositions more important, than could be made respecting any other groups into which the same things could be distributed."

A variety of measures have been suggested with the aim of formally capturing this notion of "category goodness," the best known of which is probably the "cue validity".

The cue validity of a feature $f_i$ with respect to a category $c_j$ is defined either as the conditional probability of the category given the feature, $p(c_j \mid f_i)$ (Reed 1972; Rosch & Mervis 1975), or as the deviation of this conditional probability from the category base rate, $p(c_j \mid f_i) - p(c_j)$ (Edgell 1993; Kruschke & Johansen 1999). Such measures quantify only the inference from feature to category, not the reverse inference from category to feature (the category validity, $p(f_i \mid c_j)$).
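A minimal sketch of these two cue validity variants, estimated from data in the same list-of-tuples format used in the earlier sketches, is given below; the function names are hypothetical.

```python
def cue_validity(instances, labels, attr_index, value, category):
    """Estimate p(category | attribute attr_index == value) from the data."""
    matching = [lab for inst, lab in zip(instances, labels) if inst[attr_index] == value]
    return matching.count(category) / len(matching) if matching else 0.0

def cue_validity_deviation(instances, labels, attr_index, value, category):
    """Deviation of p(category | feature value) from the category base rate p(category)."""
    base_rate = labels.count(category) / len(labels)
    return cue_validity(instances, labels, attr_index, value, category) - base_rate
```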

Also, while the cue validity was originally intended to account for the demonstrable appearance of basic categories in human cognition—categories of a particular level of generality that are evidently preferred by human learners—a number of major flaws in the cue validity quickly emerged in this regard (Jones 1983; Murphy 1982; Corter & Gluck 1992, and others).

The category utility was introduced as a more sophisticated refinement of the cue validity, one that attempts to quantify more rigorously the full inferential power of a class structure.