Multiple instance learning

MIL deals with problems in which the labels of the training data are only incompletely known.

More precisely, in multiple-instance learning, the training set consists of labeled "bags", each of which is a collection of unlabeled instances.

The term multiple-instance learning was introduced in the mid-1990s by Dietterich et al. while they were investigating the problem of drug activity prediction.

A molecule can have many alternative low-energy states, but only one, or a few, of them qualify it to make a drug.

The problem arose because scientists could only determine whether a molecule was qualified or not, but could not say exactly which of its low-energy shapes were responsible for that.

One of the proposed ways to solve this problem was to use supervised learning, regarding all of the low-energy shapes of a qualified molecule as positive training instances and all of the low-energy shapes of unqualified molecules as negative instances.

The solution that Dietterich et al. proposed for the multiple-instance learning problem is the axis-parallel rectangle (APR) algorithm, which searches for an axis-parallel rectangle in feature space that contains at least one instance from each positive bag while excluding the instances of negative bags.

In 1998, Maron and Ratan found another application of multiple-instance learning, to scene classification in machine vision, and devised the Diverse Density framework.

An image is labeled positive if it contains the target scene - a waterfall, for example - and negative otherwise.

From there on, these frameworks have been applied to a wide spectrum of applications, ranging from image concept learning and text categorization to stock market prediction.

In image concept learning, for instance, the target class might be "beach", where a positive image contains both "sand" and "water" regions.

Numerous researchers have worked on adapting classical classification techniques, such as support vector machines or boosting, to work within the context of multiple-instance learning.

Most of the work on multiple-instance learning, including the early papers of Dietterich et al. (1997) and Maron & Lozano-Pérez (1997),[3][9] makes the standard assumption about the relationship between the instances within a bag and the class label of the bag: a bag is labeled positive if and only if it contains at least one positive instance.
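A minimal sketch of bag labeling under the standard assumption; the helper below is illustrative, with bags represented as NumPy arrays of per-instance 0/1 labels:

```python
import numpy as np

def bag_label_standard(instance_labels: np.ndarray) -> int:
    """Standard MI assumption: a bag is positive iff at least one
    of its instances is positive (instance labels are 0 or 1)."""
    return int(instance_labels.max() == 1)

print(bag_label_standard(np.array([0, 0, 1])))  # 1: one positive instance suffices
print(bag_label_standard(np.array([0, 0, 0])))  # 0: no positive instances
```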

Guided by that idea, Weidmann [11] formulated a hierarchy of generalized instance-based assumptions for MIL, consisting of the presence-based, threshold-based, and count-based assumptions, each more general than the previous one.

The count-based assumption is the final generalization, enforcing both a lower and an upper bound on the number of times a required concept can occur in a positively labeled bag.
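A sketch of the count-based labeling rule, assuming per-concept occurrence counts for a bag have already been computed (the function and parameter names are illustrative):

```python
import numpy as np

def bag_label_count_based(concept_counts, lower, upper):
    """Count-based assumption: a bag is positive iff every required
    concept i occurs between lower[i] and upper[i] times in the bag."""
    counts = np.asarray(concept_counts)
    return bool(np.all((counts >= lower) & (counts <= upper)))

lower, upper = np.array([1, 2]), np.array([3, 5])
print(bag_label_count_based([2, 4], lower, upper))  # True: both counts in range
print(bag_label_count_based([0, 4], lower, upper))  # False: first concept missing
```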

However, Scott et al.[8] describe a further generalization in which there is a set of attraction points and a set of repulsion points. A bag is labeled positive if and only if it contains instances which are sufficiently close to at least a required number of the attraction points and sufficiently far from at least a required number of the repulsion points.
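A sketch of this attraction/repulsion rule under stated simplifying assumptions: Euclidean distance, a single closeness threshold eps, and required counts r and r_bar (these parameters are illustrative, not Scott et al.'s exact formulation):

```python
import numpy as np

def bag_label_attraction_repulsion(bag, attraction, repulsion, eps, r, r_bar):
    """Bag is positive iff its instances come within eps of at least r
    attraction points while at least r_bar repulsion points have no
    bag instance within eps of them."""
    def points_hit(points):
        # Pairwise distances: rows are bag instances, columns are points.
        d = np.linalg.norm(bag[:, None, :] - points[None, :, :], axis=2)
        return int((d.min(axis=0) <= eps).sum())

    hit = points_hit(attraction)
    avoided = len(repulsion) - points_hit(repulsion)
    return hit >= r and avoided >= r_bar
```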

The goal of an algorithm operating under the collective assumption is then to model the distribution

\[ p(y \mid B) = \int_{\mathcal{X}} p(y \mid x)\, p(x \mid B)\, dx \]

where \(B\) is a bag, \(\mathcal{X}\) is the instance space, and \(y\) is a class label. Since \(p(x \mid B)\) is typically considered fixed but unknown, algorithms instead focus on computing the empirical version:

\[ \hat{p}(y \mid B) = \frac{1}{n_B} \sum_{i=1}^{n_B} p(y \mid x_i) \]

where \(n_B\) is the number of instances in bag \(B\). Since \(p(y \mid x)\) is also typically taken to be fixed but unknown, most collective-assumption based methods focus on learning this distribution, as in the single-instance version.
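A small sketch of the empirical version: average an instance-level posterior over the instances of a bag. The logistic instance model below is an assumption for illustration only:

```python
import numpy as np

def empirical_bag_posterior(bag, p_y_given_x):
    """Empirical collective assumption: p_hat(y|B) = (1/n_B) * sum_i p(y|x_i)."""
    return np.array([p_y_given_x(x) for x in bag]).mean(axis=0)

def p_y_given_x(x):
    # Toy instance-level model: logistic in the first feature.
    p1 = 1.0 / (1.0 + np.exp(-x[0]))
    return np.array([1.0 - p1, p1])  # [p(y=0|x), p(y=1|x)]

bag = np.array([[0.5, 1.0], [-1.0, 0.2], [2.0, -0.3]])
print(empirical_bag_posterior(bag, p_y_given_x))  # averaged class posterior
```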

Starting from a seed instance in one positive bag, the APR is grown greedily to take in nearby instances from the other positive bags; this process is repeated until the APR covers at least one instance from each positive bag.

After the first phase, the APR is thought to tightly contain only the representative attributes.
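A simplified, hypothetical sketch of such a grow phase (not the full iterated-discrimination procedure of Dietterich et al.): the rectangle is expanded just enough to cover the cheapest-to-include instance of each positive bag.

```python
import numpy as np

def grow_apr(positive_bags):
    """Greedy APR 'grow' sketch: start from a seed instance and expand the
    axis-parallel rectangle minimally to cover one instance per bag."""
    seed = positive_bags[0][0]
    lo, hi = seed.copy(), seed.copy()
    for bag in positive_bags[1:]:
        # Cost of covering an instance = total enlargement of the rectangle.
        costs = [np.maximum(lo - x, 0).sum() + np.maximum(x - hi, 0).sum()
                 for x in bag]
        x = bag[int(np.argmin(costs))]
        lo, hi = np.minimum(lo, x), np.maximum(hi, x)
    return lo, hi  # lower and upper corners of the axis-parallel rectangle

bags = [np.array([[0.0, 0.0], [5.0, 5.0]]),
        np.array([[1.0, 2.0], [9.0, 9.0]])]
print(grow_apr(bags))  # covers (0,0) and (1,2): lo=[0,0], hi=[1,2]
```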

In its simplest form, Diverse Density (DD)[8] assumes a single representative instance \(t^*\) as the concept. This representative instance must be "dense", in that it is much closer to instances from positive bags than to instances from negative bags, and "diverse", in that it is close to at least one instance from each positive bag.
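A sketch of a noisy-or style DD score for a candidate concept point t; the Gaussian-like match probability and the unscaled distances are simplifying assumptions, not the exact formulation of the original paper:

```python
import numpy as np

def diverse_density(t, positive_bags, negative_bags):
    """Noisy-or Diverse Density: large when t is close to some instance
    of every positive bag and far from all instances of negative bags."""
    def match_prob(bag):
        # Probability that each instance of the bag matches concept t.
        return np.exp(-np.sum((bag - t) ** 2, axis=1))

    dd = 1.0
    for bag in positive_bags:             # need at least one match per bag
        dd *= 1.0 - np.prod(1.0 - match_prob(bag))
    for bag in negative_bags:             # need no matches at all
        dd *= np.prod(1.0 - match_prob(bag))
    return dd
```

In practice the concept is found by maximizing a score of this kind over t, for example with gradient ascent restarted from instances of the positive bags.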

Though Diverse Density was originally proposed by Maron et al. in 1998,[9] more recent MIL algorithms build on the DD framework, such as EM-DD in 2001,[13] DD-SVM in 2004,[14] and MILES in 2006.[8]

A number of single-instance algorithms have also been adapted to a multiple-instance context under the standard assumption, including support vector machines, artificial neural networks, decision trees, and boosting.

After 2000, there was a movement away from the standard assumption toward the development of algorithms designed to tackle the more general assumptions listed above.

GMIL-2, developed as a more efficient refinement of GMIL-1, first selects a set of candidate representative instances; it then maps each bag to a Boolean vector, as in GMIL-1, but considers only APRs corresponding to unique subsets of the candidate representative instances.

Future bags are simply mapped (embedded) into the feature space of metadata and labeled by the chosen classifier.

Therefore, much of the focus for metadata-based algorithms is on what features or what type of embedding leads to effective classification.
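As an illustration, one simple embedding (the specific statistics here are an arbitrary example, not a prescribed choice) maps each bag to per-feature summary statistics, after which any single-instance classifier can be trained on the embedded vectors:

```python
import numpy as np

def embed_bag(bag):
    """Map a bag (n_instances x n_features) to a fixed-length metadata
    vector: per-feature minimum, mean, and maximum."""
    return np.concatenate([bag.min(axis=0), bag.mean(axis=0), bag.max(axis=0)])

# Bags of different sizes become equal-length vectors, so any standard
# single-instance learner can be trained on (embed_bag(B), label) pairs.
b1 = np.array([[0.0, 1.0], [2.0, 3.0]])
b2 = np.array([[1.0, 1.0], [0.0, 2.0], [4.0, 0.0]])
print(embed_bag(b1).shape, embed_bag(b2).shape)  # (6,) (6,)
```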

Note that some of the previously mentioned algorithms, such as TLC and GMIL, could be considered metadata-based.

Wang and Zucker define two variations of kNN, Bayesian-kNN and citation-kNN, as adaptations of the traditional nearest-neighbor approach to the multiple-instance setting, using Hausdorff-style metrics to measure the distance between bags.
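A sketch of two such bag-level metrics, assuming bags are NumPy arrays of instance vectors; either can be plugged into an ordinary kNN routine that operates over bags:

```python
import numpy as np

def max_hausdorff(A, B):
    """Maximal Hausdorff distance between bags A and B (sets of vectors)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def min_hausdorff(A, B):
    """Minimal Hausdorff distance: distance of the closest cross-bag pair."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min()

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [4.0, 4.0]])
print(max_hausdorff(A, B))  # driven by the outlying instance (4, 4)
print(min_hausdorff(A, B))  # 0.0: the bags share the instance (1, 1)
```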

So far this article has considered multiple instance learning exclusively in the context of binary classifiers.
