One-shot learning (computer vision)

One-shot learning is an object categorization problem, found mostly in computer vision.

The ability to learn object categories from few examples, and at a rapid pace, has been demonstrated in humans.[1][2]

It is estimated that a child learns almost all of the 10,000 to 30,000 object categories in the world by age six.

Given examples from two object categories, one an unknown object composed of familiar shapes and the other an unknown, amorphous shape, it is much easier for humans to recognize the former than the latter, suggesting that humans make use of previously learned categories when learning new ones.

The Bayesian one-shot learning algorithm represents the foreground and background of images as parametrized by a mixture of constellation models.

For object recognition on new images, the posterior obtained during the learning phase is used in a Bayesian decision framework to estimate the ratio R = p(object | test, train) / p(background clutter | test, train); the object is deemed present when R exceeds a threshold.
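This decision rule can be sketched as a simple likelihood-ratio test; the log-likelihood values, priors, and threshold below are hypothetical placeholders rather than outputs of actual constellation models:

```python
import math

# Hypothetical numbers for illustration only; in the algorithm these
# log-likelihoods come from integrating over the learned model posterior.
def decision_ratio(log_p_img_given_fg, log_p_img_given_bg,
                   prior_fg=0.5, prior_bg=0.5):
    """Return the posterior ratio p(object | test) / p(clutter | test)."""
    log_r = (log_p_img_given_fg + math.log(prior_fg)) \
            - (log_p_img_given_bg + math.log(prior_bg))
    return math.exp(log_r)

T = 1.0                                # decision threshold (placeholder)
r = decision_ratio(-120.0, -125.0)     # made-up log-likelihoods
present = r > T                        # object declared present if ratio > T
```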

To compute these probabilities, the object class must be modeled from a set of (1 to 5) training images containing examples of it.

Parametric models for the foreground and background categories, with parameters θ and θ_bg respectively, are next introduced. Writing each class likelihood as an integral over model parameters, p(I | I_t, O) = ∫ p(I | θ, O) p(θ | I_t, O) dθ, where I is the test image, I_t the training images, and O the class, yields the posterior distribution of model parameters given the training images, p(θ | I_t, O), which is estimated in the learning phase.

To represent an image, first a set of N interesting regions is detected in the image using the Kadir–Brady saliency detector, giving a set of locations X and corresponding appearances A for the detected regions.
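As a rough illustration of entropy-based saliency (the principle behind the Kadir–Brady detector), a single-scale toy version might score windows by the entropy of their intensity histograms; the window size, bin count, and synthetic image below are arbitrary choices for the sketch, and the real detector also weights entropy by stability across scales:

```python
import numpy as np

def local_entropy_saliency(img, win=8, n_regions=5):
    """Toy, single-scale stand-in for an entropy-based saliency detector:
    score each window by the Shannon entropy of its intensity histogram
    and return the centres of the n_regions highest-scoring windows."""
    h, w = img.shape
    scores = []
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            patch = img[y:y + win, x:x + win]
            hist, _ = np.histogram(patch, bins=16, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]
            entropy = -(p * np.log2(p)).sum()
            scores.append((entropy, (y + win // 2, x + win // 2)))
    scores.sort(key=lambda s: -s[0])       # most salient first
    return [centre for _, centre in scores[:n_regions]]

# Synthetic test image: flat background with one textured (noisy) patch.
img = np.zeros((32, 32))
img[8:16, 8:16] = np.random.default_rng(0).integers(0, 256, (8, 8))
top = local_entropy_saliency(img, win=8, n_regions=1)
```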

Assuming that shape (X, the collection of part locations) and appearance (A) are independent allows one to consider the likelihood expression as a product of a shape term and an appearance term, summed over the mixture components ω of the constellation model and over hypotheses h assigning detected features to model parts.

Each feature's appearance is represented by a point in an appearance space; each part in the constellation model has a Gaussian density within this space, with mean and precision parameters that are independent of the other parts' densities.

From these, the appearance likelihood described above is computed as a product of Gaussians over the model parts for a given hypothesis h and mixture component ω.
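A minimal sketch of such an appearance likelihood, with made-up part means and covariances in a hypothetical 2-D appearance space:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 3-part model in a 2-D appearance space; these means and
# covariances are invented for illustration, not learned parameters.
means = [np.zeros(2), np.ones(2), np.array([2.0, 0.0])]
covs = [np.eye(2)] * 3

def appearance_log_likelihood(features, hypothesis):
    """log p(A | h, theta): sum of per-part Gaussian log-densities over
    the features that hypothesis h assigns to each part (i.e. the log of
    a product of Gaussians)."""
    return sum(
        multivariate_normal.logpdf(features[idx], mean=means[p], cov=covs[p])
        for p, idx in enumerate(hypothesis)
    )

features = np.array([[0.1, -0.2], [1.1, 0.9], [2.2, 0.1]])
ll_good = appearance_log_likelihood(features, (0, 1, 2))  # sensible assignment
ll_bad = appearance_log_likelihood(features, (2, 1, 0))   # scrambled assignment
```

A hypothesis that assigns each feature to the part whose density it lies under scores higher than a scrambled assignment.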

The shape of the model for a given mixture component ω and hypothesis h is represented as a joint Gaussian density of the locations of features.

These features are transformed into a scale and translation-invariant space before modelling the relative location of the parts by a 2(P - 1)-dimensional Gaussian.

To reduce the number of hypotheses that must be evaluated, only those that satisfy the ordering constraint that the x-coordinate of each part is monotonically increasing are considered.
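The ordering constraint can be illustrated with a small enumeration sketch; the feature x-coordinates and part count below are arbitrary. Taking features in sorted-x order means each subset of features yields exactly one valid hypothesis instead of P! orderings:

```python
from itertools import combinations

def ordered_hypotheses(x_coords, n_parts):
    """Enumerate assignments of detected features to model parts, keeping
    only those whose part x-coordinates increase monotonically."""
    # Feature indices sorted by x-coordinate.
    order = sorted(range(len(x_coords)), key=lambda i: x_coords[i])
    # combinations() preserves input order, so every tuple it yields
    # automatically satisfies the monotone-x constraint.
    return list(combinations(order, n_parts))

x = [0.5, 0.1, 0.9]            # made-up feature x-coordinates
hyps = ordered_hypotheses(x, 2)  # 2-part model: C(3, 2) = 3 hypotheses
```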

However, because one-shot learning uses only a few training examples, the posterior distribution over model parameters will not be well-peaked, as is assumed when it is approximated by a single maximum-likelihood or maximum a posteriori estimate.

Thus, instead of this traditional approximation, the Bayesian one-shot learning algorithm seeks a parametric form of the prior p(θ) such that learning the posterior over θ from the training images remains feasible.

With a normal–Wishart distribution as the conjugate prior, and because the likelihood is a product of Gaussians, as chosen in the object category model, the integral reduces to a multivariate Student's t-distribution, which can be evaluated in closed form.
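The resulting density can be illustrated with SciPy's multivariate Student's t distribution; the location, shape matrix, and degrees of freedom below are placeholders, not actual posterior hyper-parameters:

```python
import numpy as np
from scipy.stats import multivariate_t, multivariate_normal

# Placeholder parameters; in the algorithm these would be derived from
# the normal-Wishart posterior hyper-parameters.
loc = np.zeros(2)
shape = np.eye(2)

x = np.array([0.3, -0.5])
t_density = multivariate_t.pdf(x, loc=loc, shape=shape, df=5)

# With many degrees of freedom (i.e. much data, hence a well-peaked
# posterior) the Student's t approaches the corresponding Gaussian.
g_density = multivariate_normal.pdf(x, mean=loc, cov=shape)
```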

To obtain shape and appearance priors, three categories (spotted cats, faces, and airplanes) are first learned using maximum likelihood estimation.[22]

These object category model parameters are then used to estimate the hyper-parameters of the desired priors.

Given a set of training examples, the algorithm runs the feature detector on these images, and determines model parameters from the salient regions.

The hypothesis index h assigning features to parts prevents a closed-form solution of the linear model, so the posterior is estimated by the variational Bayesian expectation–maximization algorithm, which is run until parameter convergence after roughly 100 iterations.
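The run-until-convergence structure of such a loop can be illustrated with plain (non-variational) EM on a toy 1-D two-component Gaussian mixture; this is far simpler than the constellation-model updates, but it has the same E-step / M-step / convergence-check shape:

```python
import numpy as np

def em_two_gaussians(x, iters=100, tol=1e-6):
    """Plain EM for a 1-D two-component Gaussian mixture, run until the
    means stop moving (or an iteration cap, here 100, is reached)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out init
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        # (the 1/sqrt(2*pi) constant cancels in the normalisation).
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixing weights.
        nk = resp.sum(axis=0)
        new_mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
        if np.abs(new_mu - mu).max() < tol:          # parameter convergence
            mu = new_mu
            break
        mu = new_mu
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 0.5, 200), rng.normal(5, 0.5, 200)])
mu, sigma, pi = em_two_gaussians(x)
```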

Learning a category in this fashion takes under a minute on a 2.8 GHz machine with a 4-part model and fewer than 10 training images.

A second approach transfers information from previously learned categories by congealing: a stack of training images is iteratively transformed to minimize the summed pixelwise entropies E = Σ_p H(I(p)), where I(p) is the binary random variable defined by the values of a particular pixel p across all of the images and H is the discrete entropy function.
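A minimal sketch of this summed pixelwise entropy, assuming binary images:

```python
import numpy as np

def stack_entropy(images):
    """Summed pixelwise entropy of a stack of binary images: for each
    pixel, treat its values across the stack as a Bernoulli variable and
    sum the per-pixel entropies. Congealing transforms each image so as
    to drive this quantity down (aligned stacks score lower)."""
    stack = np.stack(images).astype(float)     # shape (n_images, H, W)
    q = stack.mean(axis=0)                     # p(pixel == 1) per pixel
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return np.nan_to_num(h).sum()              # 0*log(0) terms become 0

a = np.zeros((4, 4))
b = a.copy()
b[0, 0] = 1                                    # one misaligned pixel
```

A perfectly aligned stack has zero entropy; each pixel that disagrees across two images contributes one bit.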

To use this model for classification, the category model with the maximum posterior probability given an observed image must be selected.[25]

To evaluate this posterior, the test image I is inserted into the training ensemble for the congealing process.

Given the transforms obtained from congealing many images of a certain category, the classifier can be extended to the case where only one training example of a new category is available: applying the learned transformations to that single example generates an artificial data set for the new category.

This artificial data set can be made larger by borrowing transformations from many already known categories.
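The idea of reusing transformations can be sketched with plain translations standing in for the borrowed congealing transforms; the shifts and seed image below are made up for illustration:

```python
import numpy as np

def synthesize_examples(example, shifts):
    """Build an artificial training set for a new class by applying
    transformations borrowed from known categories to its single example.
    Here the 'borrowed' transforms are simple translations; congealing
    would supply richer alignment transforms."""
    return [np.roll(example, shift, axis=(0, 1)) for shift in shifts]

seed = np.zeros((8, 8))
seed[3:5, 3:5] = 1                            # the lone training example
borrowed = [(0, 0), (1, 0), (0, 1), (-1, -1)]  # hypothetical learned shifts
augmented = synthesize_examples(seed, borrowed)
```

Each borrowed transform yields one more synthetic example, so drawing transforms from many known categories enlarges the artificial set further.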