Neighbourhood components analysis is a supervised learning method for classifying multivariate data into distinct classes according to a given distance metric over the data.
Functionally, it serves the same purposes as the K-nearest neighbors algorithm and makes direct use of a related concept termed stochastic nearest neighbours.
Neighbourhood components analysis aims at "learning" a distance metric by finding a linear transformation of input data such that the average leave-one-out (LOO) classification performance is maximized in the transformed space.
The key insight to the algorithm is that a matrix
corresponding to the transformation can be found by defining a differentiable objective function for
, followed by the use of an iterative solver such as conjugate gradient descent.
One of the benefits of this algorithm is that the number of classes
This use of the algorithm, therefore, addresses the issue of model selection.
, we define an objective function describing classification accuracy in the transformed space and try to determine
Consider predicting the class label of a single data point by consensus of its
can be quite different after passing all the points through a linear transformation.
Specifically, the set of neighbours for a point can undergo discrete changes in response to smooth changes in the elements of
based on the neighbours of a point will be piecewise-constant, and hence not differentiable.
We can resolve this difficulty by using an approach inspired by stochastic gradient descent.
-nearest neighbours at each transformed point in LOO-classification, we'll consider the entire transformed data set as stochastic nearest neighbours.
We define these using a softmax function of the squared Euclidean distance between a given LOO-classification point and each other point in the transformed space:
The probability of correctly classifying data point
is the probability of classifying the points of each of its neighbours with the same class
Define the objective function using LOO classification, this time using the entire data set as stochastic nearest neighbours:
Note that under stochastic nearest neighbours, the consensus class for a single point
is the expected value of a point's class in the limit of an infinite number of samples drawn from the distribution over its neighbours
Thus the predicted class is an affine combination of the classes of every other point, weighted by the softmax function for each
is now the entire transformed data set.
This choice of objective function is preferable as it is differentiable with respect to
means that it can be found with an iterative solver such as conjugate gradient descent.
Note that in practice, most of the innermost terms of the gradient evaluate to insignificant contributions due to the rapidly diminishing contribution of distant points from the point of interest.
This means that the inner sum of the gradient can be truncated, resulting in reasonable computation times even for large data sets.
A natural alternative is the KL-divergence, which induces the following objective function and gradient:" (Goldberger 2005)
using this function tends to give similar performance results as with the original.
Neighbourhood components analysis was developed by Jacob Goldberger, Sam Roweis, Ruslan Salakhudinov, and Geoff Hinton at the University of Toronto's department of computer science in 2004.