Shape context is a feature descriptor used in object recognition.
Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000.[1] The basic idea is to pick n points on the contours of a shape. For each point p_i on the shape, consider the n − 1 vectors obtained by connecting p_i to all other points.
The set of all these vectors is a rich description of the shape localized at that point but is far too detailed.
The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point p_i, the coarse histogram of the relative coordinates of the remaining n − 1 points,

h_i(k) = \#\{\, q \neq p_i : (q - p_i) \in \mathrm{bin}(k) \,\},

is defined to be the shape context of p_i. The bins are normally taken to be uniform in log-polar space, making the descriptor more sensitive to the positions of nearby sample points than to those of points farther away.
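As a concrete illustration, the following is a minimal sketch of this computation in Python/NumPy. The bin counts (5 radial, 12 angular) and radial limits (1/8 to 2 in normalized units) follow common choices in the literature; the function name and interface are illustrative, not part of the original method.

```python
import numpy as np

def shape_contexts(points, n_radial=5, n_angular=12, r_inner=0.125, r_outer=2.0):
    """Compute a log-polar shape context histogram for each of n points (n, 2)."""
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]      # diff[i, j] = p_j - p_i
    dists = np.linalg.norm(diff, axis=2)
    off_diag = ~np.eye(n, dtype=bool)
    dists = dists / dists[off_diag].mean()              # normalize for scale invariance
    angles = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_radial + 1)
    r_bin = np.digitize(dists, r_edges) - 1             # -1: closer than r_inner
    a_bin = (angles / (2 * np.pi) * n_angular).astype(int) % n_angular
    hist = np.zeros((n, n_radial * n_angular))
    for i in range(n):
        for j in range(n):
            if i != j and 0 <= r_bin[i, j] < n_radial:  # drop out-of-range radii
                hist[i, r_bin[i, j] * n_angular + a_bin[i, j]] += 1
    return hist / np.maximum(hist.sum(axis=1, keepdims=True), 1)  # rows sum to 1
```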
The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown; panel (c) of that figure shows the diagram of the log-polar bins used to compute the shape context.
To be useful for matching, the descriptor needs to be invariant to translation, scaling, small perturbations, and, depending on the application, rotation.[1][4] Invariance to translation comes naturally, since all measurements are taken relative to points on the shape, and scale invariance is obtained by normalizing all radial distances by the mean distance between point pairs. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers[4] using synthetic point set matching experiments.[5]
One can also provide complete rotational invariance in shape contexts, for example by measuring all angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). However, this is not always desired, since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotational invariance, for example distinguishing a "6" from a "9".
A complete system that uses shape contexts for shape matching consists of the following steps (which will be covered in more detail in the Details of Implementation section):

1. Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape.
2. Compute the shape context of each point found in step 1.
3. Compute the matrix of matching costs between the points of the two shapes.
4. Find the one-to-one matching that minimizes the total matching cost.
5. Estimate a transformation (e.g. a thin plate spline) that warps one shape onto the other.
6. Compute the shape distance between the two shapes and use it, for instance, in a nearest-neighbor classifier.

The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours of the object. These can be obtained as locations of edge pixels found by an edge detector. It is preferable to sample the shape with roughly uniform spacing, though it is not critical.
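A minimal sketch of such sampling, assuming OpenCV's Canny detector supplies the edge map; the Canny thresholds and the greedy pruning scheme are illustrative choices, not prescribed by the original method.

```python
import cv2
import numpy as np

def sample_contour_points(gray_image, n_points=100, seed=0):
    """Pick n_points edge pixels with roughly uniform spacing."""
    edges = cv2.Canny(gray_image, 100, 200)             # thresholds are illustrative
    ys, xs = np.nonzero(edges)
    pts = np.stack([xs, ys], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    rng.shuffle(pts)
    pts = list(pts[: 5 * n_points])                     # cap candidates for speed
    # Greedy pruning: repeatedly drop one point of the closest remaining pair,
    # which spreads the surviving samples out along the contours.
    while len(pts) > n_points:
        arr = np.asarray(pts)
        d = np.linalg.norm(arr[:, None] - arr[None, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, _ = np.unravel_index(np.argmin(d), d.shape)
        pts.pop(i)
    return np.asarray(pts)
```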
Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k).
As shape contexts are distributions represented as histograms, it is natural to use the χ² test statistic as the "shape context cost" of matching the two points:

C_S = \frac{1}{2} \sum_{k=1}^{K} \frac{[g(k) - h(k)]^2}{g(k) + h(k)}

The values of this range from 0 to 1.
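A minimal NumPy sketch of this cost, vectorized over all point pairs of two shapes; the helper name and the eps guard against empty bins are illustrative:

```python
import numpy as np

def chi2_cost_matrix(g, h, eps=1e-10):
    """C[i, j] = 0.5 * sum_k (g_i(k) - h_j(k))^2 / (g_i(k) + h_j(k))."""
    g = g[:, None, :]                 # shape (n, 1, K)
    h = h[None, :, :]                 # shape (1, m, K)
    return 0.5 * ((g - h) ** 2 / (g + h + eps)).sum(axis=2)
```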
This cost can be augmented with a term based on local appearance. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition):

C_A = \frac{1}{2} \left\| \begin{pmatrix} \cos\theta_1 \\ \sin\theta_1 \end{pmatrix} - \begin{pmatrix} \cos\theta_2 \\ \sin\theta_2 \end{pmatrix} \right\|

This is half the length of the chord in the unit circle between the unit vectors with angles θ_1 and θ_2, i.e. |sin((θ_1 − θ_2)/2)|.
Given the cost matrix, a one-to-one matching between the two point sets that minimizes the total matching cost is then sought; this assignment problem can be solved in O(N³) time with the Hungarian method, although more efficient algorithms exist.[6] To have robust handling of outliers, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix, so that a point is assigned to a dummy whenever no real match of lower cost exists.
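A minimal sketch of this matching step using SciPy's implementation of the assignment problem; the dummy-node padding scheme and the dummy cost value are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_dummies(cost, dummy_cost=0.3):
    """One-to-one matching; unmatched points are absorbed by dummy nodes."""
    n, m = cost.shape
    size = n + m                                   # enough dummies for either side
    padded = np.full((size, size), dummy_cost)
    padded[:n, :m] = cost                          # real-to-real costs
    rows, cols = linear_sum_assignment(padded)     # Hungarian-style solver
    keep = (rows < n) & (cols < m)                 # discard dummy assignments
    return rows[keep], cols[keep]
```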
Given a set of correspondences between points on the two shapes, a transformation T : ℝ² → ℝ² can be estimated to map arbitrary points from one shape to the other; the thin plate spline (TPS) model is commonly used for this. The bending energy I_f (a measure of how much transformation is needed to align the points) is also easily obtained from the fitted spline. The basic TPS formulation requires exact matching for the pairs of corresponding points on the two shapes. This requirement can be relaxed through regularization: let v_i denote the target function values at corresponding locations p_i = (x_i, y_i) (in practice, two separate TPS functions are fitted, one per coordinate of the transformed points) and minimize

H[f] = \sum_{i=1}^{n} \big( v_i - f(x_i, y_i) \big)^2 + \lambda I_f

where λ is a regularization parameter trading off exact interpolation against smoothness of the transformation.
However, if one uses the original non-normalized coordinates, then the regularization parameter needs to be scaled accordingly.
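A minimal sketch of fitting such a regularized TPS and reading off the bending energy, assuming corresponding point arrays src and dst; the direct linear solve shown here is one standard way to fit a TPS, and lam should be scaled with the coordinate units as noted above:

```python
import numpy as np

def tps_fit(src, dst, lam=1.0):
    """Fit a regularized TPS mapping src -> dst; return params and bending energy."""
    n = len(src)
    d = np.linalg.norm(src[:, None] - src[None, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.where(d > 0, d**2 * np.log(d**2), 0.0)   # TPS kernel U(r) = r^2 log r^2
    P = np.hstack([np.ones((n, 1)), src])               # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K + lam * np.eye(n)     # lam relaxes the exact-interpolation requirement
    A[:n, n:] = P
    A[n:, :n] = P.T
    V = np.zeros((n + 3, 2))
    V[:n] = dst                         # fit both output coordinates at once
    params = np.linalg.solve(A, V)      # first n rows: kernel weights w; last 3: affine
    w = params[:n]
    bending_energy = np.trace(w.T @ K @ w)
    return params, bending_energy
```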
The initial estimate of the correspondences may contain errors, which in turn degrade the quality of the estimated transformation. If we iterate the steps of finding correspondences and estimating transformations (i.e. repeating steps 2–5 with the newly transformed shape), we can overcome this problem.
Typically, three iterations are all that is needed to obtain reasonable results.
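A minimal sketch of this iteration, reusing the helpers sketched above together with a hypothetical tps_apply helper (not shown) that evaluates the fitted spline at arbitrary points:

```python
def iterative_match(p, q, n_iters=3, lam=1.0):
    """Alternate between correspondence estimation and TPS warping."""
    warped = q.copy()
    for _ in range(n_iters):
        cost = chi2_cost_matrix(shape_contexts(p), shape_contexts(warped))
        rows, cols = match_with_dummies(cost)
        params, bending = tps_fit(q[cols], p[rows], lam=lam)
        warped = tps_apply(params, q[cols], q)   # hypothetical: warp all of q
    return rows, cols, bending
```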
Finally, a shape distance between the two shapes P and Q is computed. This distance is going to be a weighted sum of three potential terms:

Shape context distance: this is the symmetric sum of shape context matching costs over best matching points:

D_{sc}(P, Q) = \frac{1}{n} \sum_{p \in P} \min_{q \in Q} C\big(p, T(q)\big) + \frac{1}{m} \sum_{q \in Q} \min_{p \in P} C\big(p, T(q)\big)

where T(·) is the estimated TPS transform that maps the points in Q to those in P.

Appearance cost: after establishing image correspondences and properly warping one image to match the other, one can define an appearance cost as the sum of squared brightness differences in Gaussian windows around corresponding image points:

D_{ac}(P, Q) = \frac{1}{n} \sum_{i=1}^{n} \sum_{\Delta \in \mathbb{Z}^2} G(\Delta) \big[ I_P(p_i + \Delta) - I_Q\big(T(q_{\pi(i)}) + \Delta\big) \big]^2

where I_P and I_Q are the gray-level images, G is a Gaussian windowing function, and π is the estimated correspondence.

Transformation cost: the final cost D_{be}(P, Q) measures how much transformation is needed to bring the two shapes into alignment; in the TPS case, the bending energy is used.
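A minimal sketch of the first term, reusing chi2_cost_matrix from above and assuming the points of Q have already been warped by the estimated transform; the appearance and bending-energy terms would be added to this with scalar weights:

```python
def shape_context_distance(sc_p, sc_q_warped):
    """Symmetric sum of minimal chi-squared costs between two shapes."""
    cost = chi2_cost_matrix(sc_p, sc_q_warped)
    return cost.min(axis=1).mean() + cost.min(axis=0).mean()
```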
The authors Serge Belongie and Jitendra Malik tested their approach on the MNIST database of handwritten digits.[10] The authors also experimented with the MPEG-7 shape silhouette database, performing Core Experiment CE-Shape-1 part B, which measures performance of similarity-based retrieval.
For this experiment, the authors increased the number of points sampled from each shape.
The authors also developed an editing algorithm based on shape context similarity and k-medoid clustering that further improved their retrieval performance.
In a trademark retrieval experiment, no visually similar trademark was missed by the algorithm (as verified manually by the authors).