However, they do not take the 3D geometric constraints of the object into consideration during matching, and typically also do not handle occlusion as well as feature-based approaches.
Because appropriate feature detectors are lacking, objects that do not have textured, smooth surfaces cannot currently be handled by this approach.
Feature-based object recognizers generally work by pre-capturing a number of fixed views of the object to be recognized, extracting features from these views, and then in the recognition process, matching these features to the scene and enforcing geometric constraints.
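The match-then-verify structure described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from any particular system: descriptors are plain lists of floats, ambiguous matches are rejected with the nearest-neighbor ratio test from [Lowe 2004], and the geometric constraint is replaced by a toy 2D translation-consensus check. Real recognizers enforce much richer constraints. The names `match_features` and `filter_by_translation` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_features(model_desc, scene_desc, ratio=0.8):
    """Return (model_index, scene_index) pairs that pass the ratio test.

    A model feature is matched to its nearest scene feature only if that
    neighbor is clearly closer than the second-nearest one.
    """
    matches = []
    for i, d in enumerate(model_desc):
        ranked = sorted((euclidean(d, s), j) for j, s in enumerate(scene_desc))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

def filter_by_translation(matches, model_xy, scene_xy, tol=5.0):
    """Toy geometric check: keep matches whose 2D offset agrees with the
    middle element of the sorted offsets, to within `tol` pixels.

    This is a stand-in for real geometric verification, not a substitute.
    """
    if not matches:
        return []
    dxs = sorted(scene_xy[j][0] - model_xy[i][0] for i, j in matches)
    dys = sorted(scene_xy[j][1] - model_xy[i][1] for i, j in matches)
    mdx, mdy = dxs[len(dxs) // 2], dys[len(dys) // 2]
    return [
        (i, j)
        for i, j in matches
        if math.hypot(scene_xy[j][0] - model_xy[i][0] - mdx,
                      scene_xy[j][1] - model_xy[i][1] - mdy) <= tol
    ]
```

In this sketch, descriptor matching proposes correspondences and the geometric filter discards those inconsistent with the majority. The same division of labor applies when the constraint is a full 3D model rather than a translation.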
As an example of a prototypical system taking this approach, we outline the method used by [Rothganger et al. 2004], with some details elided.
Because smooth surfaces are locally planar, affine-invariant features are appropriate for matching. The paper detects ellipse-shaped regions of interest using both edge-like and blob-like features. Then, as per [Lowe 2004], it finds the dominant gradient direction of the ellipse, converts the ellipse into a parallelogram, and takes a SIFT descriptor on the resulting parallelogram.
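One way to picture the ellipse-to-parallelogram step is as an affine map from a canonical square to the image. The sketch below is an illustration under stated assumptions, not the paper's implementation. The ellipse is assumed to be given as x^T M x = 1 about a known center, with M = [[a, b], [b, c]] symmetric positive definite. The rotation `theta` is assumed to be the dominant gradient angle, expressed in the normalized frame. The parallelogram is assumed to be the image of the square [-1, 1]^2. The function names are hypothetical.

```python
import math

def ellipse_to_affine(a, b, c, theta=0.0):
    """Return a 2x2 matrix A mapping the unit circle onto the ellipse
    x^T M x = 1, with M = [[a, b], [b, c]] symmetric positive definite.

    A = M^(-1/2) R(theta); the rotation R(theta) is the freedom that the
    dominant gradient direction is assumed to fix.
    """
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("M must be symmetric positive definite")
    # Inverse of M (2x2 closed form).
    ia, ib, ic = c / det, -b / det, a / det
    # Square root of a 2x2 SPD matrix S:
    #   sqrt(S) = (S + sqrt(det S) I) / sqrt(trace S + 2 sqrt(det S))
    s = math.sqrt(ia * ic - ib * ib)
    t = math.sqrt(ia + ic + 2.0 * s)
    r00, r01, r11 = (ia + s) / t, ib / t, (ic + s) / t
    ct, st = math.cos(theta), math.sin(theta)
    return (
        (r00 * ct + r01 * st, -r00 * st + r01 * ct),
        (r01 * ct + r11 * st, -r01 * st + r11 * ct),
    )

def parallelogram_corners(center, A):
    """Map the corners of the square [-1, 1]^2 through A and shift by center."""
    cx, cy = center
    return [
        (cx + A[0][0] * u + A[0][1] * v, cy + A[1][0] * u + A[1][1] * v)
        for u, v in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]
```

For an axis-aligned ellipse with semi-axes 2 and 1 (M = [[0.25, 0], [0, 1]]) and `theta = 0`, A is the diagonal matrix diag(2, 1), so the parallelogram reduces to the ellipse's bounding rectangle. A descriptor would then be computed on the patch obtained by resampling the image through the inverse of this map.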
At the end of this step, one has a model of the target object, consisting of features projected into a common 3D space.