Histogram of oriented gradients

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection.

This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.[2]

However, usage only became widespread in 2005, when Navneet Dalal and Bill Triggs, researchers at the French National Institute for Research in Computer Science and Automation (INRIA), presented their supplementary work on HOG descriptors at the Conference on Computer Vision and Pattern Recognition (CVPR).[3]

The first step of calculation in many feature detectors in image pre-processing is to ensure normalized color and gamma values.

The next step is the computation of the gradient values. The most common method is to apply the 1-D centered, point discrete derivative mask in one or both of the horizontal and vertical directions.
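In practice this amounts to filtering the image with the kernels [-1, 0, 1] and its transpose. A minimal NumPy sketch of that step (the function name, the zero-valued borders, and the unsigned 0–180° orientation range are illustrative choices, not prescribed by the text):

```python
import numpy as np

def gradients(image):
    """Gradient magnitude and orientation via the 1-D centered
    derivative masks [-1, 0, 1] and [-1, 0, 1]^T (illustrative sketch)."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Centered differences; border pixels are left at zero for simplicity.
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as commonly used for HOG.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    return magnitude, orientation
```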

The image is then divided into small connected regions called cells, and each pixel within a cell casts a weighted vote for an orientation-based histogram bin based on the values found in the gradient computation.[5]
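As a sketch, a single cell's histogram might be accumulated as below; nine unsigned-orientation bins are a common choice, and the simple nearest-bin vote stands in for the interpolated voting often used in practice (all names here are illustrative):

```python
import numpy as np

def cell_histogram(magnitude, orientation, n_bins=9):
    """Accumulate one cell's orientation histogram: each pixel votes
    into the bin containing its orientation, weighted by magnitude."""
    bin_width = 180.0 / n_bins          # unsigned gradients, 0-180 degrees
    hist = np.zeros(n_bins)
    bins = (orientation // bin_width).astype(int) % n_bins
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist
```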

To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks.

The HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions.
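A sketch of this final step, assuming a grid of per-cell histograms and using a plain L2 block norm (one of several normalization schemes in use; the block size and names are illustrative assumptions):

```python
import numpy as np

def hog_descriptor(cell_hists, block=2, eps=1e-6):
    """Slide a `block` x `block` window of cells over the cell-histogram
    grid, L2-normalize each block, and concatenate the results.
    `cell_hists` has shape (n_cells_y, n_cells_x, n_bins)."""
    ny, nx, nb = cell_hists.shape
    out = []
    for y in range(ny - block + 1):          # overlapping blocks
        for x in range(nx - block + 1):
            v = cell_hists[y:y + block, x:x + block].ravel()
            out.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(out)
```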

The rectangular R-HOG blocks appear quite similar to the scale-invariant feature transform (SIFT) descriptors; however, despite their similar formation, R-HOG blocks are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are usually computed at sparse, scale-invariant key image points and are rotated to align orientation.

In addition, the R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.

Finally, shape contexts use circular bins, similar to those used in the circular C-HOG blocks, but only tabulate votes on the basis of edge presence, making no distinction with regard to orientation.

As part of the Pascal Visual Object Classes 2006 Workshop, Dalal and Triggs presented results on applying histogram of oriented gradients descriptors to image objects other than humans, such as cars, buses, and bicycles, as well as common animals such as dogs, cats, and cows.[10]

As part of the 2006 European Conference on Computer Vision (ECCV), Dalal and Triggs teamed up with Cordelia Schmid to apply HOG detectors to the problem of human detection in films and videos.

Their internal motion histograms (IMH) use the gradient magnitudes from optical flow fields obtained from two consecutive frames.
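As an illustrative sketch only (not the authors' implementation), the flow field between two frames and the spatial gradients of its components can be obtained with OpenCV's Farnebäck optical flow and NumPy:

```python
import cv2
import numpy as np

def motion_gradients(prev_gray, next_gray):
    """Dense Farneback optical flow between two consecutive frames,
    followed by spatial gradients of each flow component; their
    magnitudes and orientations are what motion histograms bin over."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    results = []
    for c in range(2):                      # horizontal and vertical flow
        gy, gx = np.gradient(flow[..., c])  # spatial derivatives
        magnitude = np.hypot(gx, gy)
        orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
        results.append((magnitude, orientation))
    return results
```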

When tested on two large datasets taken from several movies, the combined HOG-IMH method yielded a miss rate of approximately 0.1 at a 10⁻⁴ false alarm rate.[11]

At the Intelligent Vehicles Symposium in 2006, F. Suard, A. Rakotomamonjy, and A. Bensrhair introduced a complete system for pedestrian detection based on HOG descriptors.

Support vector machine classifiers then operate on the HOG descriptors taken from these smaller regions of interest to formulate a decision regarding the presence of a pedestrian.[12]
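A generic sketch of such a HOG-plus-linear-SVM pipeline, using scikit-image and scikit-learn with commonly used parameter values; the window preparation and parameters here are assumptions for illustration, not the authors' system:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_pedestrian_classifier(windows, labels):
    """Generic HOG + linear SVM pipeline: compute a HOG descriptor for
    each fixed-size grayscale window, then fit a linear SVM on them."""
    features = np.array([
        hog(w, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm='L2-Hys')
        for w in windows
    ])
    clf = LinearSVC()            # labels: 1 = pedestrian, 0 = background
    clf.fit(features, labels)
    return clf
```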

At the IEEE Conference on Computer Vision and Pattern Recognition in 2006, Qiang Zhu, Shai Avidan, Mei-Chen Yeh, and Kwang-Ting Cheng presented an algorithm to significantly speed up human detection using HOG descriptor methods.

Their method uses HOG descriptors in combination with the cascading classifiers algorithm normally applied with great success to face detection.
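The cascade idea itself can be summarized in a few lines; the `stages` structure below is hypothetical and only illustrates the early-rejection principle, not the authors' actual classifiers:

```python
def cascade_detect(window, stages):
    """Rejection cascade: `stages` is a list of (classifier, threshold)
    pairs ordered from cheap to expensive; a window is rejected as soon
    as any stage's score falls below its threshold."""
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return False          # early rejection: most windows exit here
    return True                   # survived every stage: report a detection
```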

The gradient field HOG (GF-HOG) descriptor captured local spatial structure in sketches or image edge maps.

This enabled the descriptor to be used within a content-based image retrieval system searchable by free-hand sketched shapes.[14]

The GF-HOG adaptation was shown to outperform existing gradient histogram descriptors such as SIFT, SURF, and HOG by around 15 percent at the task of SBIR.

Instead of image gradients, the author used distances between points (pixels) and planes, so-called residuals, to characterize a local region in a point cloud.[16]
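A standard way to compute such point-to-plane residuals is to fit a least-squares plane to a local neighborhood by PCA and measure each point's distance to it; the sketch below makes this assumption and is not the cited author's exact formulation:

```python
import numpy as np

def plane_residuals(points):
    """Fit a plane to a local neighborhood of 3-D points (an (N, 3)
    array) and return each point's distance to that plane."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The right-singular vector with the smallest singular value is the
    # normal of the least-squares plane through the centroid.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return np.abs(centered @ normal)   # the residuals
```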