Semi-global matching

Semi-global matching (SGM) is a computer vision algorithm for the estimation of a dense disparity map from a rectified stereo image pair, introduced in 2005 by Heiko Hirschmüller while working at the German Aerospace Center.

[1] Given its predictable run time, its favourable trade-off between quality of the results and computing time, and its suitability for fast parallel implementation in ASIC or FPGA, it has encountered wide adoption in real-time stereo vision applications such as robotics and advanced driver assistance systems.

Given a rectified stereo image pair, for a pixel with coordinates

[1] A simple search for the best matching pixel produces many spurious matches, and this problem can be mitigated with the addition of a regularisation term that penalises jumps in disparity between adjacent pixels, with a cost function in the form where

Such constraint can be efficiently enforced on a per-scanline basis by using dynamic programming (e.g. the Viterbi algorithm), but such limitation can still introduce streaking artefacts in the depth map, because little or no regularisation is performed across scanlines.

[4] A possible solution is to perform global optimisation in 2D, which is however an NP-complete problem in the general case.

For some families of cost functions (e.g. submodular functions) a solution with strong optimality properties can be found in polynomial time using graph cut optimization, however such global methods are generally too expensive for real-time processing.

[5] The idea behind SGM is to perform line optimisation along multiple directions and computing an aggregated cost

The number of directions affects the run time of the algorithm, and while 16 directions usually ensure good quality, a lower number can be used to achieve faster execution.

[6] A typical 8-direction implementation of the algorithm can compute the cost in two passes, a forward pass accumulating the cost from the left, top-left, top, and top-right, and a backward pass accumulating the cost from right, bottom-right, bottom, and bottom-left.

The former can be in principle any local image dissimilarity measure, and commonly used functions are absolute or squared intensity difference (usually summed over a window around the pixel, and after applying a high-pass filter to the images to gain some illumination invariance), Birchfield–Tomasi dissimilarity, Hamming distance of the census transform, Pearson correlation (normalized cross-correlation).

Even mutual information can be approximated as a sum over the pixels, and thus used as a local similarity metric.

The three-way comparison allows to assign a smaller penalty for unitary changes in disparity, thus allowing smooth transitions corresponding e.g. to slanted surfaces, and penalising larger jumps while preserving discontinuities due to the constant penalty term.

Each term can be expressed recursively as where the minimum cost at the previous pixel

is subtracted for numerical stability, since it is constant for all values of disparity at the current pixel and therefore it does not affect the optimisation.

, and sub-pixel accuracy can be achieved by fitting a curve in

and its neighbouring costs and taking the minimum along the curve.

Since the two images in the stereo pair are not treated symmetrically in the calculations, a consistency check can be performed by computing the disparity a second time in the opposite direction, swapping the role of the left and right image, and invalidating the result for the pixels where the result differs between the two calculations.

Further post-processing techniques for the refinement of the disparity image include morphological filtering to remove outliers, intensity consistency checks to refine textureless regions, and interpolation to fill in pixels invalidated by consistency checks.

times, therefore the computational complexity of the algorithm for an image of size

[7] The main drawback of SGM is its memory consumption.

An implementation of the two-pass 8-directions version of the algorithm requires to store

and to compute the cost for a pixel during each pass it is necessary to keep track of the

[7] One solution to reduce memory consumption is to compute SGM on partially overlapping image tiles, interpolating the values over the overlapping regions.

This method also allows to apply SGM to very large images, that would not fit within memory in the first place.

[12] A memory-efficient approximation of SGM stores for each pixel only the costs for the disparity values that represent a minimum along some direction, instead of all possible disparity values.

The true minimum is highly likely to be predicted by the minima along the eight directions, thus yielding similar quality of the results.

The algorithm uses eight directions and three passes, and during the first pass it stores for each pixel the cost for the optimal disparity along the four top-down directions, plus the two closest lower and higher values (for sub-pixel interpolation).

In the second pass, the other four bottom-up directions are computed, completing the calculations for the four disparity values selected in the first pass, that now have been evaluated along all eight directions.

, at the cost in time of an additional pass over the image.