Triangulation (computer vision)

In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two or more images.

In practice, however, the coordinates of image points cannot be measured with arbitrary accuracy.

Instead, various types of noise, such as geometric noise from lens distortion or interest point detection error, lead to inaccuracies in the measured image coordinates.

As a consequence, the lines generated by the corresponding image points do not always intersect in 3D space.

In the following, it is assumed that triangulation is made on corresponding image points from two views generated by pinhole cameras.

The image to the left illustrates the epipolar geometry of a stereo pair of pinhole cameras.

Using basic linear algebra, that intersection point can be determined in a straightforward way.

As a consequence of the noise, the measured image points are y′1 and y′2 instead of the ideal projections y1 and y2.

However, their projection lines (blue) do not have to intersect in 3D space or come close to x.

In the ideal case, a triangulation method recovers the true point x, that is, when the image points satisfy the epipolar constraint defined by the fundamental matrix (except for singular points, see below). In practice, however, it is rather likely that the epipolar constraint is not satisfied and the projection lines do not intersect.

In the following sections, some of the various methods for computing x_est presented in the literature are briefly described.

A triangulation method can be described in terms of a function τ, so that x_est ~ τ(y′1, y′2, C1, C2), where y′1 and y′2 are the homogeneous coordinates of the detected image points and C1 and C2 are the camera matrices.

Which triangulation method is chosen for a particular problem depends to some extent on these characteristics.

Some of the methods fail to correctly compute an estimate of x (3D point) if it lies in a certain subset of the 3D space, corresponding to some combination of y′1, y′2, C1 and C2.

The reason for the failure can be that some equation system to be solved is under-determined or that the projective representation of x_est becomes the zero vector for the singular points.

In some applications, it is desirable that the triangulation is independent of the coordinate system used to represent 3D points; if the triangulation problem is formulated in one coordinate system and then transformed into another, the resulting estimate x_est should transform in the same way.

Not every triangulation method assures this invariance, at least not for general types of coordinate transformations.

If the homogeneous coordinates of the 3D points are transformed according to x → T x, for some invertible 4 × 4 matrix T, then the camera matrices must transform as Ck → Ck T⁻¹ to produce the same homogeneous image coordinates yk. If the triangulation function τ is invariant, the estimate computed from the transformed camera matrices equals the transformed estimate T x_est. Note that τ is only an abstract representation of a computation which, in practice, may be relatively complex.
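The transformation rule for the camera matrices can be verified numerically. The following sketch (illustrative only; the random camera matrix and transformation are assumptions of this example, written here as numpy arrays) checks that replacing a point x by T x and a camera C by C T⁻¹ leaves the homogeneous image coordinates unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 3x4 camera matrix C and a homogeneous 3D point x.
C = rng.standard_normal((3, 4))
x = np.append(rng.standard_normal(3), 1.0)  # homogeneous: [X, Y, Z, 1]

# An invertible 4x4 homogeneous coordinate transformation T
# (random matrix plus a multiple of the identity, to keep it well-conditioned).
T = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)

# Transform the point by T and the camera by C T^{-1}.
x_bar = T @ x
C_bar = C @ np.linalg.inv(T)

y = C @ x              # image coordinates before the transformation
y_bar = C_bar @ x_bar  # image coordinates after

# The homogeneous image coordinates are unchanged (up to floating-point error).
assert np.allclose(y, y_bar)
```

Since C T⁻¹ T x = C x, the equality holds exactly in theory; only round-off separates the two results in practice.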

Some triangulation methods can be expressed as a closed-form continuous function, while others need to be decomposed into a series of computational steps involving, for example, SVD or finding the roots of a polynomial.
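One widely used method of the SVD-based kind, not described in detail above, is linear triangulation (the direct linear transformation, DLT): each image point contributes two linear constraints on the homogeneous 3D point, and the stacked system A x = 0 is solved in the least-squares sense via SVD. A minimal sketch, with camera matrices chosen purely for illustration:

```python
import numpy as np

def triangulate_dlt(y1, y2, C1, C2):
    """Linear (DLT) triangulation: each image point (u, v) contributes the
    rows u*C[2]-C[0] and v*C[2]-C[1]; the homogeneous 3D point is the
    right singular vector of A for the smallest singular value."""
    u1, v1 = y1
    u2, v2 = y2
    A = np.array([
        u1 * C1[2] - C1[0],
        v1 * C1[2] - C1[1],
        u2 * C2[2] - C2[0],
        v2 * C2[2] - C2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    x = Vt[-1]            # solution of A x = 0 in the least-squares sense
    return x[:3] / x[3]   # dehomogenize

# Two simple camera matrices (assumptions of this sketch): one at the
# origin, one translated along the x axis.
C1 = np.hstack([np.eye(3), np.zeros((3, 1))])
C2 = np.hstack([np.eye(3), np.array([[-1.0, 0.0, 0.0]]).T])
X = np.array([0.2, 0.3, 5.0])  # true 3D point

# Project X into both cameras (noise-free here, so DLT recovers X).
p1 = C1 @ np.append(X, 1.0); y1 = p1[:2] / p1[2]
p2 = C2 @ np.append(X, 1.0); y2 = p2[:2] / p2[2]
print(triangulate_dlt(y1, y2, C1, C2))
```

With noise-free image points the method recovers the true point up to floating-point error; with noisy points it returns an algebraic least-squares estimate, which is one reason the various methods differ in their statistical properties.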

This means that both the computation time and the complexity of the operations involved may vary between the different methods.

Each of the two detected image points y′1 and y′2 has a corresponding projection line (blue in the right image above), here denoted as L1 and L2, which can be computed from the camera matrices.

The midpoint method finds the point x_est which minimizes the sum of the squared distances to the two projection lines, d(L1, x)² + d(L2, x)². It turns out that x_est lies exactly at the middle of the shortest line segment which joins the two projection lines.
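The midpoint can be computed in closed form. A minimal sketch, parameterizing each projection line as o + t·d with o a camera centre and d a direction (this parameterization, and the example rays below, are assumptions of the sketch):

```python
import numpy as np

def midpoint_triangulate(o1, d1, o2, d2):
    """Midpoint method: return the midpoint of the shortest segment
    joining the lines o1 + s*d1 and o2 + t*d2 (lines must not be
    parallel; directions need not be unit length)."""
    d1 = np.asarray(d1, float)
    d2 = np.asarray(d2, float)
    w = np.asarray(o1, float) - np.asarray(o2, float)
    # Normal equations for the parameters s, t of the closest points:
    #   s (d1.d1) - t (d1.d2) = -w.d1
    #   s (d1.d2) - t (d2.d2) = -w.d2
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = -np.array([w @ d1, w @ d2])
    s, t = np.linalg.solve(A, b)
    p1 = o1 + s * d1  # closest point on line 1
    p2 = o2 + t * d2  # closest point on line 2
    return 0.5 * (p1 + p2)

# Two slightly perturbed rays that pass near the point (0, 0, 5):
o1, d1 = np.zeros(3), np.array([0.001, 0.0, 1.0])
o2, d2 = np.array([1.0, 0.0, 0.0]), np.array([-0.2, 0.001, 1.0])
print(midpoint_triangulate(o1, d1, o2, d2))
```

When the two rays happen to intersect exactly, the shortest segment degenerates to a point and the method returns the intersection itself.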

If the essential matrix is known and the corresponding rotation and translation transformations have been determined, this algorithm (described in Longuet-Higgins' paper) provides a solution.

In the ideal case, when each camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for the 3D point obtained from the two cameras are equal.

Again, in the ideal case the result should be equal to the above expressions, but in practice they may deviate.

A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinates, which often is the case when 3D points are determined in this way, the translation vector t is known only up to an unknown positive scaling.

As a consequence, the reconstructed 3D points, too, are undetermined with respect to a positive scaling.

The ideal case of epipolar geometry. A 3D point x is projected onto two camera images through lines (green) which intersect with each camera's focal point, O1 and O2. The resulting image points are y1 and y2. The green lines intersect at x.
In practice, the image points y1 and y2 cannot be measured with arbitrary accuracy. Instead, points y′1 and y′2 are detected and used for the triangulation. The corresponding projection lines (blue) do not, in general, intersect in 3D space and may also not intersect with point x.