Camera matrix

Let x be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let y be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds:

y ~ C x

where C is the 3 × 4 camera matrix and the ~ sign implies that the left- and right-hand sides are equal except for a multiplication by a non-zero scalar.

Because the camera matrix is defined only up to scale, its 12 elements carry only 11 degrees of freedom: multiplying it by any non-zero scalar results in an equivalent camera matrix.
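The scale equivalence can be checked numerically. The following is a minimal sketch using only the Python standard library; the matrix `C`, the point `x`, and the helper names `apply_camera` and `dehomogenize` are hypothetical and chosen purely for illustration.

```python
def apply_camera(C, x):
    """Multiply a 3x4 camera matrix C by a homogeneous 4-vector x."""
    return [sum(C[i][j] * x[j] for j in range(4)) for i in range(3)]


def dehomogenize(y):
    """Convert homogeneous image coordinates (y1, y2, y3) to 2D, assuming y3 != 0."""
    return (y[0] / y[2], y[1] / y[2])


# Hypothetical camera matrix and homogeneous 3D point (illustration only).
C = [[2, 0, 1, 3],
     [0, 2, 1, 1],
     [0, 0, 1, 4]]
x = [1, 2, 4, 1]

# Scale every element of C by the same non-zero factor.
C_scaled = [[5 * c for c in row] for row in C]

p = dehomogenize(apply_camera(C, x))
p_scaled = dehomogenize(apply_camera(C_scaled, x))
print(p, p_scaled)  # both are (1.125, 1.125)
```

The homogeneous vectors differ by the factor 5, but that factor cancels when dividing by the third component, so both matrices describe the same camera.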

The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model, is given by

(y1, y2) = (f / x3) (x1, x2)

where (x1, x2, x3) are the 3D coordinates of P relative to a camera-centered coordinate system, (y1, y2) are the resulting image coordinates, and f is the camera's focal length, for which we assume f > 0.
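As a small illustration, the pinhole mapping can be sketched as a function, assuming the standard relation y1 = f x1 / x3, y2 = f x2 / x3. The function name `pinhole_project` is hypothetical, and the check that x3 > 0 (the point lies in front of the camera) is an added assumption of this sketch.

```python
def pinhole_project(point, f):
    """Project a 3D point (x1, x2, x3), given in camera-centered coordinates,
    onto the image plane of a pinhole camera with focal length f."""
    x1, x2, x3 = point
    if f <= 0:
        raise ValueError("focal length f must be positive")
    if x3 <= 0:
        # Assumption of this sketch: only points in front of the camera are projected.
        raise ValueError("x3 must be positive")
    return (f * x1 / x3, f * x2 / x3)


image_point = pinhole_project((2.0, 4.0, 8.0), f=2.0)
print(image_point)  # (0.5, 1.0)
```

Doubling x3 while keeping x1 and x2 fixed halves both image coordinates, which is the familiar perspective foreshortening.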

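To derive the camera matrix, the expression above is rewritten in terms of homogeneous coordinates. Instead of the 2D vector (y1, y2) we consider the projective element y = (y1, y2, 1)^T, and instead of equality we consider equality up to scaling by a non-zero number, denoted ~:

(y1, y2, 1)^T = ((f / x3) x1, (f / x3) x2, 1)^T ~ (f x1, f x2, x3)^T

Finally, also the 3D coordinates are expressed in a homogeneous representation x = (x1, x2, x3, 1)^T, and this is how the camera matrix appears:

y ~ C x, with C = [f 0 0 0; 0 f 0 0; 0 0 1 0] (rows separated by semicolons).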
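The homogeneous form can be sketched in code as follows. This assumes the camera matrix for focal length f has rows (f, 0, 0, 0), (0, f, 0, 0), (0, 0, 1, 0), which is the usual result of this derivation; the helper names `camera_matrix` and `project` are hypothetical.

```python
def camera_matrix(f):
    """3x4 pinhole camera matrix for focal length f (camera-centered coordinates)."""
    return [[f, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, 1, 0]]


def project(C, point3d):
    """Project a 3D point via homogeneous coordinates and return 2D image coordinates."""
    x = list(point3d) + [1]  # homogeneous 4-vector (x1, x2, x3, 1)
    y = [sum(C[i][j] * x[j] for j in range(4)) for i in range(3)]
    return (y[0] / y[2], y[1] / y[2])  # divide out the arbitrary scale


result = project(camera_matrix(2.0), (2.0, 4.0, 8.0))
print(result)  # (0.5, 1.0), the same as f * x1 / x3 and f * x2 / x3
```

The matrix product itself is linear; the division by x3 only happens in the final dehomogenization step.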

The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements.

This depends to a large extent on the particular coordinate systems which have been chosen for the 3D and 2D points.

In practice, however, other forms of camera matrices are common, as will be shown below.

The camera matrix C derived in the previous section has a null space which is spanned by the vector

n = (0, 0, 0, 1)^T

This is also the homogeneous representation of the 3D point which has coordinates (0, 0, 0); that is, the "camera center" (also known as the entrance pupil; the position of the pinhole of a pinhole camera) is at the origin O.
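A quick numerical check of the null-space statement, as a sketch: it assumes the camera matrix with rows (f, 0, 0, 0), (0, f, 0, 0), (0, 0, 1, 0) and uses a hypothetical focal length f = 3.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


f = 3  # hypothetical focal length
C = [[f, 0, 0, 0],
     [0, f, 0, 0],
     [0, 0, 1, 0]]

center = [0, 0, 0, 1]             # homogeneous representation of the 3D point (0, 0, 0)
image_of_center = mat_vec(C, center)
image_of_other = mat_vec(C, [0, 0, 1, 1])  # a point one unit in front of the camera

print(image_of_center)  # [0, 0, 0]: not a valid homogeneous image point
print(image_of_other)   # [0, 0, 1]: the image origin
```

Because the camera center maps to the zero vector, it has no image; every other 3D point with x3 different from 0 maps to a proper image point.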

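The camera matrix derived above can be simplified even further if we assume that f = 1:

C0 = [1 0 0 0; 0 1 0 0; 0 0 1 0] = (I | 0)

where I denotes the 3 × 3 identity matrix and 0 a 3-dimensional zero vector, so that the 3 × 4 matrix is written as the concatenation of a 3 × 3 matrix and a 3-dimensional vector.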

So far all points in the 3D world have been represented in a camera centered coordinate system, that is, a coordinate system which has its origin at the camera center (the location of the pinhole of a pinhole camera).

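In practice, however, the 3D points may be represented in terms of coordinates relative to an arbitrary coordinate system (X1', X2', X3'). Assuming that the camera coordinate axes (X1, X2, X3) and the axes (X1', X2', X3') are of Euclidean type (orthogonal and isotropic), there is a unique Euclidean 3D transformation (rotation and translation) between the two coordinate systems.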

In other words, the camera is not necessarily at the origin looking along the z axis.

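The two operations of rotation and translation of 3D coordinates can be represented as the two 4 × 4 matrices

[R 0; 0 1] and [I t; 0 1]

where R is a 3 × 3 rotation matrix and t is a 3-dimensional translation vector (blocks are listed row by row, separated by semicolons).

When the first matrix is multiplied onto the homogeneous representation of a 3D point, the result is the homogeneous representation of the rotated point; the second matrix instead performs a translation.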

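Performing the two operations in sequence, i.e. first the rotation and then the translation (with the translation vector given in the already rotated coordinate system), gives a combined rotation and translation matrix

[I t; 0 1] [R 0; 0 1] = [R t; 0 1]

Assuming that the rotation matrix R and the translation vector t are precisely the rotation and translation which relate the two coordinate systems (X1, X2, X3) and (X1', X2', X3') above, this implies that

x = [R t; 0 1] x'

where x' is the homogeneous representation of the point P in the coordinate system (X1', X2', X3').

Assuming also that the camera matrix is given by C0 = (I | 0), the mapping from coordinates in the (X1', X2', X3') system to homogeneous image coordinates becomes

y ~ C0 x = (I | 0) [R t; 0 1] x' = (R | t) x'

Consequently, the camera matrix which relates points in the coordinate system (X1', X2', X3') to image coordinates is C_N = (R | t), a concatenation of a 3D rotation matrix and a 3-dimensional translation vector.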

This type of camera matrix is referred to as a normalized camera matrix. It assumes a focal length of 1 and that image coordinates are measured in a coordinate system whose origin is located at the intersection between axis X3 and the image plane and which has the same units as the 3D coordinate system.
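The construction of a normalized camera matrix can be sketched as follows. The sketch assumes the normalized camera matrix is the 3 × 4 concatenation (R | t) of a rotation matrix R and a translation vector t, obtained by multiplying (I | 0) with the combined 4 × 4 rotation-and-translation matrix. The particular R (a 90-degree rotation about the third axis), t, and test point are hypothetical values chosen so that all arithmetic is exact.

```python
def mat_mul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


R = [[0, -1, 0],   # hypothetical rotation: 90 degrees about the third axis
     [1,  0, 0],
     [0,  0, 1]]
t = [1, 2, 3]      # hypothetical translation

# 4x4 homogeneous rotation and translation matrices.
rotation = [R[i] + [0] for i in range(3)] + [[0, 0, 0, 1]]
translation = [[1, 0, 0, t[0]],
               [0, 1, 0, t[1]],
               [0, 0, 1, t[2]],
               [0, 0, 0, 1]]

# First rotate, then translate: the translation matrix multiplies from the left.
combined = mat_mul(translation, rotation)

C0 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]
C_N = mat_mul(C0, combined)      # equals (R | t)

x_prime = [1, 1, 1, 1]           # point in the (X1', X2', X3') system
y = mat_vec(C_N, x_prime)
image = (y[0] / y[2], y[1] / y[2])
print(C_N)    # [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3]]
print(image)  # (0.0, 0.75)
```

Multiplying by (I | 0) simply drops the last row of the 4 × 4 matrix, which is why the result is R and t placed side by side.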

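Again, the null space of the normalized camera matrix C_N = (R | t) described above is spanned by the 4-dimensional vector

n = (ñ, 1)^T, with ñ = -R^-1 t

This is also, again, the coordinates of the camera center, now relative to the (X1', X2', X3') system.

This can be seen by applying first the rotation and then the translation to the 3-dimensional vector ñ: the result, R ñ + t = -t + t = 0, is the 3D point (0, 0, 0), the origin of the camera-centered coordinate system.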

This implies that the camera center (in its homogeneous representation) lies in the null space of the camera matrix, provided that it is represented in terms of 3D coordinates relative to the same coordinate system as the camera matrix refers to.
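The statement can be verified numerically with a short sketch. It assumes a normalized camera matrix of the form (R | t) and uses the fact that the inverse of a rotation matrix is its transpose, so the camera center is -R^T t. The values of R and t are hypothetical.

```python
def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


R = [[0, -1, 0],   # hypothetical rotation: 90 degrees about the third axis
     [1,  0, 0],
     [0,  0, 1]]
t = [1, 2, 3]      # hypothetical translation

# Camera center in the (X1', X2', X3') system: -R^T t.
center = [-c for c in mat_vec(transpose(R), t)]

# Normalized camera matrix (R | t) applied to the homogeneous camera center.
C_N = [R[i] + [t[i]] for i in range(3)]
image_of_center = mat_vec(C_N, center + [1])

print(center)           # [-2, 1, -3]
print(image_of_center)  # [0, 0, 0]
```

The zero result confirms that the homogeneous camera center lies in the null space of the camera matrix.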

Given the mapping produced by a normalized camera matrix, the resulting normalized image coordinates can be transformed by means of an arbitrary 2D homography.

This includes 2D translations and rotations as well as scaling (isotropic and anisotropic) but also general 2D perspective transformations.

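Such a transformation can be represented as a 3 × 3 matrix H which maps the homogeneous normalized image coordinates y to the homogeneous transformed image coordinates y':

y' = H y

Inserting the above expression for the normalized image coordinates in terms of the 3D coordinates gives

y' = H (R | t) x'

This produces the most general form of camera matrix:

C = H (R | t) = (H R | H t)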
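The general form can be sketched as a product of a 3 × 3 matrix H and a normalized camera matrix (R | t). In this sketch H is a hypothetical image-plane transformation (isotropic scaling by 2 followed by a shift of (5, 7)); R, t, and the test point are likewise illustrative values, not taken from the text.

```python
def mat_mul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


H = [[2, 0, 5],    # hypothetical 2D homography: scale by 2, shift by (5, 7)
     [0, 2, 7],
     [0, 0, 1]]
R = [[0, -1, 0],   # hypothetical rotation: 90 degrees about the third axis
     [1,  0, 0],
     [0,  0, 1]]
t = [1, 2, 3]      # hypothetical translation

C_N = [R[i] + [t[i]] for i in range(3)]  # normalized camera matrix (R | t)
C = mat_mul(H, C_N)                      # general camera matrix H (R | t)

x_prime = [1, 1, 1, 1]                   # homogeneous 3D point
y = mat_vec(C, x_prime)
pixel = (y[0] / y[2], y[1] / y[2])

print(C)      # [[0, -2, 5, 17], [2, 0, 7, 25], [0, 0, 1, 3]]
print(pixel)  # (5.0, 8.5)
```

The last column of C equals H t and the first three columns equal H R, matching the block form (H R | H t). The normalized image point (0, 0.75) is scaled by 2 and shifted by (5, 7) to give (5.0, 8.5).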