Perspective-n-Point[1] is the problem of estimating the pose of a calibrated camera given a set of n 3D points in the world and their corresponding 2D projections in the image.
This problem originates from camera calibration and has many applications in computer vision and other areas, including 3D pose estimation, robotics and augmented reality.
Given a set of n 3D points in a world reference frame and their corresponding 2D image projections as well as the calibrated intrinsic camera parameters, determine the 6 DOF pose of the camera in the form of its rotation and translation with respect to the world.
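Here R and T are the desired 3D rotation and 3D translation of the camera (the extrinsic parameters) that are being calculated.
Writing $p_w$ for a homogeneous world point, $p_c$ for its corresponding homogeneous image point, $K$ for the matrix of intrinsic camera parameters, and $s$ for a scale factor of the image point, this leads to the following equation for the model:

$s \, p_c = K \, [\, R \mid T \,] \, p_w$

There are a few preliminary aspects of the problem that are common to all solutions of PnP.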
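The projection model can be made concrete with a short numerical sketch. It assumes the standard pinhole relation s p_c = K [R | T] p_w; the intrinsic values, the pose, and the helper name `project_points` are illustrative assumptions, not values from any particular camera or library.

```python
import numpy as np

def project_points(K, R, T, points_w):
    """Project (N, 3) world points to (N, 2) pixels using s * p_c = K [R | T] p_w."""
    points_w = np.asarray(points_w, dtype=float)
    cam = points_w @ R.T + T         # rotate and translate into the camera frame
    hom = cam @ K.T                  # apply intrinsics; last column is the scale s
    return hom[:, :2] / hom[:, 2:3]  # divide by s to obtain (u, v)

# Illustrative values (assumptions): 800 px focal length, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])        # world origin 5 units in front of the camera

pixels = project_points(K, R, T, [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0]])
print(pixels)
```

With these values the world origin projects onto the principal point (320, 240) and the second point onto (480, 320). PnP is the inverse task: recovering R and T from such world and pixel pairs.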
Most solutions assume that the camera is already calibrated, so that its intrinsic parameters are known. Some methods,[4] as well as the Direct Linear Transform (DLT) applied to the projection model, are exceptions to this assumption: they estimate the intrinsic parameters in addition to the extrinsic parameters, which make up the camera pose that the original PnP problem is trying to find.
For each solution to PnP, the chosen point correspondences cannot be collinear.
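A degenerate, collinear selection of world points can be detected before a solver is called. The following is a minimal sketch, assuming a rank test on the centered points; the function name `are_collinear` and the tolerance are hypothetical choices.

```python
import numpy as np

def are_collinear(points, tol=1e-9):
    """Return True if all 3D points lie (numerically) on a single line."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Singular values measure the spread along orthogonal directions;
    # collinear points have at most one non-negligible singular value.
    s = np.linalg.svd(centered, compute_uv=False)
    if s[0] == 0.0 or len(s) < 2:    # all points coincide, or a single point
        return True
    return bool(s[1] <= tol * s[0])

on_a_line = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(are_collinear(on_a_line), are_collinear(triangle))
```

A relative tolerance is used so that the test does not depend on the units of the world coordinates.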
RANSAC is also commonly used with a PnP method to make the solution robust to outliers in the set of point correspondences.
The following section describes common methods for solving the PnP problem that are readily available in open source software, and how RANSAC can be used to deal with outliers in the data set.
When n = 3, the PnP problem is in its minimal form of P3P and can be solved with three point correspondences.
However, with just three point correspondences, P3P yields up to four real, geometrically feasible solutions.
For low noise levels a fourth correspondence can be used to remove ambiguity.
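The disambiguation step can be sketched as choosing, among the candidate poses, the one that best reprojects the fourth point. The candidate poses below are made-up stand-ins rather than the output of an actual P3P solver, and the helper names are hypothetical.

```python
import numpy as np

def reprojection_error(K, R, T, point_w, pixel):
    """Pixel distance between the observed pixel and the projection of point_w."""
    hom = K @ (R @ point_w + T)
    return float(np.linalg.norm(hom[:2] / hom[2] - pixel))

def pick_pose(K, candidates, point_w, pixel):
    """Return (index, error) of the candidate (R, T) that best explains the extra point."""
    errors = [reprojection_error(K, R, T, point_w, pixel) for R, T in candidates]
    best = int(np.argmin(errors))
    return best, errors[best]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.0, 0.0, 5.0])
R_true = np.eye(3)
R_wrong = np.array([[0.0, -1.0, 0.0],   # 90 degree rotation about the z axis
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

# Simulated, noise-free observation of the fourth point under the true pose.
fourth_point = np.array([1.0, 0.5, 0.2])
hom = K @ (R_true @ fourth_point + T)
fourth_pixel = hom[:2] / hom[2]

best, err = pick_pose(K, [(R_wrong, T), (R_true, T)], fourth_point, fourth_pixel)
print(best, err)
```

With noisy data the errors of two candidates can become comparable, which is why this check is reliable mainly at low noise levels.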
Let P be the camera's center of projection and let A, B, and C be the three 3D world points. This forms triangles PBC, PAC, and PAB, from which we obtain a sufficient equation system for P3P.
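One common way to write such a system applies the law of cosines to each of the three triangles. The notation here is an assumption made for illustration: P is the camera's center of projection, A, B, C are the world points, X = |PA|, Y = |PB|, Z = |PC| are the unknown distances, and α = ∠BPC, β = ∠APC, γ = ∠APB are the angles between the viewing rays, which are known from the image points and the calibration.

```latex
\begin{aligned}
Y^2 + Z^2 - 2\,Y Z \cos\alpha &= |BC|^2 \\
X^2 + Z^2 - 2\,X Z \cos\beta  &= |AC|^2 \\
X^2 + Y^2 - 2\,X Y \cos\gamma &= |AB|^2
\end{aligned}
```

The side lengths on the right are known from the world coordinates, so the system has three equations in the three unknown distances X, Y, and Z.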
[5] An algorithm for solving the problem, together with a classification of its solutions, is given in the 2003 IEEE Transactions on Pattern Analysis and Machine Intelligence paper by Gao et al.[6] An open source implementation of Gao's P3P solver can be found in the solvePnP function of OpenCV's calib3d module.
[7] Several faster and more accurate versions have been published since, including Lambda Twist P3P,[8] which achieved state-of-the-art performance in 2018 with a 50-fold increase in speed and a 400-fold decrease in numerical failures.
Efficient PnP (EPnP) is a method developed by Lepetit, et al. in their 2008 International Journal of Computer Vision paper[9] that solves the general problem of PnP for n ≥ 4.
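The method is based on the notion that each of the n reference points can be expressed as a weighted sum of four virtual control points, so the coordinates of these control points become the unknowns of the problem. The final pose of the camera is then solved for from these control points.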
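As an overview of the process, first note that each of the n reference points in the world frame, $p_i^w$, and their corresponding points in the camera frame, $p_i^c$, are weighted sums of the four control points, $c_j^w$ and $c_j^c$ respectively, and the weights are normalized per reference point as shown below.

$p_i^w = \sum_{j=1}^{4} \alpha_{ij} c_j^w, \qquad p_i^c = \sum_{j=1}^{4} \alpha_{ij} c_j^c, \qquad \sum_{j=1}^{4} \alpha_{ij} = 1$

Substituting these sums into the projection model gives two linear equations per reference point in the coordinates of the camera-frame control points. Stacking the equations for all n reference points forms a homogeneous system $Mx = 0$, where $x$ contains the coordinates of the four camera-frame control points.
The solution for the control points exists in the null space of M and is expressed as

$x = \sum_{i=1}^{N} \beta_i v_i$

where N is the number of null singular values of M and each $v_i$ is the corresponding right singular vector of M.
The R and T matrices that minimize the reprojection error of the world reference points, $p_i^w$, and their corresponding image points are then calculated.
This solution has O(n) complexity and works in the general case of PnP for both planar and non-planar control points.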
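The role of the normalized weights can be illustrated numerically. In this sketch the control points are chosen as the centroid plus unit offsets along the coordinate axes, which is a simplifying assumption rather than the choice made in the EPnP paper; the function name `barycentric_weights` is likewise hypothetical.

```python
import numpy as np

def barycentric_weights(points, control):
    """Return (n, 4) weights alpha with points = alpha @ control and rows summing to 1."""
    points = np.asarray(points, dtype=float)
    control = np.asarray(control, dtype=float)
    # Appending a row of ones folds the sum-to-one constraint into one 4x4 system.
    C = np.vstack([control.T, np.ones(4)])            # shape (4, 4)
    P = np.vstack([points.T, np.ones(len(points))])   # shape (4, n)
    return np.linalg.solve(C, P).T                    # shape (n, 4)

rng = np.random.default_rng(0)
points_w = rng.uniform(-1.0, 1.0, size=(6, 3))
centroid = points_w.mean(axis=0)
control_w = np.vstack([centroid, centroid + np.eye(3)])  # four non-coplanar control points

alpha = barycentric_weights(points_w, control_w)

# Apply an arbitrary rigid motion (world frame to camera frame) to both point sets.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.1, -0.2, 4.0])
points_c = points_w @ R.T + T
control_c = control_w @ R.T + T

print(np.allclose(alpha @ control_c, points_c))
```

Because each row of weights sums to one, the same weights reproduce the reference points in the camera frame. This is why recovering the four camera-frame control points is enough to recover all n points and, from them, the pose.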
Open source software implementations of this method can be found in OpenCV's Camera Calibration and 3D Reconstruction module in the solvePnP function[7] as well as from the code published by Lepetit, et al. at their website, CVLAB at EPFL.
[10] This method is not robust against outliers and generally compares poorly to RANSAC P3P followed by nonlinear refinement [citation needed].
[11] SQPnP is a non-minimal, non-polynomial solver that casts PnP as a non-linear quadratic program.
SQPnP identifies regions in the parameter space of 3D rotations (i.e., the 8-sphere) that contain unique minima with guarantees that at least one of them is the global one.
Each regional minimum is computed with sequential quadratic programming that is initiated at nearest orthogonal approximation matrices.
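A nearest orthogonal approximation of a 3×3 matrix can be computed from its singular value decomposition. The sketch below shows this standard construction; it is not necessarily the exact routine used in the SQPnP implementation, and the function name `nearest_rotation` is hypothetical.

```python
import numpy as np

def nearest_rotation(M):
    """Closest rotation matrix to the 3x3 matrix M in the Frobenius norm."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the last singular direction if needed so that det(R) = +1,
    # giving a proper rotation rather than a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

theta = np.deg2rad(20.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])

rng = np.random.default_rng(1)
M = R_true + 0.05 * rng.standard_normal((3, 3))   # perturbed, no longer orthogonal
R = nearest_rotation(M)

r = R.reshape(9)          # rotation written as a 9-vector
print(np.linalg.det(R), r @ r)
```

The flattened rotation always has squared norm 3, so every rotation lies on a sphere in nine-dimensional space, which is the 8-sphere referred to above.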
SQPnP has similar or even higher accuracy compared to state-of-the-art polynomial solvers, is globally optimal, and is computationally very efficient, being practically linear in the number of supplied points n. A C++ implementation is available on GitHub; it has also been ported to OpenCV and included in the Camera Calibration and 3D Reconstruction module (solvePnP function).
[12] PnP is prone to errors if there are outliers in the set of point correspondences.
An open source implementation of PnP methods with RANSAC can be found in OpenCV's Camera Calibration and 3D Reconstruction module in the solvePnPRansac function.