k-d tree

Here "k-dimensional" refers to a space with exactly k orthogonal axes; k may be any number of dimensions.

[2] Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces.

For example, if the x-axis is chosen for a particular split, the hyperplane is set by the x value of the node's point, and its normal is the unit x-axis.

The canonical method of k-d tree construction cycles through the axes used to select the splitting planes as one moves down the tree, and places at each node the median of the points in its subtree with respect to the current axis.[4] This method leads to a balanced k-d tree, in which each leaf node is approximately the same distance from the root.

In the case where median points are not selected, there is no guarantee that the tree will be balanced.

In some cases, it is acceptable to let points equal to the median lie on one side of the median, for example, by splitting the points into a "lesser than" subset and a "greater than or equal to" subset.
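
The canonical construction can be sketched in a few lines. The following is a minimal illustration, not a standard API (the `Node` record and `build_kdtree` name are ours): it cycles through the axes by depth, sorts the remaining points along the current axis, and recurses on the two halves around the median index.

```python
from collections import namedtuple

# A node holds a point, its splitting axis, and two subtrees.
Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Canonical construction: cycle through axes, split at the median."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                      # cycle through the k axes
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2             # the median point becomes the node
    return Node(
        point=points[median],
        axis=axis,
        left=build_kdtree(points[:median], depth + 1),
        right=build_kdtree(points[median + 1:], depth + 1),
    )
```

Because this sketch re-sorts at every level, it runs in O(n log² n); the presorting approaches described next avoid the repeated sorting.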

Alternative algorithms presort the data prior to building the tree. They then maintain the order of the presort during tree construction and hence eliminate the costly step of finding the median at each level of subdivision.

Two such algorithms build a balanced k-d tree to sort triangles in order to improve the execution time of ray tracing for three-dimensional computer graphics.

[7][8] An algorithm that builds a balanced k-d tree to sort points has a worst-case complexity of O(kn log n).

This algorithm first presorts the n points in each of the k dimensions. It then maintains the order of these k presorts during tree construction and thereby avoids finding the median at each level of subdivision.

To remove a point from an existing k-d tree without breaking the invariant, the easiest way is to form the set of all nodes and leaves from the children of the target node and recreate that part of the tree.
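
One way to realize this rebuild-on-delete strategy is sketched below; the helper names are ours, the `Node` record is a hypothetical representation (point, splitting axis, children), and the sketch assumes distinct coordinate values along each splitting axis.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def collect_points(node):
    """Gather every point stored in a subtree."""
    if node is None:
        return []
    return [node.point] + collect_points(node.left) + collect_points(node.right)

def build_kdtree(points, depth=0):
    """Canonical median-split construction, cycling axes by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2
    return Node(points[median], axis,
                build_kdtree(points[:median], depth + 1),
                build_kdtree(points[median + 1:], depth + 1))

def delete_point(node, target):
    """Remove target by rebuilding the subtree rooted at the node holding it."""
    if node is None:
        return None
    if node.point == target:
        # Rebuild from the remaining points; node.axis stands in for the
        # depth, since axis == depth mod k in the canonical build.
        remaining = collect_points(node.left) + collect_points(node.right)
        return build_kdtree(remaining, node.axis)
    # Assumes distinct coordinates along each splitting axis; otherwise
    # the target could also lie in the other subtree.
    if target[node.axis] < node.point[node.axis]:
        return node._replace(left=delete_point(node.left, target))
    return node._replace(right=delete_point(node.right, target))
```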

Balancing a k-d tree requires care because k-d trees are sorted in multiple dimensions, so the tree-rotation technique cannot be used to balance them as this may break the invariant.

Additionally, the nearest neighbour search can save computation by holding the squared current best distance in a variable for comparison, rather than taking a square root at every node.

It can provide the k nearest neighbours to a point by maintaining k current bests instead of just one.

For example, approximate nearest neighbour searching can be achieved by simply setting an upper bound on the number of points to examine in the tree or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations).

The nearest neighbour for points that are already in the tree can be found by not updating the refinement for nodes that give zero distance.

Approximate nearest neighbour is useful in real-time applications such as robotics due to the significant speed increase gained by not searching for the best point exhaustively.

Analyses of binary search trees have found that the worst case time for range search in a k-dimensional k-d tree containing n nodes is t_worst = O(k·n^(1−1/k)).

Finding the nearest point is an O(log n) operation on average, in the case of randomly distributed points, although analysis in general is tricky.

As a general rule, a k-d tree is suitable only when the number of points n is much greater than 2^k. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search,[15] and, if a good-enough fast answer is required, approximate nearest-neighbour methods should be used instead.

Additionally, even in low-dimensional space, if the average pairwise distance between the k nearest neighbours of the query point is significantly less than the average distance between the query point and each of those neighbours, the performance of nearest-neighbour search degrades towards linear, since the distances from the query point to each nearest neighbour are of similar magnitude and little pruning is possible.

(In the worst case, consider a cloud of points distributed on the surface of a sphere centered at the origin.

Every point is equidistant from the origin, so a search for the nearest neighbour from the origin would have to iterate through all points on the surface of the sphere to identify the nearest neighbour – which in this case is not even unique.)

To mitigate the potentially significant performance degradation of a k-d tree search in the worst case, a maximum distance parameter can be provided to the tree search algorithm, and the recursive search can be pruned whenever the closest point in a given branch of the tree cannot be closer than this maximum distance.
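
A search incorporating both ideas, the squared current best distance and an optional maximum-distance bound, might look like the following sketch (not a canonical implementation; the `Node` record is an assumed representation holding the point, splitting axis, and children):

```python
from collections import namedtuple

# A node of an already-built k-d tree: point, splitting axis, two subtrees.
Node = namedtuple("Node", ["point", "axis", "left", "right"])

def nearest(root, query, max_dist=float("inf")):
    """Return (best_point, squared_distance). Branches whose half-space
    cannot contain a point closer than the current best (or than
    max_dist) are pruned without being visited."""
    best = [None, max_dist * max_dist]  # squared current best distance

    def search(node):
        if node is None:
            return
        # Compare squared distances so no square root is ever taken.
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if d2 < best[1]:
            best[0], best[1] = node.point, d2
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # Cross the splitting hyperplane only if the far side could
        # still hold a closer point than the current best.
        if diff * diff < best[1]:
            search(far)

    search(root)
    return best[0], best[1]
```

Passing a finite `max_dist` simply seeds the squared current best, so the same hyperplane test prunes any branch that cannot beat the bound.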

In an orthogonal range search, the opposite coordinate is used when comparing against the median.

For example, if the current level is split along x_high, we check the x_low coordinate of the search rectangle.
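
An orthogonal range search over a k-d tree of points can be sketched as follows, with the query rectangle given by `low` and `high` corner points (the names and the `Node` record are illustrative): at each node, the rectangle's bounds on the splitting axis decide which half-spaces can contain matches.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def range_search(node, low, high, found=None):
    """Report every point p with low[i] <= p[i] <= high[i] on all axes i."""
    if found is None:
        found = []
    if node is None:
        return found
    axis = node.axis
    # The left half-space holds coordinates below the split, so it can
    # contain matches only if the rectangle's low bound reaches it.
    if low[axis] <= node.point[axis]:
        range_search(node.left, low, high, found)
    if all(lo <= c <= hi for lo, c, hi in zip(low, node.point, high)):
        found.append(node.point)
    # Symmetrically for the right half-space and the high bound.
    if high[axis] >= node.point[axis]:
        range_search(node.right, low, high, found)
    return found
```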

It is also possible to define a k-d tree with points stored solely in leaves.

The midpoint splitting rule[18] splits at the middle of the longest axis of the space being searched, regardless of the distribution of points.

This guarantees that the aspect ratio will be at most 2:1, but the depth is dependent on the distribution of points.

Maneewongvatana and Mount show that this offers "good enough" performance on common data sets.
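
The rule itself is simple to state in code. A minimal sketch (the function name is ours), splitting an axis-aligned box at the midpoint of its longest side without looking at the points:

```python
def midpoint_split(bounds):
    """Split a box given as [(lo, hi), ...] per axis at the midpoint
    of its longest side; returns the axis, the split value, and the
    two child boxes."""
    axis = max(range(len(bounds)), key=lambda i: bounds[i][1] - bounds[i][0])
    lo, hi = bounds[axis]
    mid = (lo + hi) / 2
    left, right = list(bounds), list(bounds)
    left[axis] = (lo, mid)    # child cells share the splitting plane,
    right[axis] = (mid, hi)   # so aspect ratios stay at most 2:1
    return axis, mid, left, right
```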

Example of a 3-dimensional k-d tree.
Example of a nearest neighbour search in a 2-d tree. Here, the tree is already built, each node corresponds to a rectangle, each rectangle is split in two equal subrectangles, and leaves correspond to rectangles containing a single point.