PH-tree

The PH-tree[1] is a tree data structure used for spatial indexing of multi-dimensional data (keys) such as geographical coordinates, points, feature vectors, rectangles or bounding boxes.

The PH-tree is space partitioning index[2] with a structure similar to that of a quadtree or octree.

The bit-based splitting policy, when combined with the use of different internal representations for nodes, provides scalability with high-dimensional data.

The bit-representation splitting policy also imposes a maximum depth, thus avoiding degenerated trees and the need for rebalancing.

[1] The basic PH-tree is a spatial index that maps keys, which are d-dimensional vectors with integers, to user defined values.

Like the Crit bit tree, and unlike most other spatial indexes, the PH-tree is a map rather than a multimap.

quadrants (see below for how potentially large nodes scales with high dimensional data).

For a key-subnode pair, the key represents the center of the subnode.

The PH-tree uses the bits of the multi-dimensional keys to determine their position in the tree.

All keys that have the same leading bits are stored in the same branch of the tree.

For a 3D node with 8 quadrants (forming a cube) the L's bit of the first dimension of the key determines whether the target quadrant is on the left or the right of the cube, the L's bit of the second dimension determines whether it is at the front or the back, and the L's bit of the third dimension determines bottom vs top, see picture.

The bits that are extracted from the keys form the hypercube address

[citation needed] The ordering of the entries in a node always follows Z-ordering.

[1] Another solution is to store entries in a sorted collection, such as dynamic arrays and/or B-trees.

The Lookup operation determines whether a key exists in the tree.

It walks down the tree and checks every node whether it contains a candidate subnode or a user value that matches the key.

The operation traverses the tree like the Lookup function and then inserts the key into the node.

that represent the "lower left" and "upper right" corners of the query box.

[1] In order to accurately estimate query time complexity the analysis needs to include the dimensionality

such that the search can avoid some quadrants that do not overlap with the query box.

` for every dimension where the "lower" half of the node and all quadrants in it does not overlap with the query box.

` for every dimension where the "upper" half does not overlap with the query box.

Depending on the distribution of the occupied quadrants in a node this approach will allow avoiding anywhere from no to almost all key comparisons.

[4] This works best if most of the quadrants that overlap with the query box are occupied with an entry.

k nearest neighbor searches can be implemented using standard algorithms.

Ordering for negative values can be achieved by inverting the non-sign bits.

[citation needed] In order to store volumes (axis-aligned hyper-boxes) as keys, implementations typically use corner representation[7] which converts the two

-dimensional minimum and maximum corners of a box into a single key with

This works trivially for lookup, insert and remove operations.

entries, a PH-tree may have only a single node, effectively “degenerating” into a B-Tree with Z-order curve.