R-tree

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.

The R-tree was proposed by Antonin Guttman in 1984[2] and has found significant use in both theoretical and applied contexts.

[3] A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc.

Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same depth), organizes the data in pages, and is designed for storage on disk (as used in databases).

The key idea is to use the bounding boxes to decide whether or not to search inside a subtree.

However, for in-memory applications, there are similar alternatives that can provide slightly better performance or be simpler to implement in practice.

[6] This approach is scalable for increasingly large applications and achieves high throughput and low latency performance for R-tree.

The key difficulty of R-tree is to build an efficient tree that on one hand is balanced (so the leaf nodes are at the same height) on the other hand the rectangles do not cover too much empty space and do not overlap too much (so that during search, fewer subtrees need to be processed).

Once that page is full, the data is split into two sets that should cover the minimal area each.

Data in R-trees is organized in pages that can have a variable number of entries (up to some pre-defined maximum, and usually above a minimum fill).

When a data object is fully contained in a single rectangle, the choice is clear.

When there are multiple options or rectangles in need of enlargement, the choice can have a significant impact on the performance of the tree.

In the classic R-tree, Guttman proposed two such heuristics, called QuadraticSplit and LinearSplit.

In quadratic split, the algorithm searches for the pair of rectangles that is the worst combination to have in the same node, and puts them as initial objects into the two new groups.

If during this process the root node has a single element, the tree height can decrease.