Hilbert R-tree

The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node.

In both cases Hilbert space-filling curves are used to achieve better ordering of multidimensional objects in the node.

Moreover, dynamic Hilbert R-trees employ flexible deferred splitting mechanism to increase the space utilization.

By adjusting the split policy, the Hilbert R-tree can achieve a degree of space utilization as high as desired.

Although the following example is for a static environment, it explains the intuitive principles for good R-tree design.

Roussopoulos and Leifker proposed a method for building a packed R-tree that achieves almost 100% space utilization.

The fact that the resulting father nodes cover little area explains why the lowx packed R-tree achieves excellent performance for point queries.

However, the fact that the fathers have large perimeters explains the degradation of performance for region queries.

[1] Intuitively, the packing algorithm should ideally assign nearby points to the same leaf node.

Ignorance of the y coordinate by the lowx packed R-tree tends to violate this empirical rule.

Thus, the space utilization is ≈100%; this structure is called a packed Hilbert R-tree.

The basic Hilbert curve on a 2x2 grid, denoted by H1 is shown in Figure 2.

Figure 2: Hilbert curves of order 1, 2, and 3 The Hilbert curve imposes a linear ordering on the data rectangles and then traverses the sorted list, assigning each set of C rectangles to a node in the R-tree.

Figure 2 illustrates the intuitive reasons why our Hilbert-based methods will result in good performance.

By grouping the points according to their Hilbert values, the MBRs of the resulting R-tree nodes tend to be small square-like rectangles.

Sort data rectangles on ascending Hilbert values Step 3.

/* Create nodes at higher level (l + 1) */ The assumption here is that the data are static or the frequency of modification is low.

The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node.

A leaf node contains at most Cl entries each of the form (R, obj _id) where Cl is the capacity of the leaf, R is the MBR of the real object (xlow, xhigh, ylow, yhigh) and obj-id is a pointer to the object description record.

The Hilbert values of the centers are the numbers near the "x" symbols (shown only for the parent node "II").

Next, the algorithms for searching, insertion, and overflow handling are described in detail.

Starting from the root, it descends the tree and examines all nodes that intersect the query rectangle.

At the leaf level, it reports all entries that intersect the query window w as qualified data items.

At each level the node with the minimum LHV value greater than h of all its siblings is chosen.

Grow tree taller: Algorithm ChooseLeaf(rect r, int h): /* Returns the leaf node in which to place a new rectangle r. */ C1.