B-tree

The B-tree generalizes the binary search tree, allowing for nodes with more than two children.

B-trees were invented by Rudolf Bayer and Edward M. McCreight while working at Boeing Research Labs to efficiently manage index pages for large random-access files.

The basic assumption was that indices would be so voluminous that only small chunks of the tree could fit in main memory.

Bayer and McCreight's paper Organization and maintenance of large ordered indices[1] was first circulated in July 1970 and later published in Acta Informatica.

We had to give the thing a name.... We were working for Boeing at the time, but we couldn't use the name without talking to the lawyers.

[8] Bayer and McCreight (1972),[3] Comer (1979),[2] and others define the order of B-tree as the minimum number of keys in a non-root node.

Folk and Zoellick [9] points out that terminology is ambiguous because the maximum number of keys is not clear.

[11] As with other trees, B-trees can be represented as a collection of three types of nodes: root, internal (a.k.a.

Similarly, a B-tree is kept balanced after deletion by merging or redistributing keys among siblings to maintain the

Sorting and searching algorithms can be characterized by the number of comparison operations that must be performed using order notation.

The search time is reduced because individual records are grouped together in a disk block.

To speed up the search further, the time to do the first 13 to 14 comparisons (which each required a disk access) must be reduced.

A B-tree index creates a multi-level tree structure that breaks a database down into fixed-size blocks or pages.

Because each node (or internal page) can have more than two children, a B-tree index will usually have a shorter height (the distance from the root to the farthest leaf) than a Binary Search Tree.

Like the main database, the last six or so comparisons in the auxiliary index would be on the same disk block.

Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.

Algorithmically described below: Rebalancing starts from a leaf and proceeds toward the root until the tree is balanced.

The algorithm to rebalance the tree is as follows: While freshly loaded databases tend to have good sequential behaviour, this behaviour becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.

[22] A common special case is adding a large amount of pre-sorted data into an initially empty B-tree.

Instead, a special "bulk loading" algorithm can be used to produce a more efficient tree with a higher branching factor.

In addition to its use in databases, the B-tree (or § Variants) is also used in filesystems to allow quick random access to an arbitrary block in a particular file.

, the operating system (or disk utility) must sequentially follow the file's linked list in the FAT.

TOPS-20 (and possibly TENEX) used a 0 to 2 level tree that has similarities to a B-tree[citation needed].

Compared to a skip list, both structures have the same performance, but the B-tree scales better for growing n. A T-tree, for main memory database systems, is similar but more compact.

Lehman and Yao[25] showed that all the read locks could be avoided (and thus concurrent access greatly improved) by linking the tree blocks at each level together with a "next" pointer.

This results in a tree structure where both insertion and search operations descend from the root to the leaf.

This maximizes access concurrency by multiple users, an important consideration for databases and/or other B-tree-based ISAM storage methods.

The cost associated with this improvement is that empty pages cannot be removed from the btree during normal operations.

A Maple tree is a B-tree developed for use in the Linux kernel to reduce lock contention in virtual memory management.

In contrast, an (a,b)-tree allows the minimum number of children for an internal node to be set arbitrarily low.

A B-tree ( Bayer & McCreight 1972 ) of order 5 ( Knuth 1998 ).

A B-tree insertion example with each iteration. The nodes of this B-tree have at most 3 children (Knuth order 3).