Fractal tree index

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree.

Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations.

Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data.

In order to maintain the pre-defined range, internal nodes may be joined or split.

Fractal tree indexes and B-trees both exploit the fact that when a node is fetched from storage, a block of memory, whose size is denoted by

The remaining space in each node is used to buffer insertions, deletion and updates, which we refer to in aggregate as messages.

, thereby providing a smooth tradeoff between search cost, which depends on the depth of the search tree, and therefore the branching factor, versus the insertion time, which depends on the depth of the tree but more sensitively on the size of the buffer flushes.

The theoretical literature on this topic is very large, so this discussion is limited to a comparison with popular data structures that are in use in databases and file systems.

The search time of a B-tree is asymptotically the same as that of a fractal tree index.

However, for many workloads most or all internal nodes of both B-trees and fractal tree indexes are already cached in RAM.

Thus, for many workloads, fractal tree indexes can match B-trees in terms of search time.

can be quite large, this yields a potential two-order-of-magnitude improvement in worst-case insertion times, which is observed in practice.

Both B-trees and fractal tree indexes can perform insertions faster in the best case.

For example, if keys are inserted in sequential order, both data structures achieve a

Thus, because the best and worst cases of B-trees differ so widely, whereas fractal tree indexes are always near their best case, the actual speedup that fractal tree indexes achieve over B-trees depends on the details of the workload.

The IO-complexity of an LSM depends on parameters such as the growth factor between levels and the data structure chosen at each level, so in order to analyze the complexity of LSMs, we need to pick a specific version.

For comparison purposes, we select the version of LSMs that match fractal tree indexes on insertion performance.

In fact, although B-trees and fractal tree indexes are both on the optimal tradeoff curve between insertions and queries, LSMs are not.

Also, the growth factor between levels can be set to some other value, giving a range of insertion/query tradeoffs.

However, for every choice of insertion rate, the corresponding fractal tree index has faster queries.

Like a Bε tree, it consists of nodes with keys and buffers and realizes the optimal insertion/query tradeoff.

The fractal tree index differs in including performance optimization and in extending the functionality.

Such a scheme works well in a B-tree because both insertions and queries involve fetching the same leaf into memory.

However, in fractal tree indexes, insertions are messages, and a row may reside in more than one node at the same time.

Fractal tree indexes therefore require a separate locking structure that is IO-efficient or resides in memory in order to implement the locking involved in implementing ACID semantics.

The solution implemented in fractal tree indexes is to have large leaves that can be fetched as a whole for fast range queries but are broken into smaller pieces call basement nodes which can be fetched individually.

However, broadcast messages, which are copied to all outgoing buffers, can modify all rows in a database.

Although the total work required to change all rows is unchanged over the brute-force method of traversing the table, the latency is improved, since, once the message is injected into the root buffer, all subsequent queries will be able to apply the schema modification to any rows they encounter.

The schema change is immediate and the work is deferred to such a time when buffers overflow and leaves would have gotten updated anyway.

It is available as TokuDB as a storage engine for MySQL and MariaDB, and as TokuMX, a more complete integration with MongoDB.