Scapegoat tree

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson[2] in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993.

Unlike most other self-balancing binary search trees, which also provide worst-case $O(\log n)$ lookup time, scapegoat trees have no additional per-node memory overhead compared to a regular binary search tree: besides key and value, a node stores only two pointers to the child nodes.

This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.

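A node is α-weight-balanced if it meets a relaxed weight balance criterion:

$$\operatorname{size}(\text{left}) \le \alpha \cdot \operatorname{size}(\text{node}), \qquad \operatorname{size}(\text{right}) \le \alpha \cdot \operatorname{size}(\text{node})$$

where size can be defined recursively as:

$$\operatorname{size}(\text{node}) = \begin{cases} 0 & \text{if node is nil} \\ \operatorname{size}(\text{left}) + \operatorname{size}(\text{right}) + 1 & \text{otherwise} \end{cases}$$

Even a degenerate tree (linked list) satisfies this condition if α=1, whereas an α=0.5 would only match almost complete binary trees.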
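As a minimal sketch of this criterion (both children may hold at most a fraction α of the node's size), the check can be written as follows. The `Node` class and the function names are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type: a key and two child pointers, nothing else.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(node: Optional[Node]) -> int:
    """Number of nodes in the subtree rooted at `node` (0 for an empty subtree)."""
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def is_alpha_weight_balanced(node: Node, alpha: float) -> bool:
    """True if neither child holds more than alpha * size(node) nodes."""
    n = size(node)
    return size(node.left) <= alpha * n and size(node.right) <= alpha * n


# A degenerate right-leaning chain 1 -> 2 -> 3 and a complete three-node tree.
chain = Node(1, right=Node(2, right=Node(3)))
complete = Node(2, Node(1), Node(3))

print(is_alpha_weight_balanced(chain, 1.0))     # the chain passes with alpha = 1
print(is_alpha_weight_balanced(chain, 0.5))     # but fails with alpha = 0.5
print(is_alpha_weight_balanced(complete, 0.5))  # the complete tree passes
```

Computing `size` from scratch like this costs time linear in the subtree, which is why a scapegoat tree only does it when it actually needs to locate a scapegoat.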

Scapegoat trees are not guaranteed to keep α-weight-balance at all times, but are always loosely α-height-balanced in that

$$\operatorname{height}(\text{tree}) \le \lfloor \log_{1/\alpha}(\operatorname{size}(\text{tree})) \rfloor + 1$$

Violations of this height balance condition can be detected at insertion time, and imply that a violation of the weight balance condition must exist.
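To make the loose height bound concrete, the sketch below evaluates $\lfloor \log_{1/\alpha} n \rfloor + 1$ for a few values. It assumes $0.5 \le \alpha < 1$ and $n \ge 1$; the function names are illustrative. Floating-point logarithms can be off by one unit in the last place when $n$ is an exact power of $1/\alpha$, so a production implementation may prefer an integer-based computation.

```python
import math


def loose_height_limit(n: int, alpha: float) -> int:
    """floor(log_{1/alpha}(n)) + 1, assuming n >= 1 and 0.5 <= alpha < 1."""
    return math.floor(math.log(n) / math.log(1 / alpha)) + 1


def is_loosely_height_balanced(height: int, n: int, alpha: float) -> bool:
    """True if a tree of `n` nodes with the given height respects the loose bound."""
    return height <= loose_height_limit(n, alpha)


# A smaller alpha gives a tighter limit for the same number of nodes.
for alpha in (0.5, 0.75):
    print(alpha, loose_height_limit(20, alpha))
```

For 20 nodes this gives a limit of 5 with α = 0.5 and 11 with α = 0.75, which illustrates how a larger α tolerates a much taller tree.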

Scapegoat trees and red–black trees both restrict the height of the tree, but they differ greatly in how they determine where the rotations (or, in the case of scapegoat trees, rebalances) take place.

Whereas red–black trees store additional 'color' information in each node to determine the location, scapegoat trees find a scapegoat, a node that isn't α-weight-balanced, and perform the rebalance operation on it.

This is loosely similar to AVL trees, in that the actual rotations depend on 'balances' of nodes, but the means of determining the balance differs greatly.

Since AVL trees check the balance value on every insertion/deletion, it is typically stored in each node; scapegoat trees are able to calculate it only as needed, which is only when a scapegoat needs to be found.

A high α value results in fewer rebalances, making insertion quicker but lookups and deletions slower, and vice versa for a low α.

Therefore in practical applications, an α can be chosen depending on how frequently these actions should be performed.

Lookup is not modified from a standard binary search tree, and has a worst-case time of $O(\log n)$.

The reduced node memory overhead compared to other self-balancing binary search trees can further improve locality of reference and caching.

Insertion is implemented with the same basic ideas as an unbalanced binary search tree, but with a few significant changes.

When finding the insertion point, the depth of the new node must also be recorded. This is implemented via a simple counter that gets incremented during each iteration of the lookup, effectively counting the number of edges between the root and the inserted node.

If this node violates the α-height-balance property (defined above), a rebalance is required.
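The depth counter and the insertion-time check can be sketched as below. The exact threshold is an assumption here: the sketch flags a new node whose depth exceeds $\lfloor \log_{1/\alpha} n \rfloor$, where $n$ is the node count after the insertion. `Node`, `insert_with_depth` and `too_deep` are hypothetical names, and duplicate keys are sent to the right subtree purely for simplicity.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert_with_depth(root: Optional[Node], key: int) -> Tuple[Node, int]:
    """Ordinary BST insert that also returns the new node's depth in edges."""
    new = Node(key)
    if root is None:
        return new, 0
    depth = 0
    cur = root
    while True:
        depth += 1  # one more edge between the root and the new node
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root, depth
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root, depth
            cur = cur.right


def too_deep(depth: int, n: int, alpha: float) -> bool:
    """Assumed trigger for a rebalance: depth > floor(log_{1/alpha}(n))."""
    return depth > math.floor(math.log(n) / math.log(1 / alpha))


# Inserting 1, 2, 3 in order builds a chain; the third insert is too deep.
root, count = None, 0
for key in (1, 2, 3):
    root, depth = insert_with_depth(root, key)
    count += 1
    print(key, depth, too_deep(depth, count, 0.5))
```

With α = 0.5 the first two insertions pass the check, while the third lands at depth 2 in a three-node tree and would trigger the search for a scapegoat.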

To rebalance, an entire subtree rooted at a scapegoat undergoes a balancing operation.

The scapegoat is defined as being an ancestor of the inserted node which isn't α-weight-balanced.

Finding a scapegoat by climbing from the new node back up towards the root requires $O(\log n)$ storage space, usually allocated on the stack, or parent pointers.

This can actually be avoided by pointing each child at its parent as you go down, and repairing on the walk back up.

To determine whether a potential node is a viable scapegoat, we need to check its α-weight-balanced property.

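To do this we can go back to the definition:

$$\operatorname{size}(\text{left}) \le \alpha \cdot \operatorname{size}(\text{node}), \qquad \operatorname{size}(\text{right}) \le \alpha \cdot \operatorname{size}(\text{node})$$

However, a large optimisation can be made by realising that we already know two of the three sizes, leaving only the third to be calculated.

Assuming that we're climbing back up to the root:

$$\operatorname{size}(\text{parent}) = \operatorname{size}(\text{node}) + \operatorname{size}(\text{sibling}) + 1$$

But as:

$$\operatorname{size}(\text{inserted node}) = 1$$

the case is trivialized down to:

$$\operatorname{size}[x+1] = \operatorname{size}[x] + \operatorname{size}(\text{sibling}) + 1$$

where x = this node, x + 1 = parent and size(sibling) is the only function call actually required.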
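The climb can be sketched as follows, assuming the insertion recorded the path from the root down to the new node in a list (one of the storage options mentioned earlier). Only the sibling's size is computed at each step; the size of the subtree we came from is carried upwards. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def find_scapegoat(path: List[Node], alpha: float) -> Optional[Node]:
    """Return the first ancestor on `path` (root first, new node last)
    that is not alpha-weight-balanced, or None if every ancestor is."""
    child_size = 1  # the inserted node is a leaf
    for i in range(len(path) - 2, -1, -1):
        parent, child = path[i], path[i + 1]
        sibling = parent.right if parent.left is child else parent.left
        sibling_size = size(sibling)  # the only size actually computed
        parent_size = child_size + sibling_size + 1
        if max(child_size, sibling_size) > alpha * parent_size:
            return parent
        child_size = parent_size  # carry the known size one level up
    return None


# Chain 1 -> 2 -> 3, with 3 just inserted: node 1 is the scapegoat.
n3 = Node(3)
n2 = Node(2, right=n3)
n1 = Node(1, right=n2)
print(find_scapegoat([n1, n2, n3], 0.6).key)
```

In this example node 2 is still balanced (one child out of two nodes), but node 1 has two of its three nodes on one side, which exceeds 0.6 × 3, so it is returned as the scapegoat.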

As rebalance operations take $O(n)$ time (dependent on the number of nodes of the subtree), insertion has a worst-case performance of $O(n)$ time.
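One simple way to carry out such a rebalance in time linear in the subtree size is to collect the nodes in sorted order and rebuild around the median. This is a sketch under that assumption, not necessarily the exact procedure intended here; the names are hypothetical. It uses a temporary list of nodes rather than an in-place scheme.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def flatten(node: Optional[Node], out: List[Node]) -> None:
    """Append the nodes of the subtree to `out` in ascending key order."""
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)


def build_balanced(nodes: List[Node], lo: int, hi: int) -> Optional[Node]:
    """Relink nodes[lo:hi] into a balanced subtree and return its root."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]  # the median becomes the subtree root
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


def rebuild(subtree: Optional[Node]) -> Optional[Node]:
    """Rebuild the subtree rooted at a scapegoat; returns the new subtree root."""
    nodes: List[Node] = []
    flatten(subtree, nodes)
    return build_balanced(nodes, 0, len(nodes))


# Build a right-leaning chain 1 -> 2 -> ... -> 7, then rebuild it.
chain = None
for key in range(7, 0, -1):
    chain = Node(key, right=chain)
root = rebuild(chain)
print(root.key, root.left.key, root.right.key)
```

The caller is responsible for attaching the returned root in place of the scapegoat, either at the scapegoat's parent or as the new root of the whole tree.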

Using aggregate analysis it becomes clear that the amortized cost of an insertion is $O(\log n)$.

Deletion relies on one additional value stored with the tree. This value, which we will call MaxNodeCount, simply represents the highest achieved NodeCount.

To perform a deletion, we simply remove the node as we would in a simple binary search tree, but if $\text{NodeCount} \le \alpha \cdot \text{MaxNodeCount}$ then we rebalance the entire tree about the root, remembering to set MaxNodeCount to NodeCount.
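The deletion rule can be sketched as below. Two things are assumptions of the sketch rather than statements from the text: the rebuild trigger is taken to be NodeCount ≤ α · MaxNodeCount, and the full rebuild is done by re-creating the tree from its sorted keys. The class and helper names are hypothetical, keys are assumed distinct, and the tree is seeded from a list instead of through insertions to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(keys: List[int]) -> Optional[Node]:
    """Balanced tree from sorted keys, median first."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def keys_in_order(node: Optional[Node]) -> List[int]:
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)


def bst_delete(node: Optional[Node], key: int) -> Tuple[Optional[Node], bool]:
    """Ordinary BST deletion; returns (new subtree root, whether a node was removed)."""
    if node is None:
        return None, False
    if key < node.key:
        node.left, removed = bst_delete(node.left, key)
        return node, removed
    if key > node.key:
        node.right, removed = bst_delete(node.right, key)
        return node, removed
    if node.left is None:
        return node.right, True
    if node.right is None:
        return node.left, True
    succ = node.right  # two children: replace with the in-order successor
    while succ.left is not None:
        succ = succ.left
    node.key = succ.key
    node.right, _ = bst_delete(node.right, succ.key)
    return node, True


class ScapegoatTree:
    def __init__(self, keys: List[int], alpha: float = 0.75):
        self.alpha = alpha
        self.root = build(sorted(keys))
        self.node_count = self.max_node_count = len(keys)
        self.rebuilds = 0  # bookkeeping for the demonstration only

    def delete(self, key: int) -> bool:
        self.root, removed = bst_delete(self.root, key)
        if not removed:
            return False
        self.node_count -= 1
        if self.node_count <= self.alpha * self.max_node_count:
            self.root = build(keys_in_order(self.root))  # rebalance about the root
            self.max_node_count = self.node_count
            self.rebuilds += 1
        return True


tree = ScapegoatTree(list(range(10)))
for key in (0, 1, 2):
    tree.delete(key)
print(tree.node_count, tree.max_node_count, tree.rebuilds)
```

With ten keys and α = 0.75 the threshold is 7.5, so the first two deletions leave the tree alone and the third, which brings NodeCount to 7, triggers the full rebuild and resets MaxNodeCount.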

Using aggregate analysis it becomes clear that the amortized cost of a deletion is $O(\log n)$.

In the Bible, a scapegoat is an animal that is ritually burdened with the sins of others, and then driven away.