The Garsia–Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes, in linearithmic time.
leaf nodes, which can be identified (in the order given by the binary tree) with the
internal nodes, that minimizes the weighted sum of the external path lengths.
These path lengths are the numbers of steps from the root to each leaf.
They are multiplied by the weight of the leaf and then summed to give the quality of the overall tree.
The weighted sum of external path lengths controls the expected time for searching the tree.
[1] Alternatively, the output of the problem can be used as a Huffman code, a method for encoding
In this interpretation, the code for a value is given by the sequence of left and right steps from a parent to the child on the path from the root to a leaf in the tree (e.g. with 0 for left and 1 for right).
Unlike standard Huffman codes, the ones constructed in this way are alphabetical, meaning that the sorted order of these binary codes is the same as the input ordering of the values.
If the weight of a value is its frequency in a message to be encoded, then the output of the Garsia–Wachs algorithm is the alphabetical Huffman code that compresses the message to the shortest possible length.
(or any sufficiently large finite value) at the start and end of the sequence.
The initial sequence is just the order in which the leaf weights were given as input.
It then repeatedly performs the following steps, each of which reduces the length of the input sequence, until there is only one tree containing all the leaves:[1] To implement this phase efficiently, the algorithm can maintain its current sequence of values in any self-balancing binary search tree structure.
may be found in logarithmic time by using the balanced tree to perform two binary searches, one for each of these two decreasing sequences.
can be performed in linear total time by using a sequential search that begins at the
But assuming this to be true, the second and third phases of the algorithm are straightforward to implement in linear time.
The Garsia–Wachs algorithm is named after Adriano Garsia and Michelle L. Wachs, who published it in 1977.
[1][3] Their algorithm simplified an earlier method of T. C. Hu and Alan Tucker,[1][4] and (although it is different in internal details) it ends up making the same comparisons in the same order as the Hu–Tucker algorithm.
[5] The original proof of correctness of the Garsia–Wachs algorithm was complicated, and was later simplified by Kingston (1988)[1][2] and Karpinski, Larmore & Rytter (1997).