Expected linear time MST algorithm

The expected linear time MST algorithm is a randomized algorithm for computing the minimum spanning forest of a weighted graph with no isolated vertices.

It was developed by David Karger, Philip Klein, and Robert Tarjan.

The algorithm recursively finds the minimum spanning forest of the first subproblem and uses the solution in conjunction with a linear time verification algorithm to discard edges in the graph that cannot be in the minimum spanning tree.

A procedure taken from Borůvka's algorithm is also used to reduce the size of the graph at each recursion.

Each iteration of the algorithm relies on an adaptation of Borůvka's algorithm referred to as a Borůvka step: A Borůvka step is equivalent to the inner loop of Borůvka's algorithm, which runs in O(m) time where m is the number of edges in G. Furthermore, since each edge can be selected at most twice (once by each incident vertex) the maximum number of disconnected components after step 1 is equal to half the number of vertices.

Thus, a Borůvka step reduces the number of vertices in the graph by at least a factor of two and deletes at least n/2 edges where n is the number of vertices in G. Example execution of a Borůvka step In each iteration the algorithm removes edges with particular properties that exclude them from the minimum spanning tree.

If F is a subgraph of G then any F-heavy edge in G cannot be in the minimum spanning tree of G by the cycle property.

Given a forest, F-heavy edges can be computed in linear time using a minimum spanning tree verification algorithm.

[2][3] Input: A graph G with no isolated vertices Output: The minimum spanning forest of G' and the contracted edges from the Borůvka steps Correctness is proved by induction on the number of vertices in the graph.

Every F-heavy edge deleted is not in the minimum spanning tree by the cycle property.

Finally F' is the minimum spanning tree of the contracted graph by the inductive hypothesis.

The effectiveness of the random sampling step is described by the following lemma which places a bound on the number of F-light edges in G thereby restricting the size of the second subproblem.

This also means e is F-light regardless of whether or not it is added to H since only heavier edges are subsequently considered.

[2][3] Since the work done in one iteration of the algorithm is linear in the number of edges the work done in one complete run of the algorithm (including all recursive calls) is bounded by a constant factor times the total number of edges in the original problem and all recursive subproblems.

The left paths of a binary tree are shown circled in blue in the diagram on the right.

Thus if the expected number of edges in a problem at the top of a left path is k then the sum of the expected number of edges in each subproblem in the left path is at most

The expected number of edges in each right subproblem is equal to the number of F-light edges in the parent problem where F is the minimum spanning tree of the left subproblem.

Since n at most 2m for a graph with no isolated vertices the algorithm runs in expected time O(m).

Left paths of a binary tree are circled in blue