External memory graph traversal

External memory graph traversal is a type of graph traversal optimized for graphs that are stored in external memory.

Graph traversal algorithms, like breadth-first search and depth-first search, are analyzed using the von Neumann model, which assumes uniform memory access cost.

This view neglects the fact that, for huge instances, part of the graph resides on disk rather than in internal memory.

Since accessing the disk is orders of magnitude slower than accessing internal memory, efficient traversal of graphs held in external memory is needed.

For external memory algorithms the external memory model by Aggarwal and Vitter[1] is used for analysis.

A machine is specified by three parameters: M, B and D. M is the size of the internal memory, B is the block size of a disk and D is the number of parallel disks.

The measure of performance for an external memory algorithm is the number of I/Os it performs.
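To make the role of the three parameters concrete, the following Python sketch estimates I/O counts for scanning and for a multiway merge sort. The function names scan_ios and sort_ios, the factor of two for reading and writing each pass, and the assumption of perfect striping across the D disks are illustrative choices, not part of the model's definition.

```python
import math

def scan_ios(n, B, D=1):
    """I/Os to read n contiguous items when D disks each move one block of B items."""
    return math.ceil(n / (D * B))

def sort_ios(n, M, B, D=1):
    """Rough I/O count of a multiway merge sort on n items (constants are illustrative)."""
    runs = math.ceil(n / M)      # sorted runs left after the run-formation pass
    fan_in = max(2, M // B)      # number of runs that can be merged at once
    passes = 1                   # the run-formation pass itself
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    # Assumption: every pass reads and writes the whole input once.
    return 2 * passes * scan_ios(n, B, D)
```

For example, with M = 10,000 items and B = 100 items, sorting one million items takes one run-formation pass and one merge pass under these assumptions, because the 100 initial runs fit into a single 100-way merge.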

The breadth-first search algorithm starts at a root node and first traverses every node at depth one, then continues level by level with the nodes at greater depth.

Eventually, every node of the graph has been visited.

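For an undirected graph, Munagala and Ranade[2] proposed the following external memory algorithm: Let L(t) denote the nodes in breadth-first search level t and let A(t) := N(L(t-1)) be the multi-set of neighbors of level t-1.

For every t, L(t) can be constructed from A(t) by transforming it into a set and excluding previously visited nodes from it.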
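The level construction can be sketched in Python as an in-memory simulation. The function name mr_bfs_levels and the dictionary-based adjacency lists are assumptions for illustration; a real implementation would keep the levels and the neighbor multi-set on disk and use external sorting and parallel scans. The sketch relies on the property that in an undirected graph every neighbor of a node in level t-1 lies in level t-2, t-1 or t, so only the two preceding levels need to be excluded.

```python
def mr_bfs_levels(adj, root):
    """adj maps each node to its neighbor list (undirected graph); returns the BFS levels."""
    levels = [[root]]
    prev2, prev1 = [], [root]          # L(t-2) and L(t-1)
    while prev1:
        # Step 1: collect the multi-set of neighbors of the previous level.
        neighbors = [w for v in prev1 for w in adj[v]]
        # Step 2: sort, then drop duplicates in a single scan.
        neighbors.sort()
        dedup = [x for k, x in enumerate(neighbors) if k == 0 or x != neighbors[k - 1]]
        # Step 3: exclude nodes of the two preceding levels.
        # (An in-memory set stands in for the parallel scan over sorted files.)
        seen = set(prev1) | set(prev2)
        cur = [x for x in dedup if x not in seen]
        if cur:
            levels.append(cur)
        prev2, prev1 = prev1, cur
    return levels
```

Called on the graph {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]} with root 0, the sketch produces the levels [0], [1, 2], [3], [4].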

A visualization of the three described steps necessary to compute L(t) is depicted in the figure on the right.

Mehlhorn and Meyer[3] proposed an algorithm that is based on the algorithm of Munagala and Ranade (MR) and improves their result.

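During the preprocessing phase the graph is partitioned into disjoint subgraphs S_0, ..., S_{K-1} with small diameter.

It further partitions the adjacency lists accordingly, by constructing an external file F = F_0 F_1 ... F_{K-1}, where F_i contains the adjacency lists of all nodes in S_i.

The breadth-first search phase is similar to the MR algorithm.

In addition, the algorithm maintains a sorted external file H, which is initialized with F_0.

Further, the nodes of any created breadth-first search level carry identifiers for the files F_i of their respective subgraphs S_i.

Instead of using random accesses to fetch adjacency lists, L(t) is constructed from the file H.

Edges might be scanned more often in H, but unstructured I/Os in order to fetch adjacency lists are reduced.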
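The idea of fetching whole adjacency-list files into a shared pool, instead of fetching one list per node, can be illustrated with a simplified in-memory sketch. Everything here is an assumption for illustration: the function name mm_bfs_levels, the precomputed mapping part from node to subgraph index, and the dictionary hot that stands in for the sorted external file H. The sketch loads a file lazily when a level first touches its subgraph and does not reproduce the actual preprocessing or the I/O bounds.

```python
def mm_bfs_levels(adj, part, root):
    """adj: node -> neighbor list (undirected); part: node -> subgraph index (assumed given)."""
    # Build one "file" per subgraph holding the adjacency lists of its nodes.
    files = {}
    for v, i in part.items():
        files.setdefault(i, {})[v] = adj[v]
    hot = {}          # stand-in for the pool H of already fetched adjacency lists
    loaded = set()    # indices of files that were merged into the pool
    levels = [[root]]
    prev2, prev1 = [], [root]
    while prev1:
        # Merge the whole file of a subgraph the first time a level touches it.
        for v in prev1:
            i = part[v]
            if i not in loaded:
                hot.update(files[i])
                loaded.add(i)
        # Take the adjacency lists of the current level out of the pool.
        neighbors = sorted(w for v in prev1 for w in hot.pop(v))
        dedup = [x for k, x in enumerate(neighbors) if k == 0 or x != neighbors[k - 1]]
        seen = set(prev1) | set(prev2)
        cur = [x for x in dedup if x not in seen]
        if cur:
            levels.append(cur)
        prev2, prev1 = prev1, cur
    return levels
```

Each file is merged at most once, which mirrors the trade-off described above: lists may sit in the pool over several levels and be scanned repeatedly, but no per-node random access is needed.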

The depth-first search algorithm explores a graph along each branch as deep as possible before backtracking.

For directed graphs Buchsbaum, Goldwasser, Venkatasubramanian and Westbrook[4] proposed an algorithm with O((V + E/B) log_2(V/B) + sort(E)) I/Os.

This algorithm is based on a data structure called buffered repository tree (BRT).

It stores a multi-set of items from an ordered universe.

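A BRT offers two operations: insert(T, x), which adds item x to T, and extract(T, k), which reports and deletes from T all items with key k.

The algorithm simulates an internal depth-first search algorithm, maintaining a stack S of nodes: in each iteration, an unvisited neighbor of the node on top of S is pushed onto S, and if no unvisited neighbor exists, the top node is popped.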
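The interface of a buffered repository tree, with one operation that inserts an item and one that extracts all items sharing a key, can be illustrated with a toy in-memory stand-in. The class name ToyBRT and the explicit key argument are assumptions for illustration; the sketch reproduces only the behavior of the two operations, not the buffered tree layout or its I/O costs.

```python
from collections import defaultdict

class ToyBRT:
    """In-memory stand-in for a buffered repository tree (interface only)."""

    def __init__(self):
        self._items = defaultdict(list)   # key -> multi-set of items, kept as a list

    def insert(self, key, item):
        """Add an item under the given key; duplicates are kept."""
        self._items[key].append(item)

    def extract(self, key):
        """Report and delete all items stored under the given key."""
        return self._items.pop(key, [])
```

Because extract removes what it reports, a second extract with the same key returns an empty list until new items with that key are inserted.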

When a node v is first discovered, its incoming edges (x,v) are inserted into a BRT D.

Further, outgoing edges (v,x) are put into a priority queue P(v), keyed by their rank in the adjacency list.

For the vertex u on top of S, all edges (u,x) are extracted from D. Such edges only exist if x has been discovered since the last time u was on top of S (or since the start of the algorithm if u is on top of S for the first time).

For every such edge (u,x) a delete(x) operation is performed on P(u).

A subsequent delete-min operation on P(u) then yields the next unvisited neighbor of u.

If P(u) is empty, u is popped from S. Pseudocode for this algorithm is given below.
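Independently of the article's pseudocode, the interplay of the stack S, the repository D and the priority queues P(v) can be traced with an in-memory Python sketch. The function name brt_dfs is hypothetical, a dictionary of lists stands in for the BRT, and delete(x) on P(u) is emulated by lazy deletion with a per-node set, which is an implementation shortcut rather than part of the described algorithm.

```python
import heapq
from collections import defaultdict

def brt_dfs(adj, root):
    """adj: node -> list of out-neighbors (directed graph); returns the DFS preorder."""
    incoming = defaultdict(list)              # incoming adjacency lists
    for v, outs in adj.items():
        for x in outs:
            incoming[x].append(v)
    D = defaultdict(list)                     # stand-in for the BRT: key u -> targets x of edges (u,x)
    P = {}                                    # P[v]: heap of (rank, x) for outgoing edges (v,x)
    deleted = defaultdict(set)                # lazy deletions standing in for delete(x) on P[u]
    order, stack = [], []

    def discover(v):
        order.append(v)
        stack.append(v)
        for x in incoming[v]:                 # insert incoming edges (x,v) into D, keyed by x
            D[x].append(v)
        P[v] = [(rank, x) for rank, x in enumerate(adj.get(v, []))]
        heapq.heapify(P[v])

    discover(root)
    while stack:
        u = stack[-1]
        # Extract all edges (u,x) whose target was discovered since u was last on top.
        for x in D.pop(u, []):
            deleted[u].add(x)
        # Delete-min, skipping entries that were (lazily) deleted.
        nxt = None
        while P[u]:
            _, x = heapq.heappop(P[u])
            if x not in deleted[u]:
                nxt = x
                break
        if nxt is None:
            stack.pop()                       # no unvisited neighbor left
        else:
            discover(nxt)
    return order
```

On the graph {0: [1, 2], 1: [2], 2: [0, 3], 3: []} with root 0, the sketch visits the nodes in the order 0, 1, 2, 3, the same preorder an internal depth-first search produces when it follows adjacency lists in rank order.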