Hypercube (communication pattern)

-dimensional hypercube is a network topology for parallel computers with

The topology allows for an efficient implementation of some basic communication primitives such as Broadcast, All-Reduce, and Prefix sum.

The algorithms described in this page utilize this structure efficiently.

Most of the communication primitives presented in this article share a common template.

The following pseudo code sketches the communication steps necessary.

Hereby, Initialization, Operation, and Output are placeholders that depend on the given communication primitive (see next section).

Each processing element iterates over its neighbors (the expression

's binary representation, therefore obtaining the numbers of its neighbors).

The processing operation depends on the communication primitive.

In the beginning of a prefix sum operation, each processing element

The pseudo code stores the prefix sum in variable

All-gather operations start with each processing element having a message

With each iteration, the transferred message doubles in length.

So it is a Reduce operation, where all processing units know the result.

Compared to a normal reduce operation followed by a broadcast, All-Reduce in hypercubes reduces the number of communication steps.

With each iteration a messages comes closer to its destination by one dimension, if it hasn't arrived yet.

In every following step, the sub cube is only half the size as before, but in the previous step exactly the same number of messages arrived from another processing element.

The ESBT-broadcast (Edge-disjoint Spanning Binomial Tree) algorithm[3] is a pipelined broadcast algorithm with optimal runtime for clusters with hypercube network topology.

edge-disjoint binomial trees in the hypercube, such that each neighbor of processing element

chunks of equal size and cyclically sends them to the roots of the binomial trees.

Upon receiving a chunk, the binomial trees broadcast it.

Broadcasting the chunk within the binomial tree takes

steps until the last binomial tree broadcast has finished, resulting in

This section describes how to construct the binomial trees systematically.

First, construct a single binomial spanning tree von

Then the children of each nodes are obtained by negating single leading zeroes.

This results in a single binomial spanning tree.

edge-disjoint copies of the tree, translate and rotate the nodes: for the

-th copy of the tree, apply a XOR operation with

The resulting binomial trees are edge-disjoint and therefore fulfill the requirements for the ESBT-broadcasting algorithm.

Algorithm outline applied to the -dimensional hypercube. In the first step (before any communication), each processing element possesses one message (blue). Communication is marked red. After each step, the processing elements store the received message, but other operations are also possible.
Example for a prefix sum calculation. Upper number: tentatetive prefix sum (variable ). Lower number: sum over all elements in the sub cube (variable ).
A -dimensional hypercubes with three ESBT embedded.