Bit-reversal permutation

In applied mathematics, a bit-reversal permutation is a permutation of a sequence of n items, where n = 2^k is a power of two. It is defined by indexing the elements of the sequence by the numbers from 0 to n − 1, writing each index as a k-bit binary number, and mapping each item to the item whose index has the same bits in reversed order. This permutation can be applied to any sequence in linear time while performing only simple index calculations.
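
For concreteness, the following Python sketch (an illustration written around the definition above, not code from any cited source) builds the permuted sequence by reversing the k-bit binary representation of each index.

    def bit_reverse_permutation(seq):
        """Return a copy of seq reordered by the bit-reversal permutation.

        The length of seq must be a power of two, say 2**k; the item at
        index i is sent to the index whose k-bit binary representation is
        the reverse of that of i.
        """
        n = len(seq)
        if n == 0 or n & (n - 1) != 0:
            raise ValueError("sequence length must be a power of two")
        k = n.bit_length() - 1
        result = [None] * n
        for i in range(n):
            # Reverse the k-bit binary representation of i.
            rev = int(format(i, "0{}b".format(k))[::-1], 2)
            result[rev] = seq[i]
        return result

    # The permutation of the indexes 0..7 is 0, 4, 2, 6, 1, 5, 3, 7.
    print(bit_reverse_permutation(list(range(8))))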

It has applications in the generation of low-discrepancy sequences and in the evaluation of fast Fourier transforms.

Bit reversal also generalizes to other number systems: a digit-reversal permutation reverses the base-b digits of each index, and a mixed-radix version allows a different base for each digit position. In such cases, the digit-reversal permutation should simultaneously reverse the digits of each item and the bases of the number system, so that each reversed digit remains within the range defined by its base.[3]
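
As a sketch of that mixed-radix case (the function below is illustrative and assumes the sequence length is the product of the listed bases), the digits of an index are extracted in the given bases and then read back with both the digit order and the base order reversed:

    def digit_reverse_index(i, bases):
        """Mixed-radix digit reversal of the index i.

        bases lists the radix of each digit position, most significant first.
        The digits of i are reinterpreted in reverse order, with the list of
        bases also reversed, so each digit stays within its own base.
        """
        digits = []
        for b in reversed(bases):    # extract digits, least significant first
            digits.append(i % b)
            i //= b
        # Reassemble: the least significant digit of i becomes the most
        # significant digit of the result, and the base order is reversed too.
        j = 0
        for digit, b in zip(digits, reversed(bases)):
            j = j * b + digit
        return j

    # With bases (2, 3) the six indexes 0..5 map to 0, 2, 4, 1, 3, 5;
    # with all bases equal to 2 this reduces to ordinary bit reversal.
    print([digit_reverse_index(i, (2, 3)) for i in range(6)])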

Permutations that generalize the bit-reversal permutation by reversing contiguous blocks of bits within the binary representations of their indices can be used to interleave two equal-length sequences of data in-place.[4]

There are two extensions of the bit-reversal permutation to sequences of arbitrary length.

These extensions coincide with bit-reversal for sequences whose length is a power of 2, and their purpose is to separate adjacent items in a sequence for the efficient operation of the Kaczmarz algorithm.[6]

Bit reversal is most important for radix-2 Cooley–Tukey FFT algorithms, where the recursive stages of the algorithm, operating in-place, imply a bit reversal of the inputs or outputs.[7]
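
A compact, textbook-style sketch (illustrative, not drawn from the cited references) of an iterative radix-2 decimation-in-time transform shows where that reordering enters: the input is put into bit-reversed order first, after which every butterfly stage can overwrite the array in place.

    import cmath

    def fft_in_place(a):
        """Iterative radix-2 decimation-in-time FFT of a list of numbers.

        The length of a must be a power of two.  Reordering the input into
        bit-reversed index order first lets every butterfly stage work in place.
        """
        n = len(a)
        k = n.bit_length() - 1
        # Step 1: bit-reverse the input order.
        for i in range(n):
            j = int(format(i, "0{}b".format(k))[::-1], 2)
            if i < j:
                a[i], a[j] = a[j], a[i]
        # Step 2: in-place butterfly stages of widths 2, 4, ..., n.
        width = 2
        while width <= n:
            w = cmath.exp(-2j * cmath.pi / width)
            for start in range(0, n, width):
                factor = 1.0
                for offset in range(width // 2):
                    u = a[start + offset]
                    v = a[start + offset + width // 2] * factor
                    a[start + offset] = u + v
                    a[start + offset + width // 2] = u - v
                    factor *= w
            width *= 2
        return a

    # Example: the transform of an impulse is the all-ones vector.
    print(fft_in_place([1, 0, 0, 0]))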

The bit reversal permutation has also been used to devise lower bounds in distributed computation.[8]

The Van der Corput sequence, a low-discrepancy sequence of numbers in the unit interval, is formed by reinterpreting the indexes of the bit-reversal permutation as the fixed-point binary representations of dyadic rational numbers.
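
A small sketch of that construction (illustrative code; the function name is not taken from the cited sources): the i-th value is obtained by writing i in binary and reading the reversed bits as a binary fraction, so for indexes below 2^k it equals the bit-reversed index divided by 2^k.

    def van_der_corput(num_points):
        """First num_points elements of the base-2 Van der Corput sequence.

        Element i is obtained by writing i in binary, reversing the bits,
        and reading the result as a fixed-point fraction in [0, 1).
        """
        points = []
        for i in range(num_points):
            x, denom = 0.0, 1.0
            while i > 0:
                denom *= 2.0
                x += (i & 1) / denom   # next bit becomes the next fraction digit
                i >>= 1
            points.append(x)
        return points

    # First eight values: 0, 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8.
    print(van_der_corput(8))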

Bit-reversal permutations are often used in finding lower bounds on dynamic data structures.[2]

Because the bit-reversal permutation is an involution, it may be performed easily in place (without copying the data into another array) by swapping pairs of elements.

In the random-access machine commonly used in algorithm analysis, a simple algorithm that scans the indexes in input order and swaps whenever the scan encounters an index whose reversal is a larger number would perform a linear number of data moves.
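
A sketch of that scan (illustrative; the reversal of each index is computed here by string manipulation purely for brevity, which is not a constant-time operation):

    def bit_reverse_in_place(seq):
        """Permute seq in place by swapping each index with its bit reversal.

        Each pair is swapped only when the reversed index is larger, so the
        involution property guarantees every element is moved exactly once.
        """
        n = len(seq)
        k = n.bit_length() - 1
        for i in range(n):
            j = int(format(i, "0{}b".format(k))[::-1], 2)
            if j > i:                  # swap each pair only once
                seq[i], seq[j] = seq[j], seq[i]
        return seq

    data = list(range(8))
    print(bit_reverse_in_place(data))  # [0, 4, 2, 6, 1, 5, 3, 7]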

Alternative algorithms can perform a bit reversal permutation in linear time while using only simple index calculations.[11]

Because bit-reversal permutations may be repeated multiple times as part of a calculation, it may be helpful to separate out the steps of the algorithm that calculate index data used to represent the permutation (for instance, by using the doubling and concatenation method) from the steps that use the results of this calculation to permute the data (for instance, by scanning the data indexes in order and performing a swap whenever the swapped location is greater than the current index, or by using more sophisticated vector scatter–gather operations).[2]
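
As a sketch of the first of those two steps, one common form of the doubling and concatenation recurrence builds the index table for 2n items from the table for n items; applying the precomputed table to the data is then a separate pass (details here are illustrative):

    def bit_reversal_table(k):
        """Index table of the bit-reversal permutation on 2**k items.

        Built by doubling and concatenation: the table for 2n items consists
        of the doubled entries of the table for n items, followed by those
        same doubled entries plus one.
        """
        table = [0]
        for _ in range(k):
            table = [2 * x for x in table] + [2 * x + 1 for x in table]
        return table

    table = bit_reversal_table(3)
    print(table)                       # [0, 4, 2, 6, 1, 5, 3, 7]
    data = "abcdefgh"
    print([data[j] for j in table])    # permute the data with the precomputed table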

Another consideration that is even more important for the performance of these algorithms is the effect of the memory hierarchy on running time.

Because of this effect, more sophisticated algorithms that consider the block structure of memory can be faster than this naive scan.[2][10]

An alternative to these techniques is special computer hardware that allows memory to be accessed both in normal and in bit-reversed order.[12]

Improving the performance of bit-reversal permutations on both uniprocessors and multiprocessors has received serious attention in high-performance computing.

Architecture-aware algorithm development can make the best use of hardware and system software resources, including caches, TLBs, and multiple cores, significantly accelerating the computation.

A Hammersley set whose coordinates are the integers from 0 to 255 and their bit-reversals