Block sort

It gets its name from the observation that merging two sorted lists, A and B, is equivalent to breaking A into evenly sized blocks, inserting each A block into B under special rules, and merging AB pairs.

One practical algorithm for O(n log n) in-place merging was proposed by Pok-Son Kim and Arne Kutzner in 2008.

The remaining A blocks are then inserted into B and merged using one of the two buffers as swap space.

The values in the buffers are then redistributed to their first sorted position within the array.

However, it benefits from the variant that ensures each A and B subarray are the same size to within one item: Fixed-point math may also be used, by representing the scale factor as a fraction integer_part + numerator/denominator: The two internal buffers needed for each level of the merge step are created by moving the first 2√A first instances of their values within an A subarray to the start of A.

In this case it moves the last instance of each value to the end of B, with that part of B not being included during the merges.

[7] When the minimum A block is dropped behind and needs to be rotated into the previous B block, after which its contents are swapped into the second internal buffer for the local merges, it would be faster to swap the A block to the buffer beforehand, and to take advantage of the fact that the contents of that buffer do not need to retain any order.

The floating hole in this case refers to the contents of the second internal buffer floating around the array, and acting as a hole in the sense that the items do not need to retain their order.

If the second buffer does not exist, a strictly in-place merge operation must be performed, such as a rotation-based version of the Hwang and Lin algorithm,[7][8] the Dudzinski and Dydek algorithm,[9] or a repeated binary search and rotate.

This is handled by running a linear search through those A blocks and comparing the tag values to find the smallest one.

These remaining A blocks then continue rolling through the array and being dropped and inserted where they belong.

However, it must then repeat the process for the remaining A and B subarrays for the current level of the merge sort.

Note that the internal buffers can be reused for every set of A and B subarrays for this level of the merge sort, and do not need to be re-extracted or modified in any way.

The two buffers must then be redistributed back into the array using the opposite process that was used to create them.

The second value of each A block doesn't necessarily need to be tagged – the first, last, or any other element could be used instead.

Insertion sort is still recommended, though, for its situational performance and lack of recursion.

Insertion sort is an O(n2) operation, so this leads to anywhere from O(162 × n/16) to O(312 × n/31), which is O(n) once the constant factors are omitted.

It must also apply an insertion sort on the second internal buffer after each level of merging is completed.

In the worst case this will end up searching the entire array before finding

unique values to create the internal buffers, a normally suboptimal in-place merge operation is performed where it repeatedly binary searches and rotates A into B.

However, the known lack of unique values within any of the subarrays places a hard limit on the number of binary searches and rotations that will be performed during this step, which is again

After omitting all but the highest complexity and considering that there are log(n) levels in the outer merge loop, this leads to a final asymptotic complexity of O(n log(n)) for the worst and average cases.

For the best case, where the data is already in order, the merge step performs n/16 comparisons for the first level, then n/32, n/64, n/128, etc.

As block sort is non-recursive and does not require the use of dynamic allocations, this leads to constant stack and heap space.

It uses O(1) auxiliary memory in a transdichotomous model, which accepts that the O(log n) bits needed to keep track of the ranges for A and B cannot be any greater than 32 or 64 on 32-bit or 64-bit computing systems, respectively, and therefore simplifies to O(1) space for any array that can feasibly be allocated.

Although items in the array are moved out of order during a block sort, each operation is fully reversible and will have restored the original order of equivalent items by its completion.

The more ordered the data originally was, the fewer B values there will be that need to be merged into A.

[16] It only checks for these sorted ranges at the two predefined levels: the A and B subarrays, and the A and B blocks.

The buffer extraction process for block sort.

Tagging the A blocks using values from the first internal buffer. Note that the first A block and last B block are unevenly sized.

Two A blocks rolling through the B blocks. Once the first A block is dropped behind, the unevenly sized A block is locally merged with the B values that follow it.