SPIKE algorithm

The SPIKE algorithm is a hybrid parallel solver for banded linear systems developed by Eric Polizzi and Ahmed Sameh[1]^ [2] The SPIKE algorithm deals with a linear system AX = F, where A is a banded

The first step of the preprocessing stage is to factorize the diagonal blocks Aj.

For numerical stability, one can use LAPACK's XGBTRF routines to LU factorize them with partial pivoting.

Alternatively, one can also factorize them without partial pivoting but with a "diagonal boosting" strategy.

After the diagonal blocks are factorized, the spikes are computed and passed on to the postprocessing stage.

Notice that D̃1 essentially consists of diagonal blocks of order 4m extracted from S̃.

Now we factorize S̃ as The new matrix S̃2 has the form Its structure is very similar to that of S̃2, only differing in the number of spikes and their height (their width stays the same at m).

With them omitted, the reduced system becomes block diagonal and can be easily solved in parallel [3].

The first SPIKE partitioning and algorithm was presented in [4] and was designed as the means to improve the stability properties of a parallel Givens rotations-based solver for tridiagonal systems.

A version of the algorithm, termed g-Spike, that is based on serial Givens rotations applied independently on each block was designed for the NVIDIA GPU [5].

A SPIKE-based algorithm for the GPU that is based on a special block diagonal pivoting strategy is described in [6].

The SPIKE algorithm can also function as a preconditioner for iterative methods for solving linear systems.

To solve a linear system Ax = b using a SPIKE-preconditioned iterative solver, one extracts center bands from A to form a banded preconditioner M and solves linear systems involving M in each iteration with the SPIKE algorithm.

The SPIKE algorithm can be generalized by not restricting the preconditioner to be strictly banded.

This enhances the preconditioner, and hence allows better chance of convergence and reduces the number of iterations.

Tridiagonal solvers have also been developed for the NVIDIA GPU [8][9] and the Xeon Phi co-processors.

[1] The Givens rotations based solver was also implemented for the GPU and the Intel Xeon Phi.