Iterative Stencil Loops

Other notable examples include solving partial differential equations,[1] the Jacobi kernel, the Gauss–Seidel method,[2] image processing[1] and cellular automata.

Most finite difference codes which operate on regular grids can be formulated as ISLs.

ISLs perform a sequence of sweeps (called timesteps) through a given array.

[2] Using neighboring array elements in a fixed pattern (the stencil), each cell's new value is computed.

with the following meaning:[3] Since I is a k-dimensional integer interval, the array will always have the topology of a finite regular grid.

The array is also called simulation space and individual cells are identified by their index

may be defined by a vector addition modulo the simulation space's dimension to realize toroidal topologies: This may be useful for implementing periodic boundary conditions, which simplifies certain physical models.

The update function computes the arithmetic mean of a cell's four neighbors.

The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant.

Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of ISLs are of paramount importance to research.

[6] This is challenging since the computations are tightly coupled (because of the cell updates depending on neighboring cells) and most ISLs are memory bound (i.e. the ratio of memory accesses to calculations is high).

The libraries are mostly concerned with the parallelization, but may also tackle other challenges, such as IO, steering and checkpointing.

The library manages a set of n-dimensional scalar arrays, which the user program may access to perform updates.

The library handles the synchronization of the boundaries (dubbed ghost zone or halo).

The advantage of this interface is that the user program may loop over the arrays, which makes it easy to integrate legacy code [10] .

The disadvantage is that the library can not handle cache blocking (as this has to be done within the loops[11]) or wrapping of the API-calls for accelerators (e.g. via CUDA or OpenCL).

Implementations include Cactus, a physics problem solving environment, and waLBerla.

The advantage of this approach is that the library can control tightly which cells are updated in which order, which is useful not only to implement cache blocking,[9] but also to run the same code on multi-cores and GPUs.

This is only feasible with techniques such as class templates or metaprogramming, which is also the reason why this design is only found in newer libraries.

The shape of a 7-point 3D von Neumann style stencil.
Data dependencies of a selected cell in the 2D array.