This is often less intuitive, but it can achieve the memory throughput of the SoA approach while being friendlier to the cache locality and load-port architecture of modern processors.
Yet another option, used in some Cell libraries, is to de-interleave data from the AoS format when loading sources into registers and to interleave it again when writing out results (facilitated by the superscalar issue of permute instructions).
SIMD ISAs are usually designed for homogeneous data, but some provide a dot product instruction[5] and additional permutes, making the AoS case easier to handle.
Although most GPU hardware has moved away from 4-wide vector instructions to scalar SIMT pipelines,[6] modern compute kernels using SoA instead of AoS can still give better performance due to memory coalescing.
An example of AoSoA implemented through C++ metaprogramming is found in LANL's Cabana library, which assumes a vector width of 16 lanes by default.