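In the gather variant, each destination element is read from the source element selected by the corresponding index; in the scatter variant, each source element is written to the destination position selected by the corresponding index. Note that, unlike in memory-based gather-scatter, all three of dest, src, and indices are registers (or parts of registers in the case of bit-level permute), not memory locations.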
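The two variants can be sketched in Python, with lists standing in for registers. This is an illustrative model only, not the semantics of any particular instruction set: the function names and the `fill` parameter are hypothetical, and real hardware differs in how it treats lanes that a scatter never writes.

```python
def permute_gather(src, indices):
    """Gather: dest[i] = src[indices[i]] (each output lane pulls a source lane)."""
    return [src[j] for j in indices]


def permute_scatter(src, indices, fill=0):
    """Scatter: dest[indices[i]] = src[i] (each source lane pushes to an output lane).

    `fill` is an assumption of this sketch: it is the value left in any
    destination lane that no index selects.
    """
    dest = [fill] * len(src)
    for i, j in enumerate(indices):
        dest[j] = src[i]
    return dest


src = ["a", "b", "c", "d"]
indices = [2, 0, 3, 1]

gathered = permute_gather(src, indices)    # ['c', 'a', 'd', 'b']
scattered = permute_scatter(src, indices)  # ['b', 'd', 'a', 'c']
```

When `indices` is a true permutation, scattering the gathered result with the same indices restores the original order. When an index repeats, gather duplicates a source element, while in this sketch scatter keeps only the last write to that lane and leaves the unselected lanes at `fill`.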
A special case of permute is also used in GPU "swizzling" (again, not strictly a permutation), which reorders subvector data on the fly so as to align or duplicate elements with the appropriate SIMD lane.
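Swizzling can be modelled as a gather with a small, fixed index pattern. The sketch below borrows the `x`/`y`/`z`/`w` lane names used by shading languages; the `swizzle` helper and the `LANES` table are hypothetical names introduced for illustration and do not correspond to any specific GPU instruction.

```python
# Map shading-language style component names to lane numbers.
LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(vec, pattern):
    """Return a new subvector whose lanes are chosen by `pattern`.

    Each character of `pattern` names the source lane to copy, so a
    component may be reordered, repeated, or dropped.
    """
    return [vec[LANES[c]] for c in pattern]


v = [1.0, 2.0, 3.0, 4.0]

reversed_v = swizzle(v, "wzyx")    # [4.0, 3.0, 2.0, 1.0]
duplicated_v = swizzle(v, "xxyy")  # [1.0, 1.0, 2.0, 2.0]
```

The second call shows why swizzling is not strictly a permutation: two lanes are duplicated and the other two are discarded.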
Permute instructions occur in scalar processors, vector processing engines, and GPUs.[4]
Some non-vector ISAs also include partial reordering instructions, because a single source input register sometimes has insufficient space to specify the permutation source array in full (particularly if the operation involves bit-level permutation).
Permute operations in different forms are surprisingly common, occurring in AltiVec, Power ISA, PowerPC G4, AVX-512, SVE2,[5] vector processors, and GPUs.