Single instruction, multiple threads

[2][3] ATI Technologies (now AMD) released a competing product slightly later, on May 14, 2007: the TeraScale 1-based "R600" GPU chip.

SIMT is intended to limit instruction-fetching overhead,[4] i.e. the latency that comes with memory access. Modern GPUs (such as those of Nvidia and AMD) use it in combination with "latency hiding" to sustain high-performance execution despite considerable latency in memory-access operations.
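The effect of latency hiding can be illustrated with a toy scheduling model. The sketch below is not a description of any real GPU scheduler: it assumes one issue slot per cycle, a fixed memory latency, and warps that execute nothing but loads. The function name `total_cycles` and its parameters are hypothetical, chosen only for this illustration.

```python
def total_cycles(num_warps, loads_per_warp, mem_latency):
    """Toy model: cycles until every warp has issued and completed its loads.

    Assumptions (illustrative only): one instruction may issue per cycle,
    every instruction is a load, and a warp that has issued a load cannot
    issue again until the data returns mem_latency cycles later.
    """
    ready_at = [0] * num_warps               # cycle at which each warp may issue again
    remaining = [loads_per_warp] * num_warps  # loads each warp still has to issue
    cycle = 0
    while any(remaining):
        # Pick the first warp that still has work and is not waiting on memory.
        for w in range(num_warps):
            if remaining[w] and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + 1 + mem_latency
                break
        # If no warp was ready, this cycle is simply idle.
        cycle += 1
    # The run ends when the last outstanding load has returned.
    return max(ready_at)


one_warp = total_cycles(num_warps=1, loads_per_warp=4, mem_latency=9)
ten_warps = total_cycles(num_warps=10, loads_per_warp=4, mem_latency=9)
print(one_warp, ten_warps)
```

Under these assumptions a single warp leaves the issue slot idle while it waits on memory, whereas ten warps keep the slot busy almost every cycle, so ten times the work finishes in only slightly more time.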

[5] As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density.

Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution.

The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.
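Masked execution of a branch can be sketched as a small simulation. The example below is a simplified model rather than the behaviour of any particular hardware: it treats a group of threads as a list of lanes that execute an `if`/`else` in lockstep, and it assumes that a path is skipped entirely when no lane needs it. The function name `run_branch` and the branch bodies are hypothetical.

```python
def run_branch(values):
    """Simulate lockstep execution of
           if v < 0: out = -v
           else:     out = v * 2
    across all lanes, returning the results and the number of passes made.
    """
    mask_then = [v < 0 for v in values]      # lanes that take the "if" path
    mask_else = [not m for m in mask_then]   # lanes that take the "else" path
    out = list(values)
    passes = 0

    if any(mask_then):                       # skipped when no lane takes this path
        passes += 1
        for lane, active in enumerate(mask_then):
            if active:                       # masked-off lanes keep their old value
                out[lane] = -values[lane]

    if any(mask_else):
        passes += 1
        for lane, active in enumerate(mask_else):
            if active:
                out[lane] = values[lane] * 2

    return out, passes


print(run_branch([1, 2, 3, 4]))     # coherent: every lane takes the "else" path
print(run_branch([-1, 2, -3, 4]))   # divergent: both paths are executed under masks
```

In this model the coherent input needs one pass and no lane is idled, while the divergent input needs two passes with part of the lanes masked off in each.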