It competed directly with TeraScale, AMD's first unified shader microarchitecture, which was a development of ATI's work on the Xbox 360 and used a similar design.
The design is a major shift for NVIDIA in GPU functionality and capability. The most obvious change is the move from the separate functional units (pixel shaders, vertex shaders) of previous GPUs to a homogeneous collection of universal floating-point processors (called "stream processors") that can perform a more general set of tasks.
Unlike the vector processing approach taken with older shader units, each stream processor (SP) is scalar and thus can operate on only one component at a time.
The lower maximum throughput of these scalar processors is compensated for by efficiency and by running them at a high clock speed (made possible by their simplicity).[2]
The theoretical single-precision processing power claimed for Tesla-based cards, given in FLOPS, may be hard to reach in real-world workloads.
Each SP can perform up to two single-precision operations per clock, one multiply and one add, using a single MAD (multiply-add) instruction.
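The arithmetic behind such a theoretical figure can be sketched in a few lines of Python. This is an illustrative calculation, not a vendor formula: the function names, the example SP count and shader clock, and the simple "MAD fraction" model are all assumptions introduced here. The model assumes that a MAD instruction counts as two operations and any other instruction as one, which is one way to see why the peak is reached only if every issued instruction is a MAD.

```python
def peak_gflops(stream_processors: int, shader_clock_mhz: float,
                ops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak in GFLOPS.

    ops_per_clock defaults to 2: one multiply and one add per MAD.
    """
    return stream_processors * shader_clock_mhz * ops_per_clock / 1000


def effective_gflops(peak: float, mad_fraction: float) -> float:
    """Hypothetical model: MADs count as 2 operations, everything else as 1.

    mad_fraction is the share of issued instructions that are MADs (0.0 to 1.0).
    """
    if not 0.0 <= mad_fraction <= 1.0:
        raise ValueError("mad_fraction must be between 0 and 1")
    return peak * (1.0 + mad_fraction) / 2.0


# Illustrative values only (assumed, not taken from the text):
# 128 SPs running at a 1350 MHz shader clock.
peak = peak_gflops(128, 1350.0)
print(f"peak: {peak:.1f} GFLOPS")
print(f"half MADs: {effective_gflops(peak, 0.5):.1f} GFLOPS")
```

Under this simplified model, a workload in which only half of the instructions are MADs reaches three quarters of the theoretical figure; memory stalls and other real-world effects, which the sketch ignores, would lower it further.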