SSE2

The XMM registers used by SSE2 can hold up to 128 bits of data, and a single instruction, such as a vector addition or multiplication, operates on all of the packed values in a register simultaneously.
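As a minimal sketch of what operating on 128 bits at once looks like in practice, the following C function adds two integer arrays four 32-bit elements at a time using compiler intrinsics from <emmintrin.h>. The function name add_i32_sse2 and the requirement that the length be a multiple of four are assumptions made for illustration only.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 32-bit integers.
   n is assumed to be a multiple of 4 to keep the sketch short. */
void add_i32_sse2(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Each load fills one 128-bit register with four 32-bit lanes. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One instruction adds all four lanes. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
}
```

The unaligned load and store intrinsics are used so that the sketch works for any array address; aligned variants exist for data known to sit on a 16-byte boundary.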

SSE2 was intended to fully replace MMX, an earlier SIMD instruction set found on IA-32 architecture processors.

Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

SSE2 also complements the floating-point vector operations of the SSE instruction set by adding support for the double-precision data type.
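To illustrate the double-precision support, here is a hedged sketch that computes y[i] = a * x[i] + y[i] on two doubles per iteration, since one 128-bit register holds two 64-bit values. The function name axpy_f64_sse2 and the even-length restriction are illustrative assumptions, not part of any particular library.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i] for n doubles.
   n is assumed to be even to keep the sketch short. */
void axpy_f64_sse2(double *y, const double *x, double a, size_t n)
{
    assert(n % 2 == 0);
    const __m128d va = _mm_set1_pd(a); /* broadcast a into both lanes */
    for (size_t i = 0; i < n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        /* Packed double-precision multiply, then add. */
        vy = _mm_add_pd(_mm_mul_pd(va, vx), vy);
        _mm_storeu_pd(y + i, vy);
    }
}
```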

Other SSE2 extensions include a set of cache control instructions intended primarily to minimize cache pollution when processing indefinite streams of information, and a sophisticated complement of numeric format conversion instructions.
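Both kinds of instruction can be sketched together. The example below widens 32-bit integers to doubles with a conversion intrinsic and writes the results with a non-temporal (streaming) store, which hints that the destination should bypass the cache. The function name widen_i32_to_f64_stream, the even-length restriction, and the requirement that the destination be 16-byte aligned are assumptions of this sketch.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = (double)src[i] for n elements.
   Assumes n is even and dst is 16-byte aligned. */
void widen_i32_to_f64_stream(double *dst, const int32_t *src, size_t n)
{
    assert(n % 2 == 0);
    assert(((uintptr_t)dst & 15u) == 0);
    for (size_t i = 0; i < n; i += 2) {
        /* Load two 32-bit integers into the low 64 bits of a register. */
        __m128i vi = _mm_loadl_epi64((const __m128i *)(src + i));
        /* Convert the two low integers to two doubles. */
        __m128d vd = _mm_cvtepi32_pd(vi);
        /* Non-temporal store: write without polluting the cache. */
        _mm_stream_pd(dst + i, vd);
    }
    /* Make the streaming stores visible before the function returns. */
    _mm_sfence();
}
```

Streaming stores tend to pay off only when the output is large and will not be read again soon; for small buffers an ordinary store is usually the better choice.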

AMD's implementation of SSE2 on the AMD64 (x86-64) platform includes an additional eight registers, doubling the total number to 16 (XMM0 through XMM15).

Intel addressed the first problem by adding an instruction in SSE3 to reduce the overhead of accessing unaligned data and improve the overall performance of misaligned loads, and the last problem by widening the execution engine in their Core microarchitecture in Core 2 Duo and later products.
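The distinction between aligned and unaligned access is visible in SSE2 code itself, which offers separate load intrinsics for each case. The sketch below stays within SSE2 (it does not use the SSE3 instruction mentioned above) and picks the aligned load only when the address is on a 16-byte boundary. The names load_128 and sum4_i32 are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load 16 bytes, using the aligned form when possible. */
__m128i load_128(const void *p)
{
    assert(p != NULL);
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_si128((const __m128i *)p);  /* requires 16-byte alignment */
    return _mm_loadu_si128((const __m128i *)p);     /* works at any address */
}

/* Hypothetical helper: sum of the four 32-bit integers starting at p. */
int32_t sum4_i32(const int32_t *p)
{
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, load_128(p));
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Calling the aligned form on a misaligned address faults, which is why code that cannot guarantee alignment falls back to the unaligned form.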

Since the problem is not locally apparent in the MMX code, finding and correcting the bug can be very time-consuming.

For example, to use SSE2 in a Microsoft Visual Studio project, the programmer had to either manually write inline assembly or import object code from an external source.