SSE4

It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper;[1] more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation.

[3] Like other previous-generation CPU SIMD instruction sets, SSE4 supports up to 16 registers, each 128 bits wide, which can hold four 32-bit integers, four 32-bit single-precision floating-point numbers, or two 64-bit double-precision floating-point numbers.
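The three packings of a 128-bit register described above can be pictured with a plain C union. This is only an illustrative sketch: the type `xmm_view` and the helper `view_add_u32` are hypothetical names, not part of any Intel API, and the layout assumes the usual x86 sizes for `float` (32 bits) and `double` (64 bits).

```c
#include <stdint.h>

/* Hypothetical view of one 128-bit register's worth of data.
   Assumes float is 32 bits and double is 64 bits, as on x86. */
typedef union {
    uint32_t u32[4];  /* four 32-bit integers              */
    float    f32[4];  /* four single-precision floats      */
    double   f64[2];  /* two double-precision floats       */
} xmm_view;

/* Lane-wise 32-bit integer addition with wraparound, modelling what a
   packed integer add does to all four lanes at once. */
xmm_view view_add_u32(xmm_view a, xmm_view b) {
    xmm_view r;
    for (int lane = 0; lane < 4; lane++) {
        r.u32[lane] = a.u32[lane] + b.u32[lane];
    }
    return r;
}
```

A real SIMD instruction performs the four lane additions in a single operation; the loop here only models the result.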

It also allowed disabling the alignment check on non-load SSE operations accessing memory.

The instructions were internally dubbed Merom New Instructions; Intel originally did not plan to assign a special name to them, a decision that was criticized by some journalists.

[6] Intel eventually cleared up the confusion and reserved the SSE4 name for their next instruction set extension.

[8] Unlike all previous iterations of SSE, SSE4 contains instructions that perform operations that are not specific to multimedia applications.

[11] It also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols.
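As a rough software model of what the CRC32 instruction accumulates, the sketch below computes the same checksum bit by bit. It assumes the CRC-32C (Castagnoli) polynomial in its reflected form, `0x82F63B78`. The function names `crc32c_update` and `crc32c` are hypothetical, and the initial value and final inversion follow the common CRC-32C convention; the instruction itself performs only the accumulation step.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold `len` bytes into a running CRC value, one bit at a time.
   Models the accumulate step only (no initial or final inversion). */
uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the reflected polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc;
}

/* Conventional CRC-32C: start from all ones and invert the result. */
uint32_t crc32c(const void *buf, size_t len) {
    return crc32c_update(0xFFFFFFFFu, (const uint8_t *)buf, len) ^ 0xFFFFFFFFu;
}
```

With this convention, `crc32c("123456789", 9)` yields the standard CRC-32C check value `0xE3069283`. A hardware-accelerated version would replace the inner loop with the CRC32 instruction, processing one or more bytes per step.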

AMD calls this pair of instructions Advanced Bit Manipulation (ABM).

This results in an issue where LZCNT, when executed on CPUs that do not support it (such as Intel CPUs prior to Haswell), may be incorrectly executed as BSR instead of raising an invalid-instruction exception.
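The silent fallback matters because the two operations return different values for the same input. The portable sketch below models both for 32-bit operands; `bsr32` and `lzcnt32` are hypothetical helper names, not intrinsics. For a nonzero input the results are related by lzcnt = 31 - bsr, so code expecting a leading-zero count gets a bit index instead.

```c
#include <stdint.h>

/* Models BSR: index of the highest set bit.
   The caller must pass a nonzero value; the real instruction leaves
   its destination undefined when the source is zero. */
unsigned bsr32(uint32_t x) {
    unsigned index = 0;
    while (x >>= 1) {
        index++;
    }
    return index;
}

/* Models LZCNT: number of zero bits above the highest set bit.
   Returns the operand size (32) for a zero input. */
unsigned lzcnt32(uint32_t x) {
    if (x == 0) {
        return 32;
    }
    unsigned count = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        count++;
    }
    return count;
}
```

For example, for the input `0x00010000` the BSR model returns 16 while the LZCNT model returns 15. The two results coincide only for inputs whose highest set bit is at index 15 or 16 in a hypothetical 31-bit-wide case, which does not occur for 32-bit operands, so they never agree here.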

Trailing zeros can be counted using the BSF (bit scan forward) or TZCNT instructions.
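A portable model of the trailing-zero count is sketched below, with the hypothetical name `tzcnt32`. For nonzero inputs BSF and TZCNT produce the same number; the sketch follows TZCNT's behaviour for a zero input, returning the operand size, whereas BSF leaves its destination undefined in that case.

```c
#include <stdint.h>

/* Models TZCNT for a 32-bit operand: number of zero bits below the
   lowest set bit, or 32 when the input is zero. */
unsigned tzcnt32(uint32_t x) {
    if (x == 0) {
        return 32;
    }
    unsigned count = 0;
    while (!(x & 1u)) {
        x >>= 1;
        count++;
    }
    return count;
}
```

Code that must also run on processors without TZCNT should therefore handle a zero input explicitly rather than rely on the instruction's result.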