Multiply–accumulate operation

Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result.
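The data path described above can be modelled in software. The following is only a minimal sketch: the function name mac_accumulate and the 16-bit operand and 64-bit accumulator widths are illustrative assumptions, not properties of any particular MAC unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a MAC unit (hypothetical widths): each step multiplies
   two 16-bit operands and adds the 32-bit product to a 64-bit accumulator. */
int64_t mac_accumulate(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t acc = 0;                             /* accumulator register */
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)a[i] * b[i];  /* multiplier stage */
        acc += product;                          /* adder stage */
    }
    return acc;
}
```

Keeping the accumulator wider than the product is a common way to let many terms be summed before overflow becomes a concern.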

A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products. Fused multiply–add can usually be relied on to give more accurate results.

If x² − y² is evaluated as ((x × x) − y × y) (following Kahan's suggested notation, in which redundant parentheses direct the compiler to round the (x × x) term first) using fused multiply–add, then the result may be negative even when x = y, because the first multiplication discards low-significance bits.[8]

However, standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.

The GCC and Clang C compilers contract a multiplication followed by an addition into a fused multiply–add by default for processor architectures that support FMA instructions.

With GCC, which does not support the #pragma STDC FP_CONTRACT directive,[11] contraction can be controlled globally with the -ffp-contract command-line option.