Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result.
A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products. Fused multiply–add can usually be relied on to give more accurate results.
If x² − y² is evaluated as ((x × x) − y × y) (following Kahan's suggested notation in which redundant parentheses direct the compiler to round the (x × x) term first) using fused multiply–add, then the result may be negative even when x = y, because the first multiplication discards low-significance bits.[8]
Standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.
The Digital Equipment Corporation (DEC) VAX's POLY instruction is used for evaluating polynomials with Horner's rule using a succession of multiply and add steps.
Instruction descriptions do not specify whether the multiply and add are performed using a single FMA step.
The GCC and Clang C compilers contract a separate multiply and add into a fused multiply–add by default for processor architectures that support FMA instructions.