In a few cases, where stricter definitions of binary floating-point arithmetic might be performance-incompatible with some existing implementation, they were made optional.
Participation in drafting the standard was open to people with a solid knowledge of floating-point arithmetic.
More than 90 people attended at least one of the monthly meetings, which were held in Silicon Valley, and many more participated through the mailing list.
The specification levels of a floating-point format have been enumerated to clarify the distinctions between them. The sets of representable entities are then explained in detail, showing that the significand can be treated either as a fraction or as an integer.
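The two views of the significand can be illustrated with a short Python sketch. It assumes the binary64 format (precision p = 53) that CPython uses for `float` on mainstream platforms; the helper name `two_views` is hypothetical and is meant only for finite, nonzero inputs.

```python
import math

P = 53  # assumed precision of binary64, in bits

def two_views(x):
    """Return ((m, e), (c, q)) such that x == m * 2**e == c * 2**q.

    m is a fractional significand with 1 <= |m| < 2,
    c is an integer significand, and q == e - (P - 1).
    Intended for finite, nonzero x only.
    """
    frac, exp = math.frexp(x)        # x == frac * 2**exp, 0.5 <= |frac| < 1
    m, e = frac * 2.0, exp - 1       # rescale so that 1 <= |m| < 2
    c = int(m * 2 ** (P - 1))        # shift the radix point past all P digits
    q = e - (P - 1)
    return (m, e), (c, q)

(m, e), (c, q) = two_views(6.0)
# Both views denote the same number: 1.5 * 2**2 and (3 * 2**51) * 2**-50.
same_value = math.ldexp(m, e) == math.ldexp(c, q) == 6.0
```

Both pairs describe the same value; only the position of the radix point in the significand, and therefore the exponent, differs.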
This clause has been changed to encourage the use of static attributes for controlling floating-point operations, and (in addition to required rounding attributes) allow for alternate exception handling, widening of intermediate results, value-changing optimizations, and reproducibility.
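As a loose analogy for a rounding attribute that applies to a delimited block of code, the sketch below uses Python's `decimal` module. This is an assumption chosen for illustration: `decimal` contexts are Python's own mechanism and are not the attribute facility defined by the standard, and the helper name `one_third` is hypothetical.

```python
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR

def one_third(rounding):
    """Compute 1/3 to 5 significant digits under the given rounding direction."""
    with localcontext() as ctx:      # settings apply only inside this block
        ctx.prec = 5
        ctx.rounding = rounding
        return Decimal(1) / Decimal(3)

rounded_down = one_third(ROUND_FLOOR)    # Decimal('0.33333')
rounded_up = one_third(ROUND_CEILING)    # Decimal('0.33334')
```

The same expression yields different results depending on the rounding direction in effect, and the setting does not leak outside the `with` block.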
This section has numerous clarifications (notably in the area of comparisons), and several previously recommended operations (such as copy, negate, abs, and class) are now required.
New operations include fused multiply–add (FMA), explicit conversions, classification predicates (isNaN(x), etc.), various min and max functions, a total ordering predicate, and two decimal-specific operations (sameQuantum and quantize).
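Some of these operations have close counterparts in Python's standard library, which makes a small illustration possible. The sketch assumes that `Decimal.same_quantum`, `Decimal.quantize`, and `Decimal.compare_total` behave like the standard's operations of similar name; they are Python's methods, not the standard's own definitions.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN

# Classification predicate on a binary float.
nan_detected = math.isnan(float("nan"))

a = Decimal("1.0")    # coefficient 10,  exponent -1
b = Decimal("1.00")   # coefficient 100, exponent -2

# Equal in value, but the two representations have different exponents.
equal_value = (a == b)
same_exponent = a.same_quantum(b)

# quantize rounds a value to the exponent of its argument.
price = Decimal("2.347").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# A total ordering distinguishes representations that compare equal numerically.
order = b.compare_total(a)
```

Here `equal_value` is true while `same_exponent` is false, `price` is `Decimal('2.35')`, and `order` is `Decimal('-1')` because the total order places `1.00` before `1.0`.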
The min and max operations are defined but leave some leeway for the case where the inputs are equal in value but differ in representation.
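Signed zeros are one case of inputs that are equal in value but differ in representation. The sketch below uses Python's built-in `max`, which is not the standard's operation; it is shown only as an example of an implementation whose result in this case depends on argument order. The helper name `sign_of` is hypothetical.

```python
import math

def sign_of(x):
    """Return 1.0 or -1.0 according to the sign bit of x (works for zeros)."""
    return math.copysign(1.0, x)

zeros_compare_equal = (0.0 == -0.0)

# Python's max keeps the first argument unless a later one is strictly greater.
first_positive = max(0.0, -0.0)
first_negative = max(-0.0, 0.0)

signs = (sign_of(first_positive), sign_of(first_negative))
```

Both calls return a zero, but with opposite signs, which is the kind of variation the leeway permits.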
Decimal arithmetic, compatible with that used in Java, C#, PL/I, COBOL, Python, REXX, etc., is also defined in this section.
Unlike IEEE 854, IEEE 754-2008 requires correctly rounded base conversion between decimal and binary floating point within a range that depends on the format.
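One observable consequence of correctly rounded conversion is that a binary64 value written with 17 significant decimal digits reads back as the same value. The sketch assumes a mainstream CPython build, where `float` is binary64 and string conversion is correctly rounded; the helper name `round_trips` is hypothetical.

```python
from decimal import Decimal

def round_trips(x):
    """Check that x survives binary -> 17-digit decimal -> binary unchanged."""
    return float(format(x, ".17g")) == x

samples = [0.1, 1.0 / 3.0, 2.0 ** -1074, 1.7976931348623157e308]
all_round_trip = all(round_trips(x) for x in samples)

# Decimal(float) converts exactly, exposing the binary value nearest to 0.1.
exact_tenth = Decimal(0.1)
differs_from_decimal_tenth = exact_tenth != Decimal("0.1")
```

The exact expansion in `exact_tenth` shows that the literal `0.1` is stored as the nearest representable binary value rather than as one tenth itself.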
This clause is new; it recommends fifty operations, including log, power, and trigonometric functions, that language standards should define.
This clause is new; it recommends how language standards should specify the semantics of sequences of operations, and points out the subtleties of literal meanings and optimizations that change the value of a result.
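A common example of an optimization that changes the value of a result is reassociating additions. The sketch below assumes binary64 arithmetic, as used by Python's `float` on mainstream platforms, and shows that the two groupings of the same sum are rounded differently.

```python
# The same three operands, summed with different grouping.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

results_differ = left_to_right != regrouped
difference = abs(left_to_right - regrouped)
```

A compiler that rewrote one form into the other would therefore change the result, which is why a language standard needs to say whether such rewrites are permitted.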
This annex is new; it provides guidance to debugger developers for features that are desired for supporting the debugging of floating-point code.