IEEE 754-1985

Floating-point numbers in IEEE 754 format consist of three fields: a sign bit, a biased exponent, and a fraction.

Using a biased exponent, the lesser of two positive floating-point numbers will come out "less than" the greater following the same ordering as for sign and magnitude integers.

To reduce the loss of precision when an underflow occurs, IEEE 754 includes the ability to represent fractions smaller than are possible in the normalized representation, by making the implicit leading digit a 0.

Here are some examples of single-precision IEEE 754 representations: Every possible bit combination is either a NaN or a number with a unique value in the affinely extended real number system with its associated order, except for the two combinations of bits for negative zero and positive zero, which sometimes require special attention (see below).

The binary representation has the special property that, excluding NaNs, any two numbers can be compared as sign and magnitude integers (endianness issues apply).

Rounding errors inherent to floating point calculations may limit the use of comparisons for checking the exact equality of results.

[9] Although negative zero and positive zero are generally considered equal for comparison purposes, some programming language relational operators and similar constructs treat them as distinct.

During drafting, there was a proposal for the standard to incorporate the projectively extended real number system, with a single unsigned infinity, by providing programmers with a mode selection option.

[15][16] Intel hoped to be able to sell a chip containing good implementations of all the operations found in the widely varying maths software libraries.

[15][17] John Palmer, who managed the project, believed the effort should be backed by a standard unifying floating point operations across disparate processors.

Kahan suggested that Intel use the floating point of Digital Equipment Corporation's (DEC) VAX.

The work within Intel worried other vendors, who set up a standardization effort to ensure a "level playing field".

The draft was co-written with Jerome Coonen and Harold Stone, and was initially known as the "Kahan-Coonen-Stone proposal" or "K-C-S format".

[16][19][21] Kahan's proposal also provided for infinities, which are useful when dealing with division-by-zero conditions; not-a-number values, which are useful when dealing with invalid operations; denormal numbers, which help mitigate problems caused by underflow;[19][22][23] and a better balanced exponent bias, which can help avoid overflow and underflow when taking the reciprocal of a number.

The number 0.15625 represented as a single-precision IEEE 754-1985 floating-point number. See text for explanation.
The three fields in a 64bit IEEE 754 float
Relative precision of single (binary32) and double precision (binary64) numbers, compared with decimal representations using a fixed number of significant digits . Relative precision is defined here as ulp( x )/ x , where ulp( x ) is the unit in the last place in the representation of x , i.e. the gap between x and the next representable number.
Intel 8087 floating-point coprocessor