Mixed-precision arithmetic

A common usage of mixed-precision arithmetic is for operating on inaccurate numbers with a small width and expanding them to a larger, more accurate representation.

Supercomputers such as Summit utilize mixed-precision arithmetic to be more efficient with regards to memory and processing time, as well as power consumption.

[1][2][3] A floating-point number is typically packed into a single bit-string, as the sign bit, the exponent field, and the significand or mantissa, from left to right.

Some platforms, including Nvidia, Intel, and AMD CPUs and GPUs, provide mixed-precision arithmetic for this purpose, using coarse floats when possible, but expanding them to higher precision when necessary.

Whenever the gradients contain a NaN (indicating overflow), the weight update is skipped, and the scale factor is decreased.