Block floating point

BFP can be advantageous for limiting space use in hardware while performing the same functions as floating-point algorithms, because the exponent is reused across a block; some operations over multiple values between blocks can also be done with a reduced amount of computation.

For this to be done, the data must be normalized to the dynamic range of the processor used, by applying the required number of left shifts to the whole block.
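As an illustration, the following Python sketch encodes a block of values with one shared exponent by left-shifting them into the dynamic range of a fixed-point word. The 16-bit mantissa width, the block layout, and the function names are illustrative assumptions, not taken from any particular hardware or from the cited sources.

```python
import numpy as np

def bfp_encode(block, mantissa_bits=16):
    """Encode a block of floats as fixed-point mantissas plus one shared exponent.

    The shared exponent is taken from the largest magnitude in the block, and
    every value is shifted (scaled) so the block fills the dynamic range of a
    signed `mantissa_bits`-wide word. The 16-bit word is an illustrative choice.
    """
    block = np.asarray(block, dtype=np.float64)
    max_mag = np.max(np.abs(block))
    if max_mag == 0.0:
        return np.zeros(block.shape, dtype=np.int32), 0
    # Exponent of the largest element; all elements in the block share it.
    shared_exp = int(np.floor(np.log2(max_mag))) + 1
    # Number of left shifts that maps the block into the signed fixed-point range.
    shift = (mantissa_bits - 1) - shared_exp
    mantissas = np.round(block * 2.0**shift).astype(np.int32)
    return mantissas, shared_exp

def bfp_decode(mantissas, shared_exp, mantissa_bits=16):
    """Reconstruct approximate floating-point values from the BFP block."""
    shift = (mantissa_bits - 1) - shared_exp
    return mantissas.astype(np.float64) / 2.0**shift

# Example: a block of values spanning a modest dynamic range.
x = [0.12, -3.4, 1.75, 0.003]
m, e = bfp_encode(x)
print(m, e, bfp_decode(m, e))
```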

The MX format uses a single shared scaling factor (exponent) for a block of elements, significantly reducing the memory footprint and computational resources required for AI operations.[7][8][9]
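The shared-scale idea can be sketched as follows, assuming a block size of 32 and a power-of-two scale per block as described in the OCP MX specification; the element quantization below uses plain signed integers rather than the actual FP8/FP6/FP4 element encodings, and the function names are hypothetical, so this only approximates the concrete MX formats.

```python
import numpy as np

BLOCK_SIZE = 32    # MX groups elements into small blocks (32 in the OCP spec).
ELEMENT_BITS = 8   # Illustrative: real MX elements are FP8/FP6/FP4, not plain ints.

def mx_quantize(values):
    """Quantize a 1-D array into MX-style blocks: low-bit elements + one scale per block.

    Each block of BLOCK_SIZE elements shares a single power-of-two scaling
    factor, so only one exponent is stored per block instead of one per element.
    """
    values = np.asarray(values, dtype=np.float32)
    pad = (-len(values)) % BLOCK_SIZE
    values = np.pad(values, (0, pad))
    blocks = values.reshape(-1, BLOCK_SIZE)

    max_mag = np.max(np.abs(blocks), axis=1, keepdims=True)
    max_mag[max_mag == 0] = 1.0
    # One shared power-of-two scale per block, chosen from the block maximum.
    scale_exp = np.floor(np.log2(max_mag)) - (ELEMENT_BITS - 2)
    scale = np.exp2(scale_exp)
    q_max = 2 ** (ELEMENT_BITS - 1) - 1
    elements = np.clip(np.round(blocks / scale), -q_max, q_max).astype(np.int8)
    return elements, scale_exp.astype(np.int8)

def mx_dequantize(elements, scale_exp):
    """Reconstruct approximate values: element * 2**scale_exp, per block."""
    return elements.astype(np.float32) * np.exp2(scale_exp.astype(np.float32))

# Round-trip a random vector and report the worst-case reconstruction error.
x = np.random.randn(64).astype(np.float32)
e, s = mx_quantize(x)
print(np.max(np.abs(x - mx_dequantize(e, s).reshape(-1)[:64])))
```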

For instance, MXFP6 closely matches FP32 for inference tasks after quantization-aware fine-tuning, and MXFP4 can be used for training generative language models with only a minor accuracy penalty.[10]

An emulation library has also been published to provide details on the data science approach and select results of MX in action.[7]