Floating-point unit

Emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user-space code.
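As an illustration of the user-space case, the following is a minimal sketch of how a single-precision multiplication could be emulated using only integer operations. It assumes IEEE 754 binary32 operands that are normal numbers (no zeros, subnormals, infinities or NaNs) and round-to-nearest-even; the function names are hypothetical and are not taken from any particular emulation library.

```python
import struct


def f32_to_bits(value):
    """Return the IEEE 754 binary32 bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_f32(bits):
    """Interpret a 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def soft_mul_f32(a_bits, b_bits):
    """Multiply two binary32 bit patterns with integer arithmetic only.

    Assumption of this sketch: both operands and the result are normal
    numbers; anything else raises ValueError.
    """
    sign = (a_bits ^ b_bits) >> 31
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a in (0, 255) or exp_b in (0, 255):
        raise ValueError("sketch handles normal numbers only")

    # Restore the implicit leading 1 of each 24-bit significand.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    product = man_a * man_b          # 47 or 48 significant bits
    exponent = exp_a + exp_b - 127   # remove one copy of the bias

    # Normalise so that the kept significand has exactly 24 bits.
    shift = 23
    if product >> 47:
        shift = 24
        exponent += 1

    mantissa = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and mantissa & 1):
        mantissa += 1
        if mantissa >> 24:           # rounding carried out of 24 bits
            mantissa >>= 1
            exponent += 1

    if not 1 <= exponent <= 254:
        raise ValueError("result outside the normal range")

    return (sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFF)
```

For example, `bits_to_f32(soft_mul_f32(f32_to_bits(1.5), f32_to_bits(2.5)))` evaluates to 3.75. A complete emulator would also have to handle the special values and exception flags that this sketch rejects.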

This division varies significantly by architecture; some have dedicated floating-point registers, while others, such as Intel x86, go as far as independent clocking schemes.

CORDIC routines have been implemented in Intel x87 coprocessors (8087,[8][9][10][11][12] 80287,[12][13] 80387[12][13]) up to the 80486[8] microprocessor series, as well as in the Motorola 68881[8][9] and 68882, for some kinds of floating-point instructions, mainly as a way to reduce the gate count (and complexity) of the FPU subsystem.
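The reason CORDIC saves gates is that each iteration needs only shifts, additions and a small table lookup rather than a hardware multiplier. The following is a minimal sketch of rotation-mode CORDIC in fixed-point arithmetic that computes sine and cosine. It is a textbook formulation, not a reconstruction of the routines in any of the chips named above; the word size (`FRAC_BITS`), the iteration count and the function name are assumptions chosen for illustration.

```python
import math

FRAC_BITS = 30               # assumed fixed-point format: 30 fractional bits
ITERATIONS = 30              # assumed iteration count, one per result bit
ONE = 1 << FRAC_BITS

# Table of atan(2^-i) in fixed point; hardware would hold this in ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Every micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# Starting from 1/gain compensates for the accumulated stretch.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
INV_GAIN = round(ONE / _gain)


def cordic_sin_cos(angle):
    """Return (sin, cos) of an angle in radians within [-pi/2, pi/2]."""
    if not -math.pi / 2 <= angle <= math.pi / 2:
        raise ValueError("sketch covers only [-pi/2, pi/2]")

    x, y, z = INV_GAIN, 0, round(angle * ONE)
    for i in range(ITERATIONS):
        # Rotate towards z == 0 using only shifts and adds.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE
```

Calling `cordic_sin_cos(math.pi / 6)` returns values close to 0.5 and 0.866, with the accuracy bounded by the chosen word size and iteration count. Angles outside the accepted range would first need argument reduction, which is omitted here.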

The modular architecture of the Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading.

In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instruction set with the SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.
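The difference between a scalar FPU operation and a SIMD one can be modelled in a few lines. The sketch below treats a 16-byte value as a 128-bit register holding four single-precision lanes and adds two such registers lane by lane, which is conceptually what a packed single-precision add does in one instruction. It is only a software model of the data layout: the function names are hypothetical, and no actual SIMD instructions are executed.

```python
import struct

LANES = "<4f"   # four little-endian binary32 lanes = 128 bits


def pack4(values):
    """Pack four Python floats into a 16-byte 'register' image."""
    return struct.pack(LANES, *values)


def unpack4(register):
    """Return the four lanes of a 16-byte 'register' image as a tuple."""
    return struct.unpack(LANES, register)


def packed_add(a, b):
    """Add two 16-byte registers lane by lane, rounding each lane to binary32."""
    return struct.pack(LANES, *(x + y for x, y in zip(unpack4(a), unpack4(b))))
```

For example, `unpack4(packed_add(pack4([1.0, 2.0, 3.0, 4.0]), pack4([0.5, 0.5, 0.5, 0.5])))` yields `(1.5, 2.5, 3.5, 4.5)`: four additions expressed as a single operation on one register-sized value.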

In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on.

Acorn Computers opted for the WE32206 to offer single, double and extended precision[21] to its ARM-powered Archimedes range, introducing a gate array to interface the ARM2 processor with the WE32206 to support the additional ARM floating-point instructions.

FPUs were also commonly added to higher-end models of the Apple Macintosh and Commodore Amiga series, but, unlike in IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems.

These add-on FPUs are host-processor-independent and possess their own programming requirements (operations, instruction sets, etc.).