National was working on further improvements in the 32732, but eventually gave up attempting to compete in the central processing unit (CPU) space.
The NS32000 series traces its history to an effort by National Semiconductor to produce a single-chip implementation of the VAX-11 architecture.
This flexibility was considered the paragon of design in the era of complex instruction set computers (CISC).
[3] The original processor family consisted of the NS16032 CPU and a NS16C032 low-power variant, both having a 16-bit data path and so requiring two machine cycles to load a single 32-bit word.
The NS16008 was a cut-down version with an 8-bit external data path and no virtual memory support, which had a reduced pin count and was thus somewhat easier to implement.
Addressing modes can involve up to two displacements and two memory indirections per operand as well as scaled indexing, making the longest conceivable instruction 23 bytes.
Unlike some other processors, automatic increment of the base register is not provided; the only exception is a "top of stack" addressing modes that pop sources and push destinations.
By 1984, after two years, the errata list still contained items specifying uncontrollable conditions that would result in the processor coming to a halt, forcing a reset.
[7] An early 1985 article about the 32016-based Whitechapel MG-1 workstation noted that the 32081 memory management unit was "suffering from bugs" and had been situated on its own board providing hardware fixes.
The "Z" language is similar to today's Verilog and VHDL, but has a Pascal-like syntax and is optimized for two-phase clock designs.
However, by the times the fruit of these efforts were being felt in the design, numerous 68k machines were already on the market, notably the Apple Macintosh, and the 32016 never saw widespread use.
From the datasheet, the enhancements include "the addition of new dedicated addressing hardware (consisting of a high speed ALU, a barrel shifter and an address register), a very efficient increased (20 bytes) instruction prefetch queue, a new system/memory bus interface/protocol, increased efficiency slave processor protocol and finally enhancements of microcode."
The aggregate performance boost of the NS32332 from these enhancements only made it 50 percent faster than the original NS32032, and therefore less than that of the main competitor, the MC68020.
At this stage RISC architectures were starting to make inroads, and the main competitors became the now equally dead AM29000 and MC88000, which was considered faster than the NS32532.
[12] The NS32532 was the basis of the PC532, a "public domain" hardware project, and one of the few to produce a useful machine running a real operating system (in this case, Minix or NetBSD).
Swordfish has an integrated floating point unit, timers, DMA controllers and other peripherals not normally available in microprocessors.
The chief architect of the Swordfish is Donald Alpert, who went on to manage the architectural team designing the Pentium.
These processors had some success in the laser printer and fax market, despite intense competition from AMD and Intel RISC chips.
The key difference between this and the NS32C016 is the integration of the expensive timing control unit (TCU) which generates the needed two-phase clock from a crystal, and the removal of the floating point coprocessor support, which freed up microcode space for the useful BitBLT instruction set, which significantly improves the performance in laser printer operations, making this 60,000 transistor chip faster than the 200,000 transistor MC68020.
The NS32CG160 is the CG16 with timers and DMA peripherals, while the NS32FV/FX16x chips have extra DSP functionality on top of the CG16 BitBLT core for the fax and answering machine market.
The bus usage of the NS32032 is about 50 percent, owing to its very compact instruction set, or its very slow pipeline as competitors would phrase it.
Indeed, one suggested application of the NS32032 was as part of a "fault-tolerant transaction system" employing "two 32032s in parallel and comparing results on alternate memory cycles to detect soft errors".
[19] Fully software-compatible with an NS32532 CPU with N32381 FPU, it is significantly faster when implemented on an FPGA,[20] both operating at a higher clock rate and using fewer cycles per instruction.