A basic ALU has three parallel data buses consisting of two input operands (A and B) and a result output (Y).
The status outputs are various individual signals that convey supplemental information about the result of the current ALU operation.
Typically, the external circuitry employs sequential logic to generate the signals that control ALU operation.
For example, a CPU starts an addition operation by routing the operands from their sources (typically processor registers) to the ALU's operand inputs, while simultaneously applying a value to the ALU's opcode input that configures it to perform an addition operation.
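The interaction between operand inputs, opcode input, and result output can be illustrated with a minimal software sketch. The 8-bit width, the opcode numbering, and the names `alu`, `OP_ADD`, and so on are assumptions made for illustration only; real ALUs define their own encodings and status signals.

```python
WIDTH = 8                     # assumed word size for this sketch
MASK = (1 << WIDTH) - 1       # keeps values within WIDTH bits

# Hypothetical opcode encoding; actual encodings are design-specific.
OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR = range(5)


def alu(opcode, a, b):
    """Return (y, carry_out, zero) for WIDTH-bit operands a and b."""
    a &= MASK
    b &= MASK
    if opcode == OP_ADD:
        full = a + b
    elif opcode == OP_SUB:
        # Subtract by adding the two's complement of b.
        full = a + ((~b) & MASK) + 1
    elif opcode == OP_AND:
        full = a & b
    elif opcode == OP_OR:
        full = a | b
    elif opcode == OP_XOR:
        full = a ^ b
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    y = full & MASK                   # result output Y
    carry_out = (full >> WIDTH) & 1   # status: carry out of the top bit
    zero = int(y == 0)                # status: result is zero
    return y, carry_out, zero
```

In this sketch, calling `alu(OP_ADD, a, b)` plays the role of the CPU routing two register values to the operand inputs while applying the addition opcode.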
A number of basic arithmetic and bitwise logic functions are commonly supported by ALUs.
Depending on the ALU operation being performed, some status register bits may be changed and others may be left unmodified.
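A short sketch can show how a status register might be selectively updated. Which bits each operation affects is architecture-specific; the policy below (every operation updates the zero and negative bits, but only addition updates the carry bit) and the names `update_status`, `Z`, `N`, and `C` are assumptions chosen for illustration.

```python
WIDTH = 8                     # assumed word size for this sketch
MASK = (1 << WIDTH) - 1


def update_status(status, op, full):
    """Return a new status dict after `op` produced the untruncated value `full`.

    Assumed policy: Z and N are always updated; C changes only for "add".
    """
    y = full & MASK
    new = dict(status)                    # bits not written below stay unmodified
    new["Z"] = int(y == 0)                # zero bit
    new["N"] = (y >> (WIDTH - 1)) & 1     # negative (sign) bit
    if op == "add":
        new["C"] = (full >> WIDTH) & 1    # carry bit, updated only by addition
    return new
```

For example, a bitwise AND that yields zero sets `Z` but leaves a previously stored `C` untouched, whereas an addition overwrites all three bits.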
To do this, the algorithm treats each integer as an ordered collection of ALU-size fragments, arranged from most-significant (MS) to least-significant (LS) or vice versa.
This process is repeated for all operand fragments so as to generate a complete collection of partials, which is the result of the multiple-precision operation.
The algorithm writes the partial to designated storage, whereas the processor's state machine typically stores the carry out bit to an ALU status register.
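The fragment-by-fragment procedure can be sketched for addition as follows. The sketch assumes an 8-bit ALU, fragments ordered from least-significant to most-significant, and operands with the same number of fragments; the function names are hypothetical, and the `carry` variable stands in for the carry bit held in the status register.

```python
WIDTH = 8                     # assumed ALU word size
MASK = (1 << WIDTH) - 1


def add_fragment(a, b, carry_in):
    """One ALU add-with-carry step: return (partial, carry_out)."""
    full = a + b + carry_in
    return full & MASK, full >> WIDTH


def multi_precision_add(a_frags, b_frags):
    """Add two integers given as equal-length lists of fragments, LS first.

    Returns (partials, final_carry), with partials also ordered LS first.
    """
    if len(a_frags) != len(b_frags):
        raise ValueError("operands must have the same number of fragments")
    carry = 0                 # stands in for the stored carry status bit
    partials = []             # designated storage for the partials
    for a, b in zip(a_frags, b_frags):
        partial, carry = add_fragment(a, b, carry)
        partials.append(partial)
    return partials, carry
```

For instance, adding 0x01FF and 0x0001 as two 8-bit fragments each produces the partials `[0x00, 0x02]`, that is, 0x0200, with the carry from the low fragment propagated into the high one.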
For example, graphics processing units (GPUs) often contain hundreds or thousands of ALUs that can operate concurrently.
[6] The cost, size, and power consumption of electronic circuitry were relatively high throughout the infancy of the Information Age.
Consequently, all early computers had a serial ALU that operated on one data bit at a time, although they often presented a wider word size to programmers.
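The behavior of a serial ALU performing addition can be approximated in software as a loop that handles one bit position per step and carries a single bit between steps. This is only an illustrative model under assumed names (`serial_add`, `width`); it does not describe the circuitry of any particular historical machine.

```python
def serial_add(a, b, width):
    """Add two width-bit values one bit at a time; return (result, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):                # one bit position per step, LS bit first
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit = bit_a ^ bit_b ^ carry   # full-adder sum
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))  # full-adder carry
        result |= sum_bit << i
    return result, carry
```

A word of `width` bits therefore takes `width` steps to add, which is the trade-off a serial design accepts in exchange for needing only a single-bit adder.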
The first computer to have multiple parallel discrete single-bit ALU circuits was the 1951 Whirlwind I, which employed sixteen such "math units" to enable it to operate on 16-bit words.
[8] Over time, transistor geometries shrank further, following Moore's law, and it became feasible to build wider ALUs on microprocessors.
Many modern ALUs have wide word widths and architectural enhancements, such as barrel shifters and binary multipliers, that allow them to perform in a single clock cycle operations that would have required multiple operations on earlier ALUs.