The Lynch–Swartzlander design is smaller, has lower fan-out, and does not suffer from wiring congestion; however, it can be used only if the process node supports Manchester carry chain implementations.
The general problem of optimizing parallel prefix adders is identical to the variable block size, multi-level, carry-skip adder optimization problem, a solution to which is found in Thomas Lynch's thesis of 1996.
The least-significant span is treated specially: it is merged with the carry-in to the addition, and it produces only a generate bit, as no propagation is possible.
However, there is significant wiring congestion; in the second-last stage of a 64-bit adder, the most significant half of the spans to be merged each require separate generate and propagate signals from spans 16 bits away, necessitating 32 horizontal wires across the adder.
The final stage is similar; although only generate bits are needed, 32 of them are required to cross the adder.
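The wire counts quoted above can be checked with a small counting model. The sketch below is illustrative only and rests on stated assumptions: each stage merges bit i with the span ending d bits to its right, a span that already reaches bit 0 supplies only a generate signal, and any other span supplies both generate and propagate. The function names are hypothetical and do not come from the cited sources.

```python
def wires_across_cut(n, d, cut):
    """Count signals crossing the boundary between bits cut-1 and cut
    in the stage that merges spans d bits apart in an n-bit adder."""
    count = 0
    for dest in range(d, n):
        src = dest - d
        if src < cut <= dest:      # this connection crosses the boundary
            count += 1             # generate is always needed
            if src >= d:           # source span does not yet reach bit 0,
                count += 1         # so its propagate is needed as well
    return count


def peak_wires(n, d):
    """Worst-case number of horizontal wires over all boundaries."""
    return max(wires_across_cut(n, d, cut) for cut in range(1, n))


second_last = peak_wires(64, 16)   # distance-16 stage of a 64-bit adder
final = peak_wires(64, 32)         # distance-32 (final) stage
```

Under this model both stages peak at 32 wires: 16 generate/propagate pairs in the second-last stage, and 32 generate-only signals in the final stage.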
For example, the first (least-significant) sum bit is calculated by XORing the propagate in the farthest-right red box (a "1") with the carry-in (a "0"), producing a "1".
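The sum-bit rule can be seen in a bit-level software model. The following is a minimal sketch of a radix-2 prefix network with the carry-in merged into the least-significant generate bit; it is not a hardware description, and the function name and parameters are assumptions chosen for illustration.

```python
def kogge_stone_add(a, b, width=4, carry_in=0):
    """Add two width-bit integers with a radix-2 parallel prefix network.
    Returns (sum modulo 2**width, carry_out)."""
    # Per-bit propagate and generate signals.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Merge the carry-in into the least-significant span.
    G = g[:]
    P = p[:]
    G[0] = g[0] | (p[0] & carry_in)

    # Each stage merges every span with the span d bits to its right.
    d = 1
    while d < width:
        new_G, new_P = G[:], P[:]
        for i in range(d, width):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2

    # The carry into bit i is the group generate of bits 0..i-1.
    carries = [carry_in] + G[:-1]
    # Each sum bit is the original propagate XORed with its incoming carry.
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[width - 1]
```

With a least-significant propagate of 1 and a carry-in of 0, for instance `kogge_stone_add(1, 0)`, the least-significant sum bit is 1 XOR 0 = 1.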
[15] Enhancements to the original implementation include increasing the radix and sparsity of the adder.
As shown, the power and area of the carry generation are improved significantly, and routing congestion is substantially reduced.