Carry-save adder

A big adder implemented using this technique will usually be much faster than conventional addition of those numbers.

Thus adding two n-digit numbers has to take a time proportional to n, even if the machinery we are using would otherwise be capable of performing many calculations simultaneously.

In electronic terms, using bits, this means that even if we have n one-bit adders at our disposal, we still have to allow a time proportional to n to allow a possible carry to propagate from one end of the number to the other.

In principle the delay can be reduced so that it is proportional to log n, but for large numbers this is no longer the case, because even when carry look-ahead is implemented, the distances that signals have to travel on the chip increase in proportion to n, and propagation delays increase at the same rate.

Once we get to the 512-bit to 2048-bit number sizes that are required in public-key cryptography, carry look-ahead is not of much help.

Carry-save arithmetic works by abandoning any kind of carry propagation.

If you were to add these 3 numbers using conventional methods, it would take you 2 carry-propagate adder delays to get to the answer.

It is therefore obvious that one more binary number can be added to our carry-save result without overflowing our storage capacity: but then what?

The key to success is that at the moment of each partial addition we add three bits: To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one.

There is still a need to convert the result to binary at the end of a calculation, which effectively just means letting the carries travel all the way through the number just as in a conventional adder.

Montgomery multiplication, which depends on the rightmost digit of the result, is one solution; though rather like carry-save addition itself, it carries a fixed overhead, so that a sequence of Montgomery multiplications saves time but a single one does not.

Fortunately exponentiation, which is effectively a sequence of multiplications, is the most common operation in public-key cryptography.