[1] The objective of the Fletcher checksum was to provide error-detection properties approaching those of a cyclic redundancy check but with the lower computational effort associated with summation techniques.
So, the simple checksum is computed by adding together all the 8-bit bytes of the message, dividing by 255 and keeping only the remainder.
The checksum value is transmitted with the message, increasing its length to 137 bytes, or 1096 bits.
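As a minimal sketch of the simple checksum just described (the function name `simple_checksum` and the 136-byte sample message are illustrative assumptions, not taken from the original paper), the computation might look like this:

```python
def simple_checksum(message: bytes, modulus: int = 255) -> int:
    """Add all 8-bit bytes together and keep the remainder modulo `modulus`."""
    total = 0
    for byte in message:
        total = (total + byte) % modulus
    return total


# Hypothetical 136-byte message; the content is arbitrary.
message = b"A" * 136
# The result is always below 255, so it fits in one extra byte.
framed = message + bytes([simple_checksum(message)])
print(len(framed), len(framed) * 8)  # 137 1096
```

Reducing after every addition gives the same result as summing everything first and dividing once, because the remainder of a sum depends only on the remainders of its terms.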
The first weakness of the simple checksum is that it is insensitive to the order of the blocks (bytes) in the data word (message).
The second value computed by the Fletcher checksum is the modular sum of the values taken by the simple checksum as each block of the data word is added to it.
While there are infinitely many possible parameter choices, the original paper only studies the case K=8 (word length) with moduli 255 and 256.
The 16- and 32-bit versions (Fletcher-32 and Fletcher-64) have been derived from the original case and studied in subsequent specifications or papers.
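The effect of the running sum of sums on block order can be illustrated with a short sketch. The function name `fletcher_sums` and the two-byte sample inputs are assumptions chosen for illustration; modulus 255 matches the example in the text.

```python
from typing import Tuple


def fletcher_sums(data: bytes, modulus: int = 255) -> Tuple[int, int]:
    """Return (simple sum, sum of the running simple sums), both reduced modulo `modulus`."""
    sum1 = 0  # the simple checksum
    sum2 = 0  # accumulates every intermediate value of sum1
    for byte in data:
        sum1 = (sum1 + byte) % modulus
        sum2 = (sum2 + sum1) % modulus
    return sum1, sum2


print(fletcher_sums(b"ab"))  # (195, 37)
print(fletcher_sums(b"ba"))  # (195, 38)
```

Swapping the two bytes leaves the simple sum unchanged but alters the second sum, because a block that arrives earlier is counted in more of the intermediate values.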
When the data word is divided into 8-bit blocks, as in the example above, two 8-bit sums result and are combined into a 16-bit Fletcher checksum.
When the data word is divided into 16-bit blocks, two 16-bit sums result and are combined into a 32-bit Fletcher checksum.
When the data word is divided into 32-bit blocks, two 32-bit sums result and are combined into a 64-bit Fletcher checksum.
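The three block sizes differ only in the width of the blocks and of the two sums, which a single parameterized sketch can show. The function name `fletcher`, the default modulus of 2^k - 1, and the convention of placing the sum of sums in the upper half of the result are assumptions for illustration; the blocks are passed in as integers that have already been extracted from the data word.

```python
from typing import Iterable, Optional


def fletcher(blocks: Iterable[int], block_bits: int, modulus: Optional[int] = None) -> int:
    """Combine two `block_bits`-wide sums into a checksum of 2 * `block_bits` bits."""
    if modulus is None:
        modulus = (1 << block_bits) - 1  # assumed default: 255, 65535, 2**32 - 1
    sum1 = 0
    sum2 = 0
    for block in blocks:
        sum1 = (sum1 + block) % modulus
        sum2 = (sum2 + sum1) % modulus
    # Assumed layout: sum of sums in the high half, simple sum in the low half.
    return (sum2 << block_bits) | sum1


# Iterating over a bytes object yields its 8-bit blocks as integers.
print(hex(fletcher(b"abcde", 8)))                      # 0xc8f0
print(hex(fletcher([0x6261, 0x6463, 0x0065], 16)))     # 0xf04fc729
print(hex(fletcher([0x64636261, 0x00000065], 32)))     # 0xc8c6c527646362c6
```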
However, the reduction in size of the universe of possible checksum values acts against this and reduces performance slightly.
At the end of the input data, the two sums are combined into the 16-bit Fletcher checksum value, which the function returns.
The most important optimization consists in using larger accumulators and delaying the relatively costly modulo operation for as long as it can be proven that no overflow will occur.
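A sketch of the delayed-modulo idea for the 8-bit case follows. It assumes 32-bit accumulators, as a C implementation might use; since Python integers do not overflow, the bound is only checked with an assertion. The name `fletcher16_deferred` and the constant `BLOCK_LEN` are illustrative. The value 5802 is the largest n satisfying 254 + 254n + 255·n(n+1)/2 ≤ 2^32 - 1, that is, the longest run of bytes that cannot overflow the second accumulator when both sums start below 255.

```python
BLOCK_LEN = 5802  # assumed 32-bit accumulators; see the bound in the text above


def fletcher16_deferred(data: bytes) -> int:
    """Fletcher-16 with the modulo applied once per block of bytes instead of once per byte."""
    c0 = 0
    c1 = 0
    for start in range(0, len(data), BLOCK_LEN):
        for byte in data[start:start + BLOCK_LEN]:
            c0 += byte  # no reduction inside the inner loop
            c1 += c0
        # Documents the bound a 32-bit unsigned accumulator would have to respect.
        assert c1 <= 0xFFFFFFFF
        c0 %= 255
        c1 %= 255
    return (c1 << 8) | c0


print(hex(fletcher16_deferred(b"abcde")))  # 0xc8f0
```

The result is unchanged because reduction modulo 255 commutes with addition; only the number of modulo operations differs.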
Further benefit can be derived from replacing the modulo operator with an equivalent function tailored to this specific case—for instance, a simple compare-and-subtract, since the quotient never exceeds 1.
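The compare-and-subtract replacement can be sketched as below for the per-byte variant of the 8-bit case. The helper name `reduce_once` and the function name `fletcher16_no_division` are assumptions for illustration. The shortcut is valid here because each operand is already below 255 or is a single byte, so every intermediate sum stays below 2 × 255.

```python
def reduce_once(value: int, modulus: int = 255) -> int:
    """Equivalent to value % modulus, provided 0 <= value < 2 * modulus."""
    return value - modulus if value >= modulus else value


def fletcher16_no_division(data: bytes) -> int:
    """Fletcher-16 using a conditional subtraction in place of the modulo operator."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = reduce_once(sum1 + byte)  # at most 254 + 255 = 509
        sum2 = reduce_once(sum2 + sum1)  # at most 254 + 254 = 508
    return (sum2 << 8) | sum1


print(hex(fletcher16_no_division(b"abcde")))  # 0xc8f0
```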
[6]
8-bit implementation (16-bit checksum)
16-bit implementation (32-bit checksum), with 8-bit ASCII values of the input word assembled into 16-bit blocks in little-endian order, the word padded with zeros as necessary to the next whole block, using modulus 65535, and with the result presented as the sum-of-sums shifted left by 16 bits (multiplied by 65536) plus the simple sum
32-bit implementation (64-bit checksum)
As with any calculation that divides a binary data word into short blocks and treats the blocks as numbers, any two systems expecting to get the same result should preserve the ordering of bits in the data word.
If blocks are extracted from the data word in memory by a simple read of a 16-bit unsigned integer, then the values of the blocks will differ between a big-endian and a little-endian system, due to the reversal of the byte order of 16-bit data elements in memory, and the checksum result will be different as a consequence.
The implementation examples above do not address ordering issues, so as not to obscure the checksum algorithm.
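To make the ordering issue concrete, the following sketch assembles the same bytes into 16-bit blocks under both byte orders and computes a 32-bit checksum for each. The helper names `blocks16` and `fletcher32` and the input `b"abcde"` are illustrative assumptions; zero padding, modulus 65535, and the placement of the sum of sums in the upper 16 bits follow the conventions stated for the 16-bit implementation.

```python
from typing import List


def blocks16(data: bytes, byteorder: str) -> List[int]:
    """Split data into 16-bit blocks, zero-padding to a whole block."""
    if len(data) % 2:
        data += b"\x00"
    return [int.from_bytes(data[i:i + 2], byteorder) for i in range(0, len(data), 2)]


def fletcher32(blocks: List[int]) -> int:
    """Fletcher checksum over 16-bit blocks with modulus 65535."""
    sum1 = 0
    sum2 = 0
    for block in blocks:
        sum1 = (sum1 + block) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


little = fletcher32(blocks16(b"abcde", "little"))
big = fletcher32(blocks16(b"abcde", "big"))
print(hex(little))    # 0xf04fc729
print(little == big)  # False
```

Fixing the byte order explicitly when the blocks are assembled, rather than relying on the machine's native integer layout, is one way for two systems to arrive at the same value.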