Correlation attacks are a class of cryptographic known-plaintext attacks for breaking stream ciphers whose keystreams are generated by combining the output of several linear-feedback shift registers (LFSRs) using a Boolean function.
Correlation attacks exploit a statistical weakness that arises from the specific Boolean function chosen for the keystream.
This vulnerability allows an attacker to brute-force the key for the individual LFSR and the rest of the system separately.
For instance, in a keystream generator where four 8-bit LFSRs are combined to produce the keystream, and if one of the registers is correlated to the Boolean function output, it becomes possible to brute force it first, followed by the remaining three LFSRs.
If a second register is correlated with the function, the process may be repeated and decrease the attack complexity down to 28 + 28 + 216 for an effort saving factor of just under 65028.
Then, the Boolean function combining the three registers to provide the generator output is given by
Similarly, many file formats or network protocols have very standard headers or footers.
This makes the 32 consecutive bits of the generator output easy to determine.
This enables a brute-force search of the space of possible keys (initial values) for LFSR-3 (assuming we know the tapped bits of LFSR-3, an assumption which is in line with Kerckhoffs' principle).
If we have guessed incorrectly, we should expect roughly half, or 16, of the first 32 bits of these two sequences to match.
For realistic values, it is a very substantial saving and can make brute-force attacks very practical.
We may begin a brute force attack against LFSR-2 independently of the keys of LFSR-1 and LFSR-3, leaving only LFSR-1 unbroken.
Thus, we are able to break the Geffe generator with as much effort as required to brute force 3 entirely independent LFSRs.
should be chosen so the correlation between each variable and the combining function's output is as close as possible to 50%.
In practice, it may be difficult to find a function that achieves this without sacrificing other design criteria, e.g., period length, so a compromise may be necessary.
While the above example illustrates well the relatively simple concepts behind correlation attacks, it perhaps simplifies the explanation of precisely how the brute forcing of individual LFSRs proceeds.
Incorrectly guessed keys will generate LFSR output that agrees with the generator output roughly 50% of the time because, given two random bit sequences of a given length, the probability of agreement between the sequences at any particular bit is 0.5.
This is particularly salient in the case of LFSRs whose correlation with the generator is not especially strong; for small enough correlations, it is certainly not outside the realm of possibility that an incorrectly guessed key will also lead to LFSR output that agrees with the desired number of bits of the generator output.
It may be possible to identify a number of potential keys, however, which is still a significant breach of the cipher's security.
Estimates of the length of known plain text required for a given correlation can be calculated using the binomial distribution.
The table below shows a measure of the computational cost for various attacks on a keystream generator consisting of eight 8-bit LFSRs combined by a single Boolean function.
Understanding the calculation of cost is relatively straightforward: the leftmost term of the sum represents the size of the key space for the correlated generators, and the rightmost term represents the size of the key space for the remaining generators.
While higher-order correlations lead to more powerful attacks, they are also more difficult to find, as the space of available Boolean functions to correlate against the generator output increases as the number of arguments to the function does.
Obviously, higher correlation immunity makes a function more suitable for use in a keystream generator (although this is not the only thing that needs to be considered).
Siegenthaler showed that the correlation immunity m of a Boolean function of algebraic degree d of n variables satisfies
; for a given set of input variables, this means that a high algebraic degree will restrict the maximum possible correlation immunity.
Given the probable extreme severity of a correlation attack's impact on a stream cipher's security, it should be essential to test a candidate Boolean combination function for correlation immunity before deciding to use it in a stream cipher.
However, it is important to note that high correlation immunity is a necessary, but not sufficient condition for a Boolean function to be appropriate for use in a keystream generator.
Research has been conducted into methods for easily generating Boolean functions of a given size which are guaranteed to have at least some particular order of correlation immunity.
This research has uncovered links between correlation immune Boolean functions and error correcting codes.