Collision attack

[2] Hash collisions created this way are usually constant length and largely unstructured, so cannot directly be applied to attack widespread document formats or protocols.
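A generic collision search can be sketched on a deliberately weakened hash. The example below is only an illustration: `toy_hash` (SHA-256 truncated to 3 bytes) and `find_collision` are hypothetical names introduced here, and the truncation is an assumption that makes the search finish quickly. It shows why the colliding inputs found this way are short, fixed-length, and carry no meaningful structure.

```python
import hashlib
from itertools import count

def toy_hash(data: bytes, nbytes: int = 3) -> bytes:
    """Toy hash: SHA-256 truncated to nbytes (an assumption, for illustration only)."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Birthday-style search: remember every digest seen until one repeats."""
    seen = {}
    for i in count():
        msg = i.to_bytes(8, "big")      # fixed-length, unstructured input
        digest = toy_hash(msg, nbytes)
        if digest in seen:
            return seen[digest], msg    # two different inputs, same digest
        seen[digest] = msg

m1, m2 = find_collision()
print(m1.hex(), m2.hex(), toy_hash(m1).hex())
```

With a 24-bit digest the search is expected to succeed after a few thousand attempts; the same approach against a full-length digest is infeasible, which is why practical attacks rely on weaknesses in the hash function itself.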

Such a malicious document would contain two different messages in the same document, but conditionally display one or the other through subtle changes to the file.

An extension of the collision attack is the chosen-prefix collision attack, which is specific to Merkle–Damgård hash functions.
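The difference between the two attacks can be stated compactly. The notation below is an assumption introduced for illustration: $H$ is the hash function, $\|$ denotes concatenation, $p_1$ and $p_2$ are prefixes chosen by the attacker, and $m_1$ and $m_2$ are the values the attack must compute.

```latex
% Classical collision: find any two distinct messages with equal hashes.
\text{find } m_1 \neq m_2 \text{ such that } H(m_1) = H(m_2)

% Chosen-prefix collision: the two prefixes are fixed in advance.
\text{given } p_1 \neq p_2,\ \text{find } m_1, m_2 \text{ such that } H(p_1 \,\|\, m_1) = H(p_2 \,\|\, m_2)
```

Because the prefixes can be arbitrary, meaningful content, the second form is the one that applies to structured formats such as certificates.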

In 2007, a chosen-prefix collision attack was found against MD5, requiring roughly 2^50 evaluations of the MD5 function.

The paper also demonstrates two X.509 certificates for different domain names, with colliding hash values.

[5] A real-world collision attack was published in December 2008, when a group of security researchers presented a forged X.509 signing certificate that could be used to impersonate a certificate authority, taking advantage of a prefix collision attack against the MD5 hash function.

This meant that an attacker could impersonate any SSL-secured website as a man-in-the-middle, thereby subverting the certificate validation built into every web browser to protect electronic commerce.

The rogue certificate may not be revocable by real authorities, and could also have an arbitrary forged expiry time.

The Flame malware successfully used a new variation of a chosen-prefix collision attack to spoof code signing of its components by a Microsoft root certificate that still used the compromised MD5 algorithm.

[7][8] In 2019, researchers found a chosen-prefix collision attack against SHA-1 with a computing complexity between 2^66.9 and 2^69.4 and a cost of less than 100,000 US dollars.

Because digital signature algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size.
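The hash-then-sign pattern described above can be sketched as follows. This is a minimal illustration, not a real signature scheme: the textbook-RSA key built from two small Mersenne primes, the reduction of the digest modulo `N`, and the helper names `digest_as_int`, `sign`, and `verify` are all assumptions introduced here, and the construction is unpadded and insecure.

```python
import hashlib

# Toy textbook-RSA key built from two Mersenne primes. These parameters are
# hypothetical, and far too small and unpadded for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Compress a message of any length to a fixed-size value below N."""
    digest = hashlib.sha256(message).digest()   # always 32 bytes
    return int.from_bytes(digest, "big") % N    # toy reduction, an assumption

def sign(message: bytes) -> int:
    """Sign the digest, not the message itself."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recompute the digest and compare it with what the signature opens to."""
    return pow(signature, E, N) == digest_as_int(message)

contract = b"Pay Alice 10 dollars. " * 10_000    # about 220 kB of input
signature = sign(contract)
print(verify(contract, signature))
```

Since `verify` only ever sees the digest, any second message with the same digest would be accepted under the same signature, which is what makes hash collisions relevant to signature schemes.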

The second version, which had the same MD5 hash, contained flags which signal web browsers to accept it as a legitimate authority for issuing arbitrary other certificates.

As the main focus of hash functions used in hash tables was speed instead of security, most major programming languages were affected,[17] with new vulnerabilities of this class still showing up a decade after the original presentation.
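The weakness of fast, unkeyed hash functions can be illustrated with a multiply-by-33 string hash of the kind historically used in hash tables. The sketch below is an assumption-laden toy: `djb33` and `colliding_keys` are hypothetical names, and the hash is a simplified stand-in rather than the exact function of any particular language runtime. It relies on the fact that the two-character blocks "Ez" and "FY" contribute the same value (69·33 + 122 = 70·33 + 89 = 2399), so any concatenation of those blocks hashes identically.

```python
from itertools import product

def djb33(key: str, mask: int = 0xFFFFFFFF) -> int:
    """Unkeyed multiply-by-33 string hash (simplified, for illustration only)."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & mask
    return h

def colliding_keys(n_blocks: int) -> list:
    """Build 2**n_blocks distinct keys from the equivalent blocks 'Ez' and 'FY'."""
    return ["".join(blocks) for blocks in product(("Ez", "FY"), repeat=n_blocks)]

keys = colliding_keys(8)                 # 256 distinct strings
hashes = {djb33(k) for k in keys}        # set of distinct hash values
print(len(keys), len(hashes))
```

All of these keys would land in the same bucket of a table that uses this hash, so inserting n of them costs on the order of n² comparisons rather than n. A keyed hash function prevents an attacker from precomputing such a set.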

As of 2021, Jean-Philippe Aumasson and Daniel J. Bernstein's SipHash (2012) is the most widely used hash function in this class.