Content sniffing

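Content sniffing is the practice of inspecting the content of a byte stream to deduce the file format of the data within it. It is generally used to compensate for a lack of accurate metadata that would otherwise be required for the file to be interpreted correctly.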

Content sniffing tends to combine several techniques that rely on the redundancy found in most file formats: looking for file signatures and magic numbers, and applying heuristics such as searching for well-known representative substrings, using byte frequency and n-gram tables, and Bayesian inference.
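As a minimal sketch of the signature-based approach, the following Python function checks the start of a byte string against a small table of well-known magic numbers and falls back to a crude byte-frequency heuristic for plain text. The function name `sniff_type`, the table `SIGNATURES`, the 512-byte sample size, and the 95% threshold are assumptions chosen for illustration; real sniffers use far larger tables and more careful heuristics.

```python
# Illustrative signature table: (offset, magic bytes, media type).
# Only a handful of widely documented signatures are listed.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
]


def sniff_type(data: bytes) -> str:
    """Guess a media type from the leading bytes of `data`."""
    # Step 1: look for a known magic number at its expected offset.
    for offset, magic, media_type in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return media_type

    # Step 2: heuristic fallback. If almost every byte in a short
    # sample is printable ASCII or common whitespace, call it text.
    sample = data[:512]  # assumed sample size
    if not sample:
        return "application/octet-stream"
    printable = sum(1 for b in sample if 32 <= b < 127 or b in (9, 10, 13))
    if printable / len(sample) > 0.95:  # assumed threshold
        return "text/plain"
    return "application/octet-stream"
```

For example, `sniff_type(b"%PDF-1.7\n")` matches the PDF signature, while a short ASCII string with no signature is classified by the fallback heuristic.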

A specification exists for media type sniffing in HTML5, which attempts to balance the requirements of security against the need for backward compatibility with web content that has missing or incorrect MIME-type data.

Numerous web browsers use a more limited form of content sniffing to attempt to determine the character encoding of text files for which the MIME type is already known.
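One simple form of encoding sniffing can be sketched as follows: check for a byte order mark (BOM), then test whether the bytes decode as valid UTF-8, and otherwise fall back to a default. This is an illustrative sketch only, not the algorithm of any particular browser; the function name `sniff_encoding` and the `windows-1252` default are assumptions.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Guess the character encoding of text whose media type is known."""
    # Step 1: an explicit byte order mark is the strongest signal.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # Step 2: bytes that decode cleanly as UTF-8 are assumed to be UTF-8
    # (pure ASCII input also lands here).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Step 3: otherwise fall back to an assumed legacy default.
        return default
```

A sniffer of this kind never guesses an encoding outside its fixed list, which is one way to limit the attack surface that encoding guesses can open up.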

Such sniffing can open security vulnerabilities. For instance, Internet Explorer 7 may be tricked into running JScript, in circumvention of its policy, when the browser is allowed to guess that an HTML file is encoded in UTF-7.