Duplicate content

Non-malicious duplicate content may include variations of the same page, such as versions optimized for standard HTML browsing, mobile devices, or printing, or store items that can be reached via multiple distinct URLs.

The number of possible URLs generated by server-side software has also made it difficult for web crawlers to avoid retrieving duplicate content.[4]

Endless combinations of HTTP GET (URL-based) parameters exist, of which only a small selection will actually return unique content.

For example, a simple online photo gallery may offer three options to users, as specified through HTTP GET parameters in the URL.
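To illustrate why this inflates a crawler's workload, the sketch below enumerates URL combinations for a hypothetical gallery. The parameter names (`sort`, `size`, `page`) and the domain are illustrative assumptions, not taken from any real site; the point is that many distinct URLs can map to far fewer distinct result sets.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical gallery options passed as HTTP GET parameters
# (names and values are illustrative only).
options = {
    "sort": ["date", "name", "rating"],
    "size": ["small", "large"],
    "page": ["1", "2"],
}

# Every combination of parameter values yields a distinct URL
# that a crawler could discover and fetch.
urls = [
    "http://example.com/gallery?" + urlencode(dict(zip(options, combo)))
    for combo in product(*options.values())
]
print(len(urls))  # 12 distinct URLs

# Suppose only "sort" and "page" change which images appear, while
# "size" merely resizes thumbnails: the content space is much smaller.
distinct_content = {
    (params["sort"], params["page"])
    for params in (dict(zip(options, combo)) for combo in product(*options.values()))
}
print(len(distinct_content))  # only 6 genuinely distinct result sets
```

Here, a crawler that naively follows every parameterized link would fetch twice as many pages as there are unique results, and real sites often have far more parameters than this toy example.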

In certain cases, search engines penalize the rankings of websites and individual offending pages in search engine results pages (SERPs) for duplicate content considered "spammy."[6]

Plagiarism detection, or content similarity detection, is the process of locating instances of plagiarism or copyright infringement within a work or document.
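One common family of similarity-detection techniques compares documents by their overlapping word n-grams ("shingles") and scores the overlap with the Jaccard coefficient. The sketch below is a minimal illustration of that idea; the function names and sample sentences are hypothetical, and production systems typically add hashing (e.g., MinHash) to scale.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of word k-grams ("shingles") in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two near-duplicate sentences differing in a single word.
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"

sim = jaccard(shingles(doc1), shingles(doc2))
print(round(sim, 2))  # 0.4 — one changed word breaks every shingle containing it
```

A threshold on this score (chosen empirically) is then used to flag candidate duplicates or plagiarized passages for closer inspection.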