[4][5] Christian Rudder has also used this concept with data from online dating profiles and Twitter posts to determine the phrases most characteristic of a given race or gender in his book Dataclysm.
[6] SIPs with a linguistic density of two or three words, adjective, adjective, noun or adverb, adverb, verb, will signal the author's attitude, premise or conclusions to the reader or express an important idea.
This method only checks those texts that have been published and that have been digitized online.
For example, a submission by, say, a student that contained the phrase "garden style, praising irregularity in design", might be searched for using Google.com and will yield the original Wikipedia article about Sir William Temple, English political figure and essayist.
Statistically improbable phrases of Darwin's On the Origin of Species could be: temperate productions, genera descended, transitional gradations, unknown progenitor, fossiliferous formations, our domestic breeds, modified offspring, doubtful forms, closely allied forms, profitable variations, enormously remote, transitional grades, very distinct species and mongrel offspring.