Stylometry

Stylometry has legal as well as academic and literary applications, ranging from the question of the authorship of Shakespeare's works to forensic linguistics, and it has methodological similarities with the analysis of text readability.

Authors may use adversarial stylometry to resist such identification by eliminating their own stylistic characteristics without changing the meaningful content of their communications.

The modern practice of the discipline received publicity from the study of authorship problems in English Renaissance drama.[7]

The development of computers and their capacities for analyzing large quantities of data enhanced this type of effort by orders of magnitude.

A. Q. Morton produced a computer analysis of the fourteen Epistles of the New Testament attributed to St. Paul, which indicated that six different authors had written that body of work.

One notable early success was the resolution of disputed authorship of twelve of The Federalist Papers by Frederick Mosteller and David Wallace.[9]

While there are still questions concerning initial assumptions and methods (and, perhaps, always will be), few now dispute the basic premise that linguistic analysis of written texts can produce valuable information and insight.

(Indeed, this was apparent even before the advent of computers: the successful application of a textual/linguistic analysis to the Fletcher canon by Cyrus Hoy and others yielded clear results during the late 1950s and early 1960s.)[19]

Textual features of interest for authorship attribution fall into two broad groups: on the one hand, counts of idiosyncratic expressions or constructions (e.g., checking how the author uses punctuation or how often the author uses agentless passive constructions); on the other, features similar to those used for readability analysis, such as measures of lexical and syntactic variation.[35]
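
As a minimal sketch of such features, assuming plain Python (the feature set and the function name are illustrative, not taken from any cited study):

```python
import re
from collections import Counter

def stylometric_features(text: str) -> dict:
    """Compute a few illustrative stylometric features: punctuation rates
    (idiosyncratic usage) and lexical-variation measures similar to those
    used in readability analysis."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1          # avoid division by zero
    chars = Counter(text)

    return {
        # Idiosyncratic punctuation usage, normalized per word
        "commas_per_word": chars[","] / n_words,
        "semicolons_per_word": chars[";"] / n_words,
        # Lexical variation: type-token ratio (distinct words / total words)
        "type_token_ratio": len(set(words)) / n_words,
        # Crude syntactic proxy: mean sentence length in words
        "mean_sentence_length": n_words / (len(sentences) or 1),
    }

print(stylometric_features("It was the best of times; it was the worst of times."))
```

Detecting constructions such as agentless passives would additionally require a syntactic parser; the measures above only approximate the lexical side.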

All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured.[35]
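
As a toy illustration of that idea, the following sketch applies simple rule-based rewriting; the rule tables are invented for the example and are far weaker than real obfuscation tools:

```python
import re

# Invented, illustrative rewrite rules: expanding contractions and
# leveling distinctive word choices alters stylistic signals while
# preserving meaning. (Replaced words lose their capitalization in
# this toy version.)
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is"}
SYNONYMS = {"utilize": "use", "commence": "begin", "whilst": "while"}

def obscure_style(text: str) -> str:
    def swap(match: re.Match) -> str:
        word = match.group(0).lower()
        return CONTRACTIONS.get(word, SYNONYMS.get(word, match.group(0)))
    return re.sub(r"[A-Za-z']+", swap, text)

print(obscure_style("Whilst I utilize this, don't commence."))
# -> "while I use this, do not begin."
```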

Modern stylometry uses computers for statistical analysis, artificial intelligence, and access to the growing corpus of texts available via the Internet.

Stylometric methods are used in several academic fields, as an application of linguistics, lexicography, or literary study,[1] in conjunction with natural language processing and machine learning, and are applied to plagiarism detection, authorship analysis, and information retrieval.

Whereas in the past, stylometry emphasized the rarest or most striking elements of a text, contemporary techniques can isolate identifying patterns even in common parts of speech.

In this context, unlike in information retrieval, the occurrence patterns of the most common words are more informative than the less frequent topical terms.[93]
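
A well-known measure built on this observation is Burrows' Delta, which compares z-scored relative frequencies of the most common words across candidate texts. The sketch below is a simplified, illustrative implementation (the function names and the 30-word cutoff are assumptions, and it assumes a corpus of at least two candidate texts):

```python
import re
from collections import Counter
from statistics import mean, stdev

def word_freqs(text: str) -> Counter:
    words = re.findall(r"[a-z']+", text.lower())
    return Counter({w: c / len(words) for w, c in Counter(words).items()})

def delta(corpus: list[str], disputed: str, n_words: int = 30) -> list[float]:
    """Simplified Burrows'-Delta-style comparison: z-score the relative
    frequencies of the corpus-wide most common words, then return the mean
    absolute z-score difference between the disputed text and each
    candidate text (lower = stylistically closer)."""
    freqs = [word_freqs(t) for t in corpus]
    common = [w for w, _ in sum(freqs, Counter()).most_common(n_words)]

    def z_profile(f: Counter) -> list[float]:
        profile = []
        for w in common:
            vals = [g[w] for g in freqs]          # missing word -> 0.0
            mu, sd = mean(vals), stdev(vals) or 1e-9
            profile.append((f[w] - mu) / sd)
        return profile

    target = z_profile(word_freqs(disputed))
    return [mean(abs(a - b) for a, b in zip(z_profile(f), target))
            for f in freqs]

# Toy usage: the smallest distance points to the closest candidate.
print(delta(["the cat sat on the mat and the dog sat too",
             "a man a plan a canal and so it goes on"],
            "the cat and the dog sat on the mat"))
```

Z-scoring is what makes very frequent function words comparable to one another despite large differences in raw frequency.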

Neural networks, a special case of statistical machine learning methods, have been used to analyze authorship of texts.[96]

One study used deep belief networks (DBNs) to build an authorship verification model applicable to continuous authentication (CA).
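
As a rough sketch of the general approach (assuming scikit-learn is available; the cited study used deep belief networks, for which a small multilayer perceptron over character n-grams stands in here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: two authors with known texts.
texts = [
    "I say unto you, verily the hour is come.",
    "Verily I say, hearken and attend unto me.",
    "The night was dark and the rain fell hard.",
    "Rain hammered the dark streets all night long.",
]
authors = ["A", "A", "B", "B"]

model = make_pipeline(
    # Character bigrams/trigrams capture habits even inside common words.
    TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
model.fit(texts, authors)
print(model.predict(["Hearken, I say unto you."]))  # likely ['A'] on this toy data
```

In a continuous authentication setting, such a model would be applied repeatedly to a rolling window of the user's most recent text rather than once per document.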

The diffusion of the Internet has shifted authorship attribution attention towards online texts (web pages, blogs, etc.) and other types of written information that are far shorter than an average book, much less formal, and more diverse in terms of expressive elements such as colors, layout, fonts, graphics, and emoticons.[98]

In addition, content-specific and idiosyncratic cues (e.g., topic models and grammar checking tools) were introduced to unveil deliberate stylistic choices.