Author profiling

Thomas Corwin Mendenhall, an American autodidact physicist and meteorologist, was the first to apply this process to the works of Francis Bacon, William Shakespeare, and Christopher Marlowe.

[2] Although much progress has been made in the 21st century, author profiling remains an unsolved problem.

For example, function words and part-of-speech analysis can be used to infer the author's gender and the truthfulness of a text.
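As a rough illustration of how function words can be turned into features, the sketch below computes the relative frequency of a few function words in a text. The word list and the name `function_word_profile` are assumptions made for this example and are not taken from any particular study; published work typically uses far larger inventories and feeds the resulting vector into a classifier.

```python
import re
from collections import Counter

# Hypothetical, deliberately small inventory chosen for illustration only.
FUNCTION_WORDS = (
    "the", "a", "an", "of", "and", "to", "in",
    "i", "you", "he", "she", "it", "we", "they", "not", "but",
)


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    # Lowercase and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


profile = function_word_profile("The cat sat on the mat and the dog slept.")
```

Each text is reduced to a fixed-length vector that is largely independent of topic, which is why such features are often described as stylistic rather than content-based.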

[4] This has sparked greater research efforts because of the advantages analysing digital texts can bring to sectors like marketing and business.

[10] The most effective attributes for author profiling on digital texts involve a combination of stylistic and content features.

The increased integration of social media into people's daily lives has made these platforms a rich source of textual data for author profiling.

Features of irregularity include deviations from standard linguistic norms, such as spelling errors, unstandardised transliteration (for example, the substitution of letters with numbers), shorthand and user-created abbreviations for phrases, all of which may pose a challenge to author profiling.
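One way such irregularities might be handled is to normalise tokens before feature extraction and to record how often normalisation was needed. The following is a minimal sketch under that assumption; the substitution table, the abbreviation list and the function names are hypothetical and far smaller than anything usable in practice.

```python
import re

# Hypothetical lookup tables, for illustration only.
LEET_MAP = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})
ABBREVIATIONS = {"u": "you", "gr8": "great", "thx": "thanks", "brb": "be right back"}


def normalise_token(token):
    """Map one irregular token to a more standard form where possible."""
    lowered = token.lower()
    if lowered in ABBREVIATIONS:
        return ABBREVIATIONS[lowered]
    # Undo digit-for-letter substitution only when letters and digits are mixed,
    # so that plain numbers such as "2023" are left untouched.
    if re.search(r"[a-z]", lowered) and re.search(r"\d", lowered):
        return lowered.translate(LEET_MAP)
    return lowered


def irregularity_rate(text):
    """Return the fraction of tokens that normalisation had to change."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if not tokens:
        return 0.0
    changed = sum(1 for t in tokens if normalise_token(t) != t.lower())
    return changed / len(tokens)
```

The rate itself can also serve as a stylistic feature, since how often a writer departs from standard spelling may vary between author groups.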

In the context of Facebook, author profiling mainly involves English textual data, but also covers non-English languages including Roman Urdu, Arabic, Brazilian Portuguese and Spanish.

[16][11] While author profiling studies on Facebook have been predominantly for gender and age-group identification, there have been attempts to derive attributes to predict religiosity, the IT background of users, and even basic emotions (as defined by Paul Ekman) among others.

[18] This differs from the use of punctuation symbols for emoticons in Western languages, or the common use of Unicode emojis on other platforms such as Facebook and Instagram.

Data is acquired from the Weibo microblog posts of willing participants, analysed, and used to train algorithms that build concept-based profiles of users to a certain accuracy.

Sources of data for author profiling from chat logs include platforms such as Yahoo!, AIM and WhatsApp.

The frequencies of verbs, pronouns and other word classes are used to profile and classify emotions in the writings of authors, as well as their gender and age.

[24] Classification models that were used for author profiling on physical documents in the past, such as support vector machines, have also been tested on blogs.

[citation needed] In author profiling for email, content is processed for important textual data, while unimportant features such as metadata and HyperText Markup Language (HTML) redundancies are excluded.
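The preprocessing step described above could look something like the following sketch, which uses only the Python standard library to discard headers and HTML markup and keep the visible body text. The function name `email_body_text` and the helper class are assumptions made for this example; real pipelines would also need to handle attachments, quoted replies and signatures.

```python
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def email_body_text(raw_email):
    """Return the message body as plain text, without headers or HTML tags."""
    msg = message_from_string(raw_email, policy=default)
    # Prefer a plain-text part; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    content = body.get_content()
    if body.get_content_type() == "text/html":
        parser = _TextExtractor()
        parser.feed(content)
        parser.close()
        content = " ".join(parser.parts)
    # Collapse runs of whitespace left behind by the markup.
    return " ".join(content.split())
```

The resulting string contains only what the author wrote, which is the input that later stylistic or sentiment analysis would operate on.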

[25] Further analysis of email textual content in author profiling tasks involves the extraction of tone of voice, sentiment, semantics and other linguistic features to be processed.

[30] One of the earliest and best-known examples of the use of author profiling is by Roger Shuy, who was asked to examine a ransom note linked to a notorious kidnapping case in 1979.

These methods, such as those adopted by literary critic Donald Wayne Foster, are said to be speculative and based entirely on one's subjective experience, and therefore cannot be tested empirically.

[34] However, the task of identifying bots solely from textual data (i.e. without metadata) is significantly more challenging, requiring author profiling techniques.

Author profiling techniques help business experts make better-informed strategic decisions based on the demographics of their target group.

[37] In addition, businesses can target their marketing campaigns at groups of consumers who match the demographics and profile of current customers.

The show highlighted the importance of author profiling in criminal forensics, as it was critical in the capture of the real Unabomber culprit in 1996.

Thomas Corwin Mendenhall, American physicist, 1841–1924