Hapax legomenon

[6] Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech.
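Because the term describes how often a word occurs within one particular body of text, finding its hapax legomena comes down to counting word frequencies in that text and keeping the words that occur exactly once. The Python sketch below is only an illustration. It assumes a deliberately naive tokenizer (the hypothetical helper `tokenize`, which takes lowercase runs of letters and apostrophes). Real studies also have to decide how to treat inflected forms, spelling variants and similar cases, and those decisions change the counts.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def hapax_legomena(text: str) -> list[str]:
    """Return the words that occur exactly once in `text`, in first-seen order."""
    counts = Counter(tokenize(text))
    return [word for word, n in counts.items() if n == 1]

sample = "the cat sat on the mat and the dog sat too"
print(hapax_legomena(sample))  # → ['cat', 'on', 'mat', 'and', 'dog', 'too']
```

Whether a word counts as a hapax depends entirely on the chosen text. "cat" is a hapax of this sample, not of English.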

Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one.

[9] To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarized in the diagram on the right.

Workman found similar variation when he looked at several plays by Shakespeare, from 3.4 to 10.4 hapax legomena per page of Irving's one-volume edition, as summarized in the second diagram on the right.
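Raw hapax counts grow with the length of a text, which is why they are normalized per page before works of different sizes are compared. The sketch below is not Workman's procedure. It makes two loud assumptions. First, hapax legomena are defined relative to the whole collection of documents. Second, a "page" is approximated by a fixed number of words, set by the hypothetical parameter `words_per_page`. Workman instead counted actual printed pages of specific editions, so this function's output cannot be compared with his figures.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def hapax_per_page(documents: dict[str, str], words_per_page: int) -> dict[str, float]:
    """For each document, count the collection-wide hapax legomena it contains,
    then divide by its length in approximate pages of `words_per_page` words."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    # Frequencies over the whole collection decide what counts as a hapax.
    corpus_counts = Counter(t for toks in tokens.values() for t in toks)
    rates = {}
    for name, toks in tokens.items():
        hapaxes = sum(1 for t in toks if corpus_counts[t] == 1)
        pages = len(toks) / words_per_page
        rates[name] = hapaxes / pages if pages else 0.0
    return rates

# Toy collection (illustrative only, not real data)
docs = {"A": "alpha beta gamma beta", "B": "beta delta delta epsilon"}
print(hapax_per_page(docs, words_per_page=2))  # → {'A': 1.0, 'B': 0.5}
```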

[9] Apart from author identity, several other factors can explain the number of hapax legomena in a work.[10] In the particular case of the Pastoral Epistles, all of these variables differ considerably from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship; those who reject Pauline authorship of the Pastorals rely on other arguments.

The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms.

In corpus linguistics and machine-learned natural language processing (NLP), it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques.

Disregarding them also significantly reduces the memory use of an application, since, by Zipf's law, a large share of the distinct words in a text are hapax legomena.
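One common form of this pruning happens when a model's vocabulary is built. Words below a minimum count are left out, and their occurrences are mapped to a shared out-of-vocabulary token. The sketch below illustrates the idea and does not reproduce any particular library's API. The `<unk>` token, the `min_count` threshold and the function names are assumptions.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words

def build_vocab(tokens: list[str], min_count: int = 2) -> dict[str, int]:
    """Map each word seen at least `min_count` times to an integer id.
    With min_count=2, every hapax legomenon is left out of the vocabulary."""
    counts = Counter(tokens)
    vocab = {UNK: 0}
    for word, n in counts.most_common():  # most frequent first
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Replace each token by its id, sending pruned words to the UNK id."""
    return [vocab.get(t, vocab[UNK]) for t in tokens]

tokens = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(tokens)
print(encode(tokens, vocab))  # → [1, 2, 0, 0, 1, 0, 1, 2, 0]
```

Here four of the six distinct words are hapax legomena, so the vocabulary shrinks from six words to two (plus `<unk>`). On real text the saving is smaller in proportion but still substantial.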

Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", appear twice (so-called dis legomena, in blue). Zipf's law predicts that the words in this plot should approximate a straight line with slope −1 on log-log axes.
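The percentages in the Moby-Dick caption are close to what an idealized form of Zipf's law predicts. The derivation below is a rough sketch. It assumes that frequency is exactly inversely proportional to rank $r$, and it treats rank as continuous. The constant $C$ is the number of occurrences of the top-ranked word, $N(n)$ is the number of distinct words occurring exactly $n$ times, and $V$ is the number of distinct words. These symbols are introduced here for illustration. Real texts deviate from the ideal law, so the resulting shares are only approximate.

```latex
\begin{aligned}
f(r) &= \frac{C}{r} && \text{(frequency of the word at rank } r\text{)}\\
N(\ge n) &\approx \frac{C}{n} && \text{(words with } f(r) \ge n\text{, i.e. } r \le C/n\text{)}\\
N(n) &\approx \frac{C}{n} - \frac{C}{n+1} = \frac{C}{n(n+1)} && \text{(words occurring exactly } n \text{ times)}\\
V &= N(\ge 1) \approx C && \text{(number of distinct words)}\\
\frac{N(n)}{V} &\approx \frac{1}{n(n+1)}, \qquad \frac{N(1)}{V} \approx \frac{1}{2}, \qquad \frac{N(2)}{V} \approx \frac{1}{6} \approx 0.17
\end{aligned}
```

Under this idealization, roughly half of the distinct words are hapax legomena and about a sixth are dis legomena. This is in line with the 44% and 17% observed for Moby-Dick.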
The word "honorificabilitudinitatibus" as found in the first edition of William Shakespeare's play Love's Labour's Lost
Muspilli line 57: "dar nimac denne mak andremo helfan uora demo muspille" (Bavarian State Library Clm 14098, f. 121r)