Comparative method

), and if they are sufficiently numerous, regular, and systematic that they cannot be dismissed as chance similarities, then it must be assumed that they descend from a single parent language called the 'proto-language'.

[8] The ultimate proof of genetic relationship, and to many linguists' minds the only real proof, lies in a successful reconstruction of the ancestral forms from which the semantically corresponding cognates can be derived. In some cases, this reconstruction can only be partial, generally because the compared languages are too scarcely attested, the temporal distance between them and their proto-language is too great, or their internal evolution renders many of the sound laws obscure to researchers.

For instance, English and German both exhibit the effects of a collection of sound changes known as Grimm's Law, which did not affect Russian.

[14] Even though grammarians of Antiquity had access to other languages around them (Oscan, Umbrian, Etruscan, Gaulish, Egyptian, Parthian...), they showed little interest in comparing, studying, or just documenting them.

In the 9th or 10th century AD, Yehuda Ibn Quraysh compared the phonology and morphology of Hebrew, Aramaic and Arabic but attributed the resemblance to the Biblical story of Babel: Abraham, Isaac and Joseph retained Adam's language, while other languages, at various removes, became more altered from the original Hebrew.

[15] In publications of 1647 and 1654, Marcus Zuerius van Boxhorn first described a rigorous methodology for historical linguistic comparisons[16] and proposed the existence of an Indo-European proto-language, which he called "Scythian", unrelated to Hebrew but ancestral to Germanic, Greek, Romance, Persian, Sanskrit, Slavic, Celtic and Baltic languages.

The Scythian theory was further developed by Andreas Jäger (1686) and William Wotton (1713), who made early forays into reconstructing the primitive common language.

[17] However, the origin of modern historical linguistics is often traced back to Sir William Jones, an English philologist living in India, who in 1786 made his famous observation:[18] The Sanscrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists.

There is a similar reason, though not quite so forcible, for supposing that both the Gothick and the Celtick, though blended with a very different idiom, had the same origin with the Sanscrit; and the old Persian might be added to the same family.

The comparative method developed out of attempts to reconstruct the proto-language mentioned by Jones, which he did not name but which subsequent linguists have labelled Proto-Indo-European (PIE).

Although Hermann Grassmann explained one of the anomalies with the publication of Grassmann's law in 1862,[23] Karl Verner made a methodological breakthrough in 1875, when he identified a pattern now known as Verner's law, the first sound-law based on comparative evidence showing that a phonological change in one phoneme could depend on other factors within the same word (such as neighbouring phonemes and the position of the accent[24]), which are now called conditioning environments.

The Neogrammarian hypothesis led to the application of the comparative method to reconstruct Proto-Indo-European since Indo-European was then by far the best-studied language family.

An extreme case is represented by Pirahã, a Muran language of South America, which has been controversially[34] claimed to have borrowed all of its pronouns from Nheengatu.

[45] For instance, the Latin suffix que, "and", preserves the original *e vowel that caused the consonant shift in Sanskrit.

Verner's Law, discovered by Karl Verner c. 1875, provides a similar case: the voicing of consonants in Germanic languages underwent a change that was determined by the position of the old Indo-European accent.

[47] For example, a potential cognate list can be established for Romance languages, which descend from Latin. The cognates evidence two correspondence sets, k : k and k : ʃ. Since French ʃ occurs only before a where the other languages also have a, and French k occurs elsewhere, the difference is caused by different environments (being before a conditions the change), and the sets are complementary.
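The complementarity test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four cognate rows are commonly cited Romance examples supplied here, the initial consonants are given in a simplified broad transcription, and the Italian form is used as a stand-in for the environment "where the other languages also have a". The names `ROWS`, `correspondence_environments` and `complementary` are hypothetical.

```python
from collections import defaultdict

LANGS = ("Italian", "Spanish", "Portuguese", "French")

# Assumed illustrative data, one row per cognate set:
# (gloss, orthographic forms in LANGS order,
#  initial consonant in each language (broad transcription),
#  segment that follows the initial consonant in the Italian form)
ROWS = [
    ("body",    ("corpo", "cuerpo", "corpo", "corps"),       ("k", "k", "k", "k"), "o"),
    ("raw",     ("crudo", "crudo", "cru", "cru"),            ("k", "k", "k", "k"), "r"),
    ("chain",   ("catena", "cadena", "cadeia", "chaîne"),    ("k", "k", "k", "ʃ"), "a"),
    ("to hunt", ("cacciare", "cazar", "caçar", "chasser"),   ("k", "k", "k", "ʃ"), "a"),
]


def correspondence_environments(rows):
    """Map each correspondence set to the set of environments it occurs in."""
    envs = defaultdict(set)
    for _gloss, _forms, initials, following in rows:
        envs[initials].add(following)
    return dict(envs)


def complementary(envs):
    """True if no two correspondence sets share an environment."""
    groups = list(envs.values())
    return all(
        a.isdisjoint(b)
        for i, a in enumerate(groups)
        for b in groups[i + 1:]
    )


envs = correspondence_environments(ROWS)
for correspondence, environment in envs.items():
    print(" : ".join(correspondence), "occurs before", sorted(environment))
print("complementary distribution:", complementary(envs))
```

With this data the set ending in French ʃ occurs only before a, and the set ending in French k occurs only before other segments, so the two sets never share an environment and can be assigned to a single proto-phoneme. A real analysis would need many more cognates and a fuller description of the environments.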

His reconstructions were, respectively, *hk, *xk, *čk (=[t͡ʃk]), *šk (=[ʃk]), and *çk (in which 'x' and 'ç' are arbitrary symbols, rather than attempts to guess the phonetic value of the proto-phonemes).

[37] By the principle of economy, the reconstruction of a proto-phoneme should require as few sound changes as possible to arrive at the modern reflexes in the daughter languages.
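A rough way to picture the principle of economy is to count, for each candidate proto-phoneme, how many daughter languages would need a sound change to reach their attested reflex. The sketch below does exactly that and nothing more. It assumes that every differing reflex costs one change, ignores subgrouping and the phonetic naturalness of each change, and considers only candidates that survive in at least one daughter. The reflexes shown (k in Italian, Spanish and Portuguese, ʃ in French) and the function names are assumptions chosen for illustration.

```python
def changes_required(candidate, reflexes):
    """Count the daughter languages whose reflex differs from the candidate."""
    return sum(1 for reflex in reflexes.values() if reflex != candidate)


def most_economical(reflexes):
    """Return the attested reflex that requires the fewest sound changes.

    Candidates are sorted first so that ties are broken deterministically.
    """
    candidates = sorted(set(reflexes.values()))
    return min(candidates, key=lambda c: changes_required(c, reflexes))


# Assumed example: reflexes of one proto-phoneme before original *a.
reflexes = {"Italian": "k", "Spanish": "k", "Portuguese": "k", "French": "ʃ"}

for candidate in sorted(set(reflexes.values())):
    print(f"*{candidate}: {changes_required(candidate, reflexes)} change(s)")
print("most economical reconstruction: *" + most_economical(reflexes))
```

Under these assumptions, reconstructing *k needs a single change (in French), whereas reconstructing *ʃ would need the same change to have happened in three languages, so *k is preferred. Counting changes per language is a crude proxy: if the three k-languages formed a single subgroup, one shared change would account for all of them, which is why economy is weighed together with the family tree.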

For example, here is the traditional Proto-Indo-European stop inventory:[56] An earlier voiceless aspirated row was removed on grounds of insufficient evidence.

Kossinna asserted that cultures represent ethnic groups, including their languages, but his law was rejected after World War II.

Loanwords imitate the form of the donor language, as in Finnic kuningas, from Proto-Germanic *kuningaz ('king'), with possible adaptations to the local phonology, as in Japanese sakkā, from English soccer.

At first sight, borrowed words may mislead the investigator into seeing a genetic relationship, although they can more easily be identified with information on the historical stages of both the donor and receiver languages.

Borrowing on a larger scale occurs in areal diffusion, when features are adopted by contiguous languages over a geographical area.

For instance, the Mainland Southeast Asia linguistic area, before it was recognised, suggested several false classifications of such languages as Chinese, Thai and Vietnamese.

For instance, the Latin declension pattern was lost in Romance languages, making it impossible to fully reconstruct such a feature via systematic comparison.

The reconstruction of unattested proto-languages lends itself to that illusion since they cannot be verified, and the linguist is free to select whatever definite times and places seem best.

Still, however, it may remain doubtfull whether the Danes and the Swedes could not, in general, understand each other tolerably well... nor is it possible to say if the twenty ways of pronouncing the sounds, belonging to the Chinese characters, ought or ought not to be considered as so many languages or dialects....

However, Hock[78] observes: The discovery in the late nineteenth century that isoglosses can cut across well-established linguistic boundaries at first created considerable attention and controversy.

Examples of strikingly complicated and even circular developments are indeed known to have occurred (such as Proto-Indo-European *t > Pre-Proto-Germanic *þ > Proto-Germanic *ð > Proto-West-Germanic *d > Old High German t in fater > Modern German Vater), but in the absence of any evidence or other reason to postulate a more complicated development, the preference for a simpler explanation is justified by the principle of parsimony, also known as Occam's razor.

prefer to view the reconstructed features as abstract representations of sound correspondences, rather than as objects with a historical time and place.

For example, Finnic languages such as Finnish have borrowed many words from an early stage of Germanic, and the shape of the loans matches the forms that have been reconstructed for Proto-Germanic.

[80] By contrast, some approaches are incompatible with the comparative method, including the contentious glottochronology and the even more controversial mass lexical comparison, considered by most historical linguists to be flawed and unreliable.