During the Northern and Southern dynasties period, Middle Chinese went through several sound changes and split into several varieties following prolonged geographic and political separation.
The royal courts of the Ming and early Qing dynasties operated using a koiné language known as Guanhua, based on the Nanjing dialect of Mandarin.
The language is written primarily using a logography of Chinese characters, largely shared by readers who may otherwise speak mutually unintelligible varieties.
[12] The next attested stage came from inscriptions on bronze artifacts dating to the Western Zhou period (1046–771 BCE), the Classic of Poetry and portions of the Book of Documents and I Ching.
[15] Most recent reconstructions also describe an atonal language with consonant clusters at the end of the syllable, developing into tone distinctions in Middle Chinese.
Its use in writing remained nearly universal until the late 19th century, culminating with the widespread adoption of written vernacular Chinese with the May Fourth Movement beginning in 1919.
Chinese words with these pronunciations were also extensively imported into the Korean, Japanese and Vietnamese languages, and today comprise over half of their vocabularies.
[41] Vietnam, Korea, and Japan each developed writing systems for their own languages, initially based on Chinese characters, but later replaced with the hangul alphabet for Korean and supplemented with kana syllabaries for Japanese, while Vietnamese continued to be written with the complex chữ Nôm script.
[42] These varieties form a dialect continuum, in which differences in speech generally become more pronounced as distances increase, though the rate of change varies immensely.
Specifically, most Chinese immigrants to North America until the mid-20th century spoke Taishanese, a variety of Yue from a small coastal area around Taishan, Guangdong.
[45] Local varieties of Chinese are conventionally classified into seven dialect groups, largely based on the different evolution of Middle Chinese voiced initials.[47][48]
Figure: Proportions of first-language speakers.[6]
The classification of Li Rong, which is used in the Language Atlas of China (1987), distinguishes three further groups.[46][49] Some varieties remain unclassified, including the Danzhou dialect on Hainan, Waxianghua spoken in western Hunan, and Shaozhou Tuhua spoken in northern Guangdong.
In some cases, monosyllabic words have become disyllabic without compounding and are written with different characters, as in 窟窿; kūlong from 孔; kǒng; this is especially common in Jin varieties.
The 20th-century Yuen Ren Chao poem Lion-Eating Poet in the Stone Den exploits this, consisting of 92 characters all pronounced shi.
For example, 石; shí alone, and not 石头; 石頭; shítou, appears with the meaning 'stone' in compounds such as 石膏; shígāo; 'plaster', 石灰; shíhuī; 'lime', 石窟; shíkū; 'grotto', 石英; shíyīng; 'quartz', and 石油; shíyóu; 'petroleum'.
Examples of Chinese words of more than two syllables include 汉堡包; 漢堡包; hànbǎobāo; 'hamburger', 守门员; 守門員; shǒuményuán; 'goalkeeper', and 电子邮件; 電子郵件; diànzǐyóujiàn; 'e-mail'.
The CC-CEDICT project (2010) contains 97,404 contemporary entries including idioms, technology terms, and names of political figures, businesses, and products.
The most comprehensive purely linguistic Chinese-language dictionary, the 12-volume Hanyu Da Cidian, records more than 23,000 head Chinese characters and gives over 370,000 definitions.
The 1999 revised Cihai, a multi-volume encyclopedic dictionary reference work, gives 122,836 vocabulary entry definitions under 19,485 Chinese characters, including proper names, phrases, and common zoological, geographical, sociological, scientific, and technical terms.
The 2016 edition of Xiandai Hanyu Cidian, an authoritative one-volume dictionary on modern standard Chinese language as used in mainland China, has 13,000 head characters and defines 70,000 words.
Some early Indo-European loanwords in Chinese have been proposed, notably 'honey' (蜜; mì), 'lion' (狮; 獅; shī), and perhaps 'horse' (马; 馬; mǎ), 'pig' (猪; 豬; zhū), 'dog' (犬; quǎn), and 'goose' (鹅; 鵝; é).
[71] Ancient words borrowed from along the Silk Road during the Old Chinese period include 'grape' (葡萄; pútáo), 'pomegranate' (石榴; shíliú), and 'lion' (狮子; 獅子; shīzi).
Words borrowed from the nomadic tribes of the Gobi, Mongolian or northeast regions generally have Altaic etymologies, such as 琵琶 (pípá), the Chinese lute, or 'cheese or yogurt' (酪; lào), but from exactly which source is not always clear.
Occasionally, compromises between the transliteration and translation approaches become accepted, such as 汉堡包; 漢堡包 (hànbǎobāo; 'hamburger') from 汉堡 ('Hamburg') + 包 ('bun').
Sometimes translations are designed so that they sound like the original while incorporating Chinese morphemes (phono-semantic matching), such as 马利奥; 馬利奧 (Mǎlì'ào) for the video game character 'Mario'.
A rather small number of direct transliterations have survived as common words, including 沙发; 沙發 (shāfā; 'sofa'), 马达; 馬達 (mǎdá; 'motor'), 幽默 (yōumò; 'humor'), 逻辑; 邏輯 (luóji, luójí; 'logic'), 时髦; 時髦 (shímáo; 'smart (fashionable)'), and 歇斯底里 (xiēsīdǐlǐ; 'hysterics').
With the rising popularity of the Internet, there is a current vogue in China for coining English transliterations, for example, 粉丝; 粉絲 (fěnsī; 'fans'), 黑客 (hēikè; 'hacker'), and 博客 (bókè; 'blog').
Early Indian translators, working in Sanskrit and Pali, were the first to attempt to describe the sounds and enunciation patterns of Chinese in a foreign language.
After the 15th century, the efforts of Jesuits and Western court missionaries resulted in several Latin-script transcription systems, based on different varieties of Chinese.
Only 4% were categorized as pictographs, including many of the simplest characters, such as 人 (rén; 'human'), 日 (rì; 'Sun'), 山 (shān; 'mountain'), and 水 (shuǐ; 'water').
[81] In 1991, 2,000 foreign learners took China's official Chinese Proficiency Test, the Hanyu Shuiping Kaoshi (HSK), which is comparable to the English Cambridge Certificate; by 2005 the number of candidates had risen sharply to 117,660,[82] and in 2010 to 750,000.