The contents include Chinese character information processing, word segmentation, proper noun recognition, natural language understanding and generation, corpus linguistics, and machine translation.
The input code of a Chinese character is its pinyin letter string followed by an optional number representing the tone.
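As a minimal sketch of this input format, the following Python snippet splits an input code into its letter string and optional tone number. The function name `parse_input_code` and the choice of digits 1 to 5 for tones (with 5 standing for the neutral tone) are assumptions made for illustration, not part of any particular input method.

```python
import re

# Letters followed by an optional tone digit; the 1-5 range is an assumed convention.
PINYIN_CODE = re.compile(r"^([a-z]+)([1-5])?$")


def parse_input_code(code):
    """Split a Pinyin input code into (letters, tone); tone is None when omitted."""
    match = PINYIN_CODE.match(code.lower())
    if match is None:
        raise ValueError(f"not a valid Pinyin input code: {code!r}")
    letters, tone = match.groups()
    return letters, (int(tone) if tone else None)


print(parse_input_code("xue2"))
print(parse_input_code("da"))
```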
Popular form-based encoding methods include Wubi (五笔) in the Mainland and Cangjie (仓颉) in Taiwan and Hong Kong.
[6] The most important feature of intelligent input is the application of contextual constraints for candidate character selection.
Though the non-toned Pinyin letters of 大学 and 大雪 are both "daxue", the computer can make a reasonable selection based on the subsequent words.
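The selection described above can be sketched with a toy co-occurrence table. The candidate list, the counts, and the function name `choose_candidate` are all hypothetical; a real input method would rely on statistics gathered from a large corpus.

```python
# Candidate words sharing the same non-toned Pinyin string (illustrative only).
CANDIDATES = {"daxue": ["大学", "大雪"]}

# Hypothetical counts of how often a candidate is followed by a given word.
FOLLOWED_BY_COUNTS = {
    ("大学", "教授"): 50,  # "university professor"
    ("大雪", "纷飞"): 30,  # "heavy snow swirling"
}


def choose_candidate(pinyin, next_word):
    """Pick the candidate that most often precedes next_word in the toy table."""
    return max(
        CANDIDATES[pinyin],
        key=lambda word: FOLLOWED_BY_COUNTS.get((word, next_word), 0),
    )


print(choose_candidate("daxue", "教授"))
print(choose_candidate("daxue", "纷飞"))
```

When no count is available for any candidate, `max` returns the first one listed, so the order of the candidate list acts as a default ranking.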
It includes 6,763 Chinese characters: 3,755 frequently used ones sorted by Pinyin, and the remaining 3,008 sorted by radicals (indexing components).
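These figures match GB 2312, so the sketch below assumes that is the character set being described. It counts the characters that Python's built-in `gb2312` codec can decode in rows 16 to 55 (the Pinyin-ordered level) and rows 56 to 87 (the radical-ordered level). The row boundaries and the helper name `count_hanzi` are assumptions of this sketch.

```python
def count_hanzi(first_row, last_row):
    """Count code positions in the given rows that decode as GB 2312 characters."""
    count = 0
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):  # each row has 94 cells
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                raw.decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned position
            count += 1
    return count


level1 = count_hanzi(16, 55)  # frequently used characters, sorted by Pinyin
level2 = count_hanzi(56, 87)  # remaining characters, sorted by radical
print(level1, level2, level1 + level2)

# The first Pinyin-ordered character is 啊 ("a"), at row 16, cell 1.
print(bytes([0xB0, 0xA1]).decode("gb2312"))
```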
Each character is encoded with a two-byte code, usually written in hexadecimal, for example, 香 (ADBB), 港 (B4E4) and 龍 (C073).
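The example codes can be checked with a short Python snippet. It assumes the encoding in question is Big5, which the traditional character 龍 and the code values suggest, and uses Python's built-in `big5` codec.

```python
# Encode each example character and show its two-byte code in hexadecimal.
# Assumption: the codes quoted in the text are Big5 codes.
big5_codes = {ch: ch.encode("big5").hex().upper() for ch in "香港龍"}

for ch, code in big5_codes.items():
    print(ch, code)

# Decoding the raw bytes recovers the original characters.
round_trip = bytes.fromhex("ADBBB4E4").decode("big5")
print(round_trip)
```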
The Basic Multilingual Plane (BMP) is a 2-byte kernel version of Unicode with 2^16=65,536 code points for important characters of many languages.
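To illustrate the 2-byte boundary, the following sketch tests whether a character falls inside the BMP and how many bytes it takes in UTF-16. The helper name `in_bmp` and the two sample characters are chosen here for illustration; U+20000 is a Chinese character from a supplementary plane.

```python
BMP_SIZE = 2 ** 16  # 65,536 code points, U+0000 through U+FFFF


def in_bmp(ch):
    """Return True if the character's code point lies in the Basic Multilingual Plane."""
    return ord(ch) < BMP_SIZE


bmp_char = "香"                  # U+9999, inside the BMP
supplementary = "\U00020000"     # U+20000, outside the BMP

for ch in (bmp_char, supplementary):
    utf16_len = len(ch.encode("utf-16-be"))
    print(f"U+{ord(ch):04X} in BMP: {in_bmp(ch)}, UTF-16 bytes: {utf16_len}")
```

A BMP character occupies a single 2-byte unit in UTF-16, whereas a character outside the BMP needs a surrogate pair of two such units.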
[13] Like English and other languages, Chinese characters are output on printers and screens in different fonts and styles.
The most popular Chinese fonts are the Song (宋体), Kai (楷体), Hei (黑体) and Fangsong (仿宋体) families.
However, 100% correctness cannot be guaranteed in the foreseeable future, because that would require a complete understanding of the text.
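The difficulty can be seen in a small sketch of dictionary-based segmentation. It uses forward maximum matching, a common baseline technique that the text does not name. The toy dictionary and the sample phrase are assumptions chosen to expose an ambiguity.

```python
def forward_maximum_match(text, dictionary, max_word_len=3):
    """Greedily take the longest dictionary word at each position.

    Falls back to a single character when no dictionary word matches.
    """
    words = []
    pos = 0
    while pos < len(text):
        longest = min(max_word_len, len(text) - pos)
        for length in range(longest, 0, -1):
            candidate = text[pos:pos + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                pos += length
                break
    return words


# Toy dictionary: 研究 (research), 研究生 (graduate student), 生命 (life), 起源 (origin).
toy_dictionary = {"研究", "研究生", "生命", "起源"}

segmented = forward_maximum_match("研究生命起源", toy_dictionary)
print(segmented)
```

The greedy match takes 研究生 first and leaves 命 stranded, although the intended reading is 研究 / 生命 / 起源 ("study the origin of life"). Choosing the right split here depends on the meaning of the phrase, not just on dictionary lookup.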
An alternative solution is to encourage people to write with explicit word boundaries, as is the case in English [18].
and is written in English with the initial letter of each word capitalized, for example, "Mr. John Nealon", "America" and "Cambridge University".
However, such a list can never be complete, considering the huge number of places and people all over the world, not to mention that names constantly appear, change and disappear.
For example, there is a town named 民众 (Minzhong) in southern China, which is also a common noun meaning "people".
Therefore, recognition of the names of people and places has to make use of their distinguishing features in internal composition and external context.
For example, in "在广东省中山市民众镇", the component words 省 (province), 市 (city) and 镇 (town) are end markers of place names, while 在 (in, at, on) is a preposition that frequently appears in front of a location.
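The use of these cues can be sketched with a regular expression that looks for the preposition 在 followed by one or more segments, each closed by an end marker. The marker set is limited to the three characters in the example, and the function name `extract_place_names` is hypothetical. A practical recognizer would need many more markers and statistical evidence.

```python
import re

END_MARKERS = "省市镇"       # province, city, town (from the example only)
PREPOSITION = "在"           # external context cue preceding a location

# One place-name segment: some non-marker characters closed by an end marker.
SEGMENT = rf"[^{END_MARKERS}{PREPOSITION}]+[{END_MARKERS}]"
LOCATION = re.compile(rf"{PREPOSITION}((?:{SEGMENT})+)")


def extract_place_names(text):
    """Return place-name segments that follow the preposition 在."""
    names = []
    for match in LOCATION.finditer(text):
        names.extend(re.findall(SEGMENT, match.group(1)))
    return names


places = extract_place_names("在广东省中山市民众镇")
print(places)
```

Because 民众 is followed by the marker 镇 and sits inside a phrase introduced by 在, it is treated as part of a town name here, whereas the same word without those cues would not be extracted.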