GB 2312

[2] GB/T 2312-1980 has been superseded by GBK and GB 18030, which include additional characters, but GB/T 2312 remains in widespread use as a subset of those encodings.

[4] However, all major web browsers decode GB2312-marked documents as if they were marked with the superset GBK encoding, except for Safari and Edge on the label GB_2312.

The original GB 2312 standard includes 6,763 Chinese characters (on two levels: the first is arranged by reading, the second by radical and then by number of strokes), along with symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks.

For example, the character "外" (meaning: foreign) is located in row 45 position 66,[9] thus its qūwèi code is 45-66.
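The qūwèi code can also be recovered from a character's encoded bytes. The sketch below assumes Python's built-in `gb2312` codec, which produces EUC-CN bytes, and the EUC-CN rule of storing row + 0xA0 and cell + 0xA0. The function name `quwei` is illustrative only.

```python
def quwei(ch: str) -> tuple[int, int]:
    """Return the (row, cell) qūwèi code of one double-byte GB 2312 character."""
    high, low = ch.encode("gb2312")  # EUC-CN byte pair, e.g. b"\xcd\xe2" for 外
    # EUC-CN stores row + 0xA0 and cell + 0xA0, so subtract 0xA0 from each byte.
    return high - 0xA0, low - 0xA0


row, cell = quwei("外")
print(f"{row:02d}-{cell:02d}")  # 45-66
```

The function only handles a single double-byte character; an ASCII character encodes to one byte and the unpacking raises `ValueError`.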

EUC-CN is often used as the character encoding (i.e. for external storage) in programs that deal with GB/T 2312, thus maintaining compatibility with ASCII.

Compared to UTF-8, GB/T 2312 (whether native or encoded in EUC-CN) is more storage-efficient for Chinese text: UTF-8 uses three bytes[a] per CJK ideograph, whereas GB/T 2312 uses only two.
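Both points, ASCII transparency and the two-byte cost per ideograph, can be checked with Python's standard codecs. This is a minimal sketch assuming that Python's `gb2312` codec is an EUC-CN implementation; `encoded_sizes` is a hypothetical helper name.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of `text` in EUC-CN (Python's "gb2312" codec) and in UTF-8."""
    return {enc: len(text.encode(enc)) for enc in ("gb2312", "utf-8")}


sample = "GB 2312 外国"
euc = sample.encode("gb2312")

# ASCII characters keep their one-byte values in EUC-CN...
print(euc[:8])                     # b'GB 2312 '
# ...and both bytes of each ideograph have the eighth bit set.
print([hex(b) for b in euc[8:]])   # ['0xcd', '0xe2', '0xb9', '0xfa']
print(encoded_sizes(sample))       # {'gb2312': 12, 'utf-8': 14}
```

The eight ASCII characters cost one byte each in both encodings; the difference comes entirely from the two ideographs (4 bytes versus 6).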

In EUC-CN, 0xA0 (160) is added to both numbers of the code point: the sum with the row number forms the high byte, and the sum with the cell number forms the low byte.
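The offset rule is plain arithmetic. The sketch below (the function name `euc_cn_bytes` is hypothetical) builds the EUC-CN pair for a qūwèi code and then cross-checks it against Python's `gb2312` codec.

```python
def euc_cn_bytes(row: int, cell: int) -> bytes:
    """Encode a GB 2312 qūwèi code (row and cell in 1..94) as an EUC-CN byte pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([row + 0xA0, cell + 0xA0])  # high byte, low byte


pair = euc_cn_bytes(45, 66)
print(pair.hex())             # cde2
print(pair.decode("gb2312"))  # 外
```

Because both numbers run from 1 to 94, every EUC-CN byte of a double-byte character falls in 0xA1-0xFE and never collides with ASCII.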

In ISO-2022-CN, both bytes of a double-byte character lie in the range 0x21-0x7E. As this range overlaps printable ASCII, control characters are required to indicate whether a byte is an ASCII character or part of a two-byte sequence, namely the Shift Out (SO) and Shift In (SI) functions.

This stateful scheme poses a risk of misdecoding: if a shift character is lost or the text is split improperly, the following bytes are read in the wrong mode and information is lost.
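A simplified model of the SO/SI mechanism makes the failure mode concrete. This sketch assumes a single line of text containing only ASCII and GB 2312 characters; it emits the GB 2312 designation `ESC $ ) A` once before the first SO and ignores the other ISO-2022-CN sets and per-line rules. The functions `shift_encode` and `shift_decode` are hypothetical, not a library API (Python ships no ISO-2022-CN codec).

```python
SO = b"\x0e"  # Shift Out: switch to the double-byte set
SI = b"\x0f"  # Shift In: switch back to ASCII
DESIGNATE_GB2312 = b"\x1b$)A"  # ESC $ ) A: designate GB 2312 as the SO set


def shift_encode(text: str) -> bytes:
    """Encode one line of ASCII + GB 2312 text with SO/SI shifting (simplified)."""
    out = bytearray()
    designated = shifted = False
    for ch in text:
        if ch.isascii():
            if shifted:
                out += SI
                shifted = False
            out += ch.encode("ascii")
        else:
            if not designated:
                out += DESIGNATE_GB2312
                designated = True
            if not shifted:
                out += SO
                shifted = True
            # Clear bit 8 of the EUC-CN pair: row/cell + 0x20 instead of + 0xA0.
            out += bytes(b & 0x7F for b in ch.encode("gb2312"))
    if shifted:
        out += SI  # end the line in ASCII mode
    return bytes(out)


def shift_decode(data: bytes) -> str:
    """Decode shift_encode output; the current mode is carried as state."""
    data = data.replace(DESIGNATE_GB2312, b"")
    chars = []
    shifted = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == SO[0]:
            shifted = True
            i += 1
        elif b == SI[0]:
            shifted = False
            i += 1
        elif shifted:
            # Restore bit 8 to get the EUC-CN pair, then decode it.
            pair = bytes([data[i] | 0x80, data[i + 1] | 0x80])
            chars.append(pair.decode("gb2312"))
            i += 2
        else:
            chars.append(chr(b))
            i += 1
    return "".join(chars)


encoded = shift_encode("a外b")
print(encoded)                                 # b'a\x1b$)A\x0eMb\x0fb'
print(shift_decode(encoded))                   # a外b
print(shift_decode(encoded.replace(SO, b"")))  # aMbb
```

Dropping the single SO byte is enough to turn 外 into the ASCII letters "Mb": the decoder never leaves ASCII mode, so the two-byte sequence is silently misread rather than rejected.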

In the 7-bit ISO-2022-CN and HZ forms, 0x20 (32) is added instead: the sum with the row number forms the high byte, and the sum with the cell number forms the low byte, just as in EUC-CN.

In the tables below, where a pair of hexadecimal numbers is given for a prefix byte or a coding byte, the smaller (with the eighth bit unset or unavailable) is used when encoded over GL (0x21-0x7E), as in ISO-2022-CN or HZ-GB-2312, and the larger (with the eighth bit set) is used in the more typical case of it being encoded over GR (0xA1-0xFE), as in EUC-CN, GBK or GB 18030.
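The paired hexadecimal values used in such tables differ only in the eighth bit. The sketch below (the helper `byte_pair` is hypothetical) computes both values for a row or cell number and then decodes the GL form of 外 wrapped in HZ's `~{` ... `~}` shift sequences, using Python's built-in `hz` codec.

```python
def byte_pair(n: int) -> tuple[int, int]:
    """Return the (GL, GR) coding bytes for a row or cell number n in 1..94."""
    return n + 0x20, n + 0xA0  # GL: eighth bit clear; GR: eighth bit set


gl_row, gr_row = byte_pair(45)
gl_cell, gr_cell = byte_pair(66)
print(f"{gl_row:02X}/{gr_row:02X} {gl_cell:02X}/{gr_cell:02X}")  # 4D/CD 62/E2

# In HZ-GB-2312 the GL pair 0x4D 0x62 ("Mb") sits between "~{" and "~}".
print(b"~{Mb~}".decode("hz"))  # 外
```

The GR values (0xCD 0xE2) are the EUC-CN bytes for the same character, which is why a single table can serve both encodings.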

This chart details the overall layout of the main plane of the GB/T 2312 character set by lead byte.

Ruby 2.2 is compatible with both implementations; it internally converts the conflicting characters to the GB 18030 subset.

The W3C/WHATWG technical recommendation for use with HTML5 specifies a GBK encoding to be inferred for streams labelled gb2312, which in turn uses a GB18030 decoder.
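The practical effect of treating the `gb2312` label as GBK is that byte sequences outside EUC-CN still decode. This sketch uses a hypothetical label table and Python's `gbk` codec as a stand-in for the WHATWG decoder; the two are not identical in every detail, so it only illustrates the superset relationship.

```python
# Hypothetical label table: "gb2312" is treated as an alias of the GBK superset,
# mirroring the browser behaviour described above. Python's "gbk" codec stands
# in for the WHATWG GB 18030-based decoder.
LABEL_TO_CODEC = {"gb2312": "gbk", "gbk": "gbk"}


def decode_labelled(data: bytes, label: str) -> str:
    """Decode bytes according to a (case-insensitive) charset label."""
    return data.decode(LABEL_TO_CODEC[label.lower()])


gbk_only = b"\x81\x40"  # lead byte 0x81 is outside the EUC-CN range 0xA1-0xFE

try:
    gbk_only.decode("gb2312")
except UnicodeDecodeError:
    print("rejected by a strict GB 2312 decoder")

print(decode_labelled(gbk_only, "GB2312"))  # 丂 (U+4E02), a GBK extension character
```

Text that really is GB 2312 decodes identically either way, so the alias only changes the outcome for documents that were mislabelled.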

Row 6 contains basic support for the modern Greek alphabet, without diacritics or the final sigma.
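The contents of the Greek row can be listed by building EUC-CN byte pairs for row 6 and decoding them with Python's `gb2312` codec. The cell ranges (1-24 for capitals, 33-56 for lowercase) follow the standard's layout of this row; `row_chars` is a hypothetical helper.

```python
def row_chars(row: int, cells: range) -> str:
    """Decode cells of one GB 2312 row by building their EUC-CN byte pairs."""
    return "".join(
        bytes([row + 0xA0, cell + 0xA0]).decode("gb2312") for cell in cells
    )


upper = row_chars(6, range(1, 25))   # cells 1-24
lower = row_chars(6, range(33, 57))  # cells 33-56
print(upper)         # ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
print(lower)         # αβγδεζηθικλμνξοπρστυφχψω
print("ς" in lower)  # False: no final sigma
```

Each case has 24 letters, matching the modern Greek alphabet with no accented forms and no separate final sigma.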

[19] Conversely, ISO-IR-165 includes patterned semigraphic characters in this row (mostly without exact counterparts in Unicode), colliding with the code positions used for the vertical extensions.

Contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.