Extended Unix Code

The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes.

Modern applications are more likely to use UTF-8, which supports all of the glyphs of the EUC codes, and more, and is generally more portable with fewer vendor deviations and errors.

The structure of EUC is based on the ISO/IEC 2022 standard, which specifies a system of graphical character sets that can be represented with a sequence of the 94 7-bit bytes 0x21–7E, or alternatively 0xA1–FE if an eighth bit is available.

[2] These fixed-length encoding formats are suited to internal processing and are not usually encountered in interchange.

[4] EUC-CN[6] is the usual encoded form of the GB 2312 standard for simplified Chinese characters.

The non-GB2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.

IBM code page 1381 (CCSID 1381) comprises the single-byte code page 1115 (CPGID 1115 as CCSID 1115) and the double-byte code page 1380 (CPGID 1380 as CCSID 1380),[7] which encodes GB 2312 the same way as EUC-CN, but deviates from the EUC structure by extending the lead byte range back to 0x8C, adding 31 IBM-selected characters in 0x8CE0 through 0x8CFE and adding 1880 user-defined characters with lead bytes 0x8D through 0xA0.

Other EUC-CN variants deviating from the EUC mechanism include the Mac OS Chinese Simplified script (known as Code page 10008 or x-mac-chinesesimp).

Besides these changes to the lead byte range, the other distinctive feature of the double-byte portion of Mac OS Chinese Simplified is the inclusion of two extensions to the basic GB 2312-80 set in rows 6 and 8.

This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by ISO-2022-JP, which is based on the same character set standards, and without ASCII bytes appearing as trail bytes (unlike Shift JIS).

Compared to EUC-CN or EUC-KR, EUC-JP did not become as widely adopted on PC and Macintosh systems in Japan, which used Shift JIS or its extensions (Windows code page 932 on Microsoft Windows, and MacJapanese on classic Mac OS), although it became heavily used by Unix or Unix-like operating systems (except for HP-UX).

Characters are encoded as follows: Vendor extensions to EUC-JP (from, for example, the Open Software Foundation, IBM or NEC) were often allocated within the individual code sets,[25][26] as opposed to using invalid EUC sequences (as in popular extensions of EUC-CN and EUC-KR).

HP-16 encodes JIS X 0208 using the same bytes as in EUC-JP, but does not use the single shift codes (thus omitting code sets 2 and 3), and adds three user-defined regions which do not follow the packed-format EUC structure:[28] The IKIS (Interactive Kanji Information System) encoding used by Data General resembles EUC-JP without single shifts, i.e. with only code sets 0 and 1.

Half-width katakana are instead included in row 8 of JIS X 0208 (colliding with the box-drawing characters added to the standard in 1983).

This results in duplicate encodings for the ideographic space—0x4040 per the DBCS-Host code structure, and 0xA1A1 as in EUC-JP.

This differs from IBM's DBCS-Host encoding for Japanese, the layout of which builds on versions which predate JIS X 0208 altogether.

It is usually referred to as Wansung (Korean: 완성; RR: Wanseong; lit.

A common extension of EUC-KR is the Unified Hangul Code (통합형 한글 코드; Tonghabhyeong Hangeul Kodeu,[42] or 통합 완성형; Tonghab Wansunghyung), which is the default Korean codepage on Microsoft Windows.

Unified Hangul Code extends EUC-KR by using codes that do not conform to the EUC structure to incorporate additional syllable blocks, completing the coverage of the composed syllable blocks available in Johab and Unicode.

The W3C/WHATWG Encoding Standard used by HTML5 incorporates the Unified Hangul Code extensions into its definition of EUC-KR.

[45] Other encodings incorporating EUC-KR as a subset include the Mac OS Korean script (known as Code page 10003 or x-mac-korean),[13] which was used by HangulTalk (MacOS-KH), the Korean localization of the classic Mac OS.

[47] Although none of these additional single-byte codes are within the lead byte range of plain EUC-KR (unlike Apple's extensions to EUC-CN, see above), some are within the lead byte range of Unified Hangul Code (specifically, 0x81, 0x82, 0x83 and 0x84).

Similarly to KS X 1001, the North Korean KPS 9566 standard is typically used in EUC form; in these contexts, it is sometimes referred to as EUC-KP.

Relationship between packed EUC and other 8-bit ISO 2022 profiles
Layout of the fixed-length format for Japanese