KPS 9566

Subsequent editions have added further encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.

[3] KPS 9566 differs in approach from KS X 1001, its South Korean counterpart, in several respects: it uses a different ordering of Chosŏn'gŭl,[4] encodes explicit vertical presentation forms of punctuation, does not encode duplicate Hanja for multiple readings, and includes several characters specific to the North Korean political system, among them special encodings for the names of the country's past and present leaders (Kim Il Sung, Kim Jong Il and Kim Jong Un).

[10] ASCII was a 7-bit, single-byte encoding including 94 graphical characters, the space, and 33 control codes, which provided basic support for representing American English text as a series of bytes.
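The character counts given for ASCII can be checked with a short sketch. It assumes the conventional ASCII layout (control codes at 0x00 to 0x1F plus DEL at 0x7F, the space at 0x20, and graphical characters at 0x21 to 0x7E); the variable names are illustrative only.

```python
# Partition the 7-bit ASCII range into the three groups named in the text.
# Assumed layout: C0 controls 0x00-0x1F plus DEL (0x7F), space at 0x20,
# graphical characters at 0x21-0x7E.
control_codes = [b for b in range(0x80) if b < 0x20 or b == 0x7F]
space = [0x20]
graphical = list(range(0x21, 0x7F))

# The three groups together should cover all 128 seven-bit values.
total = len(control_codes) + len(space) + len(graphical)
print(len(graphical), len(space), len(control_codes), total)
```

The 94 graphical positions are the same 94 that give a 94×94 double-byte plane its dimensions.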

[11] ISO 2022 specifies mechanisms for using single-byte and multiple-byte character sets with a certain structure in both 7-bit and 8-bit environments, and for declaring and switching between them in a standard fashion using shift codes and escape sequences.

[19][20] Wansung code did not encode all possible modern Korean syllables, only the 2350 most common;[2] it allowed the remaining syllables to be specified using combining sequences, but these were often not supported.

[16] South Korea was not the only country developing an ISO 2022 DBCS for Korean: the Mainland Chinese GB 12052 was published in 1989.

The current edition as of the release of Red Star OS 3.0 appears to be KPS 9566-2011, which adds Kim Jong Un to the list of leaders.

[23] According to the sources of information available outside of North Korea, the more recent editions appear to define additional allocations outside of the EUC plane (similarly to GBK or UHC).

[26][27] In principle, KPS 9566 is similar to the Wansung character set defined by the South Korean KS X 1001 standard, although the two are not compatible.

Specifically, it includes the hammer, sickle and brush emblem of the Workers' Party of Korea, both uncircled and circled[7] (code points 12-01 and 12-02),[23] and two groups of three special-purpose characters which spell out the names of the North Korean leaders Kim Il Sung (김일성) and Kim Jong Il (김정일) in a special decorative font (code points 04-72 to 04-74 and 04-75 to 04-77, respectively).

[39] A detailed response was submitted by the Swedish representative in March 2000, opposing several of the points and elaborating on Sweden's vote against the proposal.

This response stated that changing the encoding of the Korean characters again would cause major disruption, even more than the first change, which had been made when comparatively few implementations existed and which, in retrospect, should not have been made.

It suggested that a machine-readable mapping file between Unicode and KPS 9566 could be provided by the North Korean body itself, and would be more useful than a printed cross-reference in the standard document.

[4] In August 2000, the North Korean national body submitted a more detailed version of their requests in a series of five consecutive proposals.

[46][47] In this version of the proposal, a section of document excerpts demonstrating use of several characters and short explanations of their purpose was included.

[50] In November 2002, the South Korean body published a set of three-way tables mapping characters between the KPS 9566, KS X 1001 (as EUC-KR) and ISO/IEC 10646 standards as they existed in 2000.

[22] These files mapped the characters unavailable in Unicode to the Private Use Area, and included additional encoded forms for other syllable blocks outside of the main ISO-IR-202 plane.

Several of these additional symbols are also mapped to the Private Use Area; however, their identity is not known, since no names or reference glyphs for those characters are known outside of North Korea.
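Whether a given mapping target lies in a Private Use Area can be tested programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `is_private_use` is hypothetical, and the code points in the usage lines are generic examples, not values taken from the KPS 9566 mapping tables.

```python
import unicodedata


def is_private_use(code_point: int) -> bool:
    """Return True if the code point has Unicode general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


# Generic illustrations only; these are not entries from the KPS 9566 tables.
print(is_private_use(0xE000))  # first code point of the BMP Private Use Area
print(is_private_use(0xAC00))  # an ordinary Hangul syllable
```

A mapping file could be filtered with such a check to list the characters that still lack a standard Unicode equivalent.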

[53] Of these characters, the hot beverage, umbrella with raindrops, lightning bolt and warning triangle, and the upward, downward and leftward arrows were subsequently selected as mappings from the Japanese cellular emoji sets,[59] making a total of seven current Unicode emoji which were originally added to Unicode at the request of North Korea.

These include the WPK symbol, four triangular marks, a leftward-pointing pair of scissors (excluded on the rationale that contrastive use with the rightward scissors in the Dingbats block had not been demonstrated), an upward-pointing manicule in a circle, vertical presentation forms of punctuation marks, variants of closing brackets incorporating full stops, horizontal-barred variants of vulgar fractions encoded separately from their slanted versions, and the leaders' names.

[65] A Japanese postal mark with a downward pointing triangle was included in KPS 9566-97 but removed in KPS 9566-2003[1] after the North Korean body had withdrawn it from their Unicode proposal for review[66] in response to requests from the South Korean body for evidence of the symbol's use in North Korea.

[3] The 2011 edition also includes several additional Hanja and symbols encoded outside of the ISO-IR-202 plane, after the range used for the extended syllable blocks.

[17] The extended UHC-style 8-bit encodings defined by the 2003 edition onwards likewise use the larger byte values, between 0xA1 and 0xFE inclusive, for the main ISO-IR-202-based plane.
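The relationship between a row-cell code point in the 94×94 plane and its two-byte form in the 0xA1 to 0xFE range can be sketched as follows. This assumes the usual EUC convention of adding 0xA0 to both the row and the cell number, which is consistent with the article's own example of 69-09 being 0xE5A9; the function names are illustrative, not part of any standard API.

```python
def rowcell_to_bytes(row: int, cell: int) -> bytes:
    """Convert a row-cell pair (each 1..94) to its assumed 8-bit EUC byte pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1..94")
    # Assumption: EUC-style packing, i.e. each component is offset by 0xA0,
    # so the results fall between 0xA1 and 0xFE inclusive.
    return bytes([0xA0 + row, 0xA0 + cell])


def bytes_to_rowcell(pair: bytes) -> tuple:
    """Inverse of rowcell_to_bytes for a two-byte sequence in 0xA1..0xFE."""
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected two bytes, each in the range 0xA1..0xFE")
    return (pair[0] - 0xA0, pair[1] - 0xA0)


print(rowcell_to_bytes(69, 9).hex())        # the Hanja at 69-09
print(rowcell_to_bytes(12, 1).hex())        # first of the two emblem code points
print(bytes_to_rowcell(bytes.fromhex("a4e8")))
```

Byte pairs with a component below 0xA1 fall outside the main plane and are rejected here; how the extended ranges are laid out is not covered by this sketch.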

[1][3] This set contains common sentence punctuation such as brackets, quotation marks, commas and so forth, as well as presentation forms for use in vertical writing.

[46] This set includes a subset of ASCII, minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet.

Several circled numbers in this row were mapped to Unicode incorrectly in the 2003 edition, because the mapping used proposed code points that were not yet final.

[67] Certain KPS 9566 characters in this row, namely two forms of the emblem of the Workers' Party of Korea, a pair of scissors pointing in a different direction to those in the Dingbats block, and a circled upward-pointing manicule, remain mapped to the Private Use Area.

[1] They constitute a subset of the Latin-1 Supplement block of Unicode (equivalent to the upper half of the ISO 8859-1 (Latin-1) character set).

The Hanja at 69-09 (0xE5A9) is mapped to U+676E 杮 in all documented tables; the characters are, however, ordered according to their readings, which suggests that U+67FF 柿 was intended instead.

[1] This set contains several punctuation marks used in Japan, and some characters from the Hangul Compatibility Jamo Unicode block which are not already included in row 4.