TRON (encoding)

Characters with equivalent semantics will be encoded more than once, complicating some operations.

Separate code points for the Chinese, Korean, and Japanese variants of the 70,000+ Han characters in Unicode 4.1 (if that were deemed necessary) would require more than 200,000 code points in TRON, since three variants of each of some 70,000 characters come to roughly 210,000.

Alternatively, the notation "0xNNYYYY" can be used, where "NN" is the second byte, in hexadecimal, of the language specifier code and "YYYY" is the two-byte character code, also in hexadecimal.
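As a rough illustration of the "0xNNYYYY" notation, the following Python sketch splits such a value into its two parts and formats them again. It assumes that "NN" occupies exactly one byte and "YYYY" exactly two bytes; the function names are hypothetical, and no attempt is made to check which values are actually assigned in TRON.

```python
def split_tron(value: int) -> tuple[int, int]:
    """Split a 0xNNYYYY value into (NN, YYYY).

    Assumption: NN is one byte and YYYY is two bytes.
    """
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("expected a value that fits in three bytes")
    return value >> 16, value & 0xFFFF


def join_tron(nn: int, yyyy: int) -> int:
    """Combine NN and YYYY back into a single 0xNNYYYY integer."""
    if not (0 <= nn <= 0xFF and 0 <= yyyy <= 0xFFFF):
        raise ValueError("NN must fit in one byte and YYYY in two bytes")
    return (nn << 16) | yyyy


def format_tron(nn: int, yyyy: int) -> str:
    """Render NN and YYYY in the 0xNNYYYY notation."""
    return f"0x{nn:02X}{yyyy:04X}"
```

For example, under these assumptions `split_tron(0x213021)` yields the pair `(0x21, 0x3021)`, and `format_tron(0x21, 0x3021)` renders it again as `0x213021`.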

A text format "&TNNYYYY;" can be used to denote a TRON code point in ASCII text, in a similar manner to numeric character references in HTML, SGML or XML.
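A minimal sketch of how such references might be located in ASCII text is given below. It assumes that "NN" is always written as two hexadecimal digits and "YYYY" as four; the pattern and the function name are illustrative only and are not taken from any TRON specification.

```python
import re

# Assumption: "&T", two hex digits (NN), four hex digits (YYYY), then ";".
TRON_REF = re.compile(r"&T([0-9A-Fa-f]{2})([0-9A-Fa-f]{4});")


def find_tron_refs(text: str) -> list[tuple[int, int]]:
    """Return the (NN, YYYY) pairs of all &TNNYYYY; references in text."""
    return [(int(nn, 16), int(yyyy, 16)) for nn, yyyy in TRON_REF.findall(text)]
```

Sequences that do not match the assumed shape, such as "&amp;" or a reference with too few digits, are simply ignored by this sketch.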

However, a standard, conforming HTML or XML parser would treat such sequences as named entity references rather than numeric ones. They cannot be mapped directly and easily to valid, unambiguous sequences of code points in the UCS without an extensive DTD to define them. Such a DTD might use private-use characters for TRON escapes, or Unicode variation selectors to distinguish different TRON characters that are represented as the same character in the UCS. Otherwise, a different SGML-based parser is needed to support the TRON text format in a way that is interoperable with the standard UTFs for the UCS.
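The behaviour of standard parsers can be observed with Python's standard library, as in the sketch below. The sample reference "&T213021;" is a hypothetical value chosen only for illustration, and the helper names are not part of any library.

```python
import html
import xml.etree.ElementTree as ET


def xml_accepts(fragment: str) -> bool:
    """Report whether an XML parser accepts the fragment inside an element."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        # Without a DTD declaring it, a named entity is undefined.
        return False


def html_decoded(fragment: str) -> str:
    """Decode HTML character references; unknown names are left as-is."""
    return html.unescape(fragment)
```

Here `xml_accepts("&T213021;")` is false because the parser sees an undeclared named entity, whereas an ordinary numeric character reference such as "&#x3042;" is accepted. Likewise, `html_decoded("&T213021;")` returns the text unchanged, since no HTML entity of that name exists.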