Punycode

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames.

Punycode encodes strings containing Unicode characters, such as internationalized domain names (IDNs), into the letters, digits, and hyphens (LDH) subset of ASCII favored by DNS.

[1] The RFC author, Adam Costello, is reported to have written: Why “Punycode”?

[2] As stated in RFC 3492, "Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of 'basic' code points to uniquely represent any string of code points drawn from a larger set."

Punycode defines parameters for the general Bootstring algorithm to match the characteristics of Unicode text.
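As a reference sketch, the parameter values that RFC 3492 assigns to Bootstring for Punycode can be written down as constants. The constant names and the helper `digit_value` are chosen here for illustration and are not taken from any library; the values and the digit mapping (a to z for 0 to 25, 0 to 9 for 26 to 35) follow the RFC.

```python
# Bootstring parameter values fixed by RFC 3492 for Punycode.
# The names are illustrative; only the values come from the RFC.
BASE = 36            # number of distinct digit symbols
TMIN = 1             # smallest threshold
TMAX = 26            # largest threshold
SKEW = 38            # used by bias adaptation
DAMP = 700           # damping applied to the first delta
INITIAL_BIAS = 72
INITIAL_N = 0x80     # first non-ASCII code point
DELIMITER = "-"      # separates the basic code points from the encoded deltas

# Digit symbols in value order: a..z are 0..25, 0..9 are 26..35.
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def digit_value(ch: str) -> int:
    """Return the numeric value of one Punycode digit (case-insensitive)."""
    return DIGITS.index(ch.lower())
```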

This section demonstrates the procedure for Punycode encoding, using as an example the German string "bücher" (English: "books"), which is encoded as the label "bcher-kva".
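The example can be checked with Python's standard `punycode` codec. This is only a quick verification of the result, not a walk through the algorithm itself; the variable names are illustrative.

```python
label = "bücher"

# Python ships a "punycode" codec that implements RFC 3492.
encoded = label.encode("punycode")
print(encoded)                      # b'bcher-kva'

# The last hyphen separates the copied ASCII characters from the
# digits that encode where the non-ASCII character is inserted.
basic, _, deltas = encoded.rpartition(b"-")
print(basic, deltas)                # b'bcher' b'kva'

# Decoding restores the original string.
decoded = encoded.decode("punycode")
print(decoded)                      # bücher
```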

Punycode is designed to work across all scripts, and to be self-optimizing by attempting to adapt to the character set ranges within the string as it operates.
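The adaptation takes place through a bias value that is recomputed after each encoded delta. A minimal Python sketch of the bias adaptation function from RFC 3492, section 6.1, follows; the constant names are illustrative and the function is a direct transcription of the RFC pseudocode rather than a tuned implementation.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Return the new bias after a delta has been encoded or decoded.

    numpoints is the number of code points handled so far, including
    the one just inserted.
    """
    # The first delta is usually much larger than later ones, so it is
    # damped more strongly.
    delta = delta // DAMP if firsttime else delta // 2
    # Compensate for the string growing by one code point.
    delta += delta // numpoints
    k = 0
    # Estimate how many digits the next delta is likely to need.
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)
```

For "bücher", the single delta is 745 and the string then holds 6 code points, so `adapt(745, 6, True)` yields a new bias of 0.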

Note that for DNS use, the domain name string is assumed to have been normalized using nameprep and (for top-level domains) filtered against an officially registered language table before being punycoded. The DNS protocol also limits the acceptable length of the output Punycode string: a single label may not exceed 63 octets.
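In Python, the standard `idna` codec combines these steps for the older IDNA 2003 rules (RFC 3490): it applies nameprep, Punycode-encodes each non-ASCII label, and adds the "xn--" prefix. The sketch below assumes that codec is an acceptable stand-in for the full registration process, which it is not in general; the helper `fits_dns_label_limit` is hypothetical and only illustrates the 63-octet label limit.

```python
name = "bücher.example"

# Each label is processed separately; ASCII labels pass through unchanged.
ascii_name = name.encode("idna")
print(ascii_name)                   # b'xn--bcher-kva.example'

round_trip = ascii_name.decode("idna")
print(round_trip)                   # bücher.example


def fits_dns_label_limit(label: str) -> bool:
    """Hypothetical helper: True if the encoded label is at most 63 octets."""
    try:
        return len(label.encode("idna")) <= 63
    except UnicodeError:
        # The codec itself rejects labels that are too long.
        return False
```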

The threshold value depends on the digit's position within the number and on previous insertions, which increases efficiency.

To decode this string of symbols, a sequence of thresholds is needed; in this case it is (1, 1, 26, 26, ...).

The thresholds themselves are determined for each successive encoded character by an algorithm keeping them between 1 and 26 inclusive.
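The threshold rule and its use in decoding can be sketched in Python as follows. The function names are illustrative, and the code assumes the initial bias of 72 from RFC 3492 and the digit string "kva" from the "bücher" example; it decodes only one variable-length integer, not a whole label.

```python
BASE, TMIN, TMAX = 36, 1, 26
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def threshold(k: int, bias: int) -> int:
    """Threshold for the digit at position k (k = 36, 72, 108, ...)."""
    return min(max(k - bias, TMIN), TMAX)


def decode_delta(digits: str, bias: int) -> tuple[int, int]:
    """Decode one variable-length integer; return (delta, digits consumed)."""
    value, weight = 0, 1
    for count, ch in enumerate(digits, start=1):
        t = threshold(BASE * count, bias)
        d = DIGITS.index(ch.lower())
        value += d * weight
        if d < t:
            # A digit below its threshold terminates the number.
            return value, count
        weight *= BASE - t
    raise ValueError("digit string ended before the number terminated")


thresholds = [threshold(k, 72) for k in (36, 72, 108, 144)]
print(thresholds)                   # [1, 1, 26, 26]

delta, used = decode_delta("kva", 72)
print(delta, used)                  # 745 3
```

With "bcher" already holding 5 code points, the delta 745 is split over the 6 possible insertion positions: 745 // 6 = 124 advances the code point from 128 to 252 (U+00FC, "ü"), and 745 % 6 = 1 gives the insertion index, which produces "bücher".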

The following table shows examples of Punycode encodings for different types of input.