IETF language tag

For example, the tag en stands for English; es-419 for Latin American Spanish; rm-sursilv for Romansh Sursilvan; sr-Cyrl for Serbian written in Cyrillic script; nan-Hant-TW for Min Nan Chinese using traditional Han characters, as spoken in Taiwan; yue-Hant-HK for Cantonese using traditional Han characters, as spoken in Hong Kong; and gsw-u-sd-chzh for Zürich German.

IETF language tags were first defined in RFC 1766, edited by Harald Tveit Alvestrand, published in March 1995.

RFC 4646 introduced a more structured format for language tags, added the use of ISO 15924 four-letter script codes and UN M.49 three-digit geographical region codes, and replaced the old registry of tags with a new registry of subtags.
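The structured format can be illustrated with a short Python sketch that splits a tag into its subtags by shape alone: four letters for a script, two letters or three digits for a region, and so on. This is a simplified illustration, not a conforming BCP 47 parser. The names LanguageTag and parse_tag are hypothetical, extended language subtags and grandfathered tags are deliberately not handled, and no subtag is checked against the registry.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LanguageTag:
    """Hypothetical container for the parts of a well-formed tag."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    extensions: Dict[str, List[str]] = field(default_factory=dict)
    private_use: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag into subtags by shape only (no registry lookup).

    Simplifications: extlang subtags and grandfathered tags are rejected.
    """
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,8}", subtags[0]):
        raise ValueError(f"invalid primary language subtag in {tag!r}")
    result = LanguageTag(language=subtags[0].lower())
    i = 1

    # Script: exactly four letters (ISO 15924), conventionally title case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        result.script = subtags[i].title()
        i += 1

    # Region: two letters (ISO 3166-1) or three digits (UN M.49).
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        result.region = subtags[i].upper()
        i += 1

    # Variants: 5 to 8 alphanumerics, or 4 characters starting with a digit.
    variant_shape = r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"
    while i < len(subtags) and re.fullmatch(variant_shape, subtags[i]):
        result.variants.append(subtags[i].lower())
        i += 1

    # Extensions (single-character prefix) and private use ("x").
    while i < len(subtags):
        singleton = subtags[i].lower()
        if not re.fullmatch(r"[a-z0-9]", singleton):
            raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
        i += 1
        start = i
        if singleton == "x":
            # Everything after "x" is private use.
            result.private_use = [s.lower() for s in subtags[i:]]
            i = len(subtags)
        else:
            while i < len(subtags) and re.fullmatch(r"[A-Za-z0-9]{2,8}", subtags[i]):
                i += 1
            result.extensions[singleton] = [s.lower() for s in subtags[start:i]]
        if i == start:
            raise ValueError(f"empty sequence after {singleton!r} in {tag!r}")
    return result
```

Under these assumptions, parse_tag("nan-Hant-TW") yields the language nan, the script Hant and the region TW, while parse_tag("gsw-u-sd-chzh") places sd and chzh under the u extension.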

The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066.

Optional script and region subtags should preferably be omitted when they add no distinguishing information to a language tag.

As this dialect is spoken almost exclusively in Spain, the region subtag ES can normally be omitted.

For example, Zsye refers to emojis, Zmth to mathematical notation, Zxxx to unwritten documents and Zyyy to undetermined scripts.

Private-use subtags are not included in the Registry as they are implementation-dependent and subject to private agreements between third parties using them.

Whole tags that were registered prior to RFC 4646 and are now classified as "grandfathered" or "redundant" (depending on whether they fit the new syntax) are deprecated in favor of the corresponding ISO 639-3–based language subtag, if one exists.

BCP 47 defines a "Scope" property to identify subtags for language collections.

When this is the case, it is preferable to omit the script subtag, to improve the likelihood of successful matching.

For example, yi is preferred over yi-Hebr in most contexts, because the Hebrew script subtag is assumed for the Yiddish language.
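The yi versus yi-Hebr preference can be sketched as a small normalization step in Python. The function name drop_redundant_script and the SUPPRESS_SCRIPT table are hypothetical. The table is a hand-written excerpt for illustration only, and a real implementation would load the default script for each language from the registry rather than hard-code it.

```python
# Illustrative excerpt only (an assumption of this sketch): maps a language
# subtag to the script that is assumed for it when no script is given.
SUPPRESS_SCRIPT = {
    "yi": "Hebr",  # Yiddish: Hebrew script assumed
    "en": "Latn",  # English: Latin script assumed
}


def drop_redundant_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's assumed script.

    Simplification: only handles tags whose script directly follows the
    primary language subtag (no extended language subtags).
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        candidate = subtags[1]
        looks_like_script = (
            len(candidate) == 4 and candidate.isascii() and candidate.isalpha()
        )
        default = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if (
            looks_like_script
            and default is not None
            and candidate.lower() == default.lower()
        ):
            del subtags[1]
    return "-".join(subtags)
```

With this table, yi-Hebr is shortened to yi, whereas yi-Latn is left unchanged because the Latin script is not the one assumed for Yiddish.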

ISO 15924 includes some codes for script variants (for example, Hans and Hant for simplified and traditional forms of Chinese characters) that are unified within Unicode and ISO/IEC 10646.

Two-letter region subtags are based on codes assigned, or "exceptionally reserved", in ISO 3166-1.

If the ISO 3166 Maintenance Agency were to reassign a code that had previously been assigned to a different country, the existing BCP 47 subtag corresponding to that code would retain its meaning, and a new region subtag based on UN M.49 would be registered for the new country.

Disagreements about language identification may extend to BCP 47 and to the core standards that inform it.

For example, some speakers of Punjabi believe that the ISO 639-3 distinction between [pan] "Panjabi" and [pnb] "Western Panjabi" is spurious (i.e. they feel the two are the same language); that sub-varieties of the Arabic script should be encoded separately in ISO 15924 (as, for example, the Fraktur and Gaelic styles of the Latin script are); and that BCP 47 should reflect these views and/or overrule the core standards with regard to them.

BCP 47 delegates this type of judgment to the core standards, and does not attempt to overrule or supersede them.

These attributes include country subdivisions, calendar and time zone data, collation order, currency, number system, and keyboard identification.