Code page

[11] Additionally, a list of the names and approximate IANA (Internet Assigned Numbers Authority) abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as Internet Explorer).

The text mode of standard (VGA-compatible) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to eight may be stored in the display adapter for easy switching.

Unicode is an effort to include all characters from all currently and historically used human languages in a single character enumeration (effectively one large code page), removing the need to distinguish between different code pages when handling digitally stored text.

Some vendors, namely IBM and Microsoft, have anachronistically assigned code page numbers to Unicode encodings.

This convention allows code page numbers to be used as metadata to identify the correct decoding algorithm when encountering binary stored data.
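As a minimal sketch of that idea, the following Python snippet uses a code page number stored alongside some bytes to choose a decoder. The table `CODEC_FOR_CODE_PAGE` and the function `decode_tagged` are hypothetical names introduced for illustration, and the table covers only a handful of code pages that Python's standard codecs are assumed to handle.

```python
# Hypothetical lookup table: code page number -> Python codec name.
# Only a few entries are listed; a real table would be much larger.
CODEC_FOR_CODE_PAGE = {
    437: "cp437",     # original IBM PC code page
    850: "cp850",     # Western European DOS code page
    1252: "cp1252",   # Windows Western European code page
    1208: "utf-8",    # IBM's number for UTF-8
    65001: "utf-8",   # Microsoft's number for UTF-8
}


def decode_tagged(data: bytes, code_page: int) -> str:
    """Decode bytes using the codec selected by a code page number."""
    try:
        codec = CODEC_FOR_CODE_PAGE[code_page]
    except KeyError:
        raise ValueError(f"unsupported code page: {code_page}") from None
    return data.decode(codec)


# The same character arrives as different bytes under different code pages.
samples = [(b"\x82", 437), (b"\xe9", 1252), (b"\xc3\xa9", 65001)]
decoded = [decode_tagged(data, page) for data, page in samples]
print(decoded)
```

Without the number, the three byte sequences above could not be told apart from arbitrary binary data.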

These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters whose character sets could only be changed by physically replacing a ROM chip that contained the font.

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but newer encoding systems, in particular Unicode, are encouraged for new designs.

They emulate several character sets, namely those designed according to ISO standards, such as the ones used on UNIX-like operating systems.

In Microsoft operating systems, these are used as both the "OEM" and "Windows" code page for the applicable locale.

Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country or region specific variants emerged.
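To make "partially compatible" concrete, the sketch below compares code page 437 with code page 850, one of its regional variants, byte by byte. It assumes Python's built-in `cp437` and `cp850` codecs reflect the respective code pages; `differing_bytes` is a hypothetical helper name.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte values that two single-byte codecs decode differently."""
    result = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" keeps undefined positions comparable
        # instead of raising an exception.
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            result.append(value)
    return result


diff = differing_bytes("cp437", "cp850")
print(f"{len(diff)} of 256 byte values differ between cp437 and cp850")
print("lowest differing byte: 0x%02X" % diff[0])
```

The lower half (the ASCII range) is shared, so plain English text survives a mix-up between the two, while accented letters, currency signs and box-drawing characters in the upper half may not.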

These code page number assignments are official for neither IBM nor Microsoft, and almost none of them is listed as a usable character set by IANA.

Many older character encodings (unlike Unicode) suffer from several problems.

Some vendors add proprietary extensions to established code pages, to add or change certain code point values: for example, byte 0x5C in Shift JIS can represent either a backslash or a yen sign depending on the platform.
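The 0x5C ambiguity can be illustrated with a short sketch. It assumes that Python's built-in `shift_jis` codec decodes the single byte 0x5C as a backslash (U+005C); the function `decode_sjis` and its `as_yen` flag are hypothetical and only model a platform that displays the same byte as a yen sign (U+00A5).

```python
BACKSLASH = "\u005c"
YEN_SIGN = "\u00a5"


def decode_sjis(raw: bytes, as_yen: bool) -> str:
    """Decode Shift JIS bytes, optionally modelling a platform that
    shows the single byte 0x5C as a yen sign instead of a backslash."""
    # Assumption: the codec itself yields U+005C for the byte 0x5C.
    text = raw.decode("shift_jis")
    return text.replace(BACKSLASH, YEN_SIGN) if as_yen else text


path = b"C:\x5ctemp"  # identical bytes on both hypothetical platforms
print(decode_sjis(path, as_yen=False))
print(decode_sjis(path, as_yen=True))
```

The stored bytes never change; only the interpretation does, which is why a file path or an escape sequence can look different from one system to another.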

Finally, in order to support several languages in a program that does not use Unicode, the code page used for each string/document needs to be stored.
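A minimal sketch of such bookkeeping is shown below: each piece of text carries its raw bytes together with the name of the codec needed to read them. The class `TaggedText` is a hypothetical illustration, and the codec names are those of Python's standard library rather than code page numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Raw bytes plus the code page (codec name) needed to interpret them."""
    raw: bytes
    codec: str

    def text(self) -> str:
        return self.raw.decode(self.codec)


documents = [
    TaggedText("café".encode("cp1252"), "cp1252"),    # Western European
    TaggedText("Привет".encode("cp1251"), "cp1251"),  # Cyrillic
]
decoded = [doc.text() for doc in documents]
print(decoded)

# Reading the Cyrillic bytes with the wrong tag does not fail;
# it silently yields unrelated Latin letters.
mislabeled = TaggedText(documents[1].raw, "cp1252").text()
print(mislabeled)
```

Because a wrong tag usually produces plausible-looking garbage rather than an error, the tag has to travel with the data; it generally cannot be recovered reliably afterwards.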

Browsers on non-Windows platforms would tend to show empty boxes or question marks for these characters, making the text hard to read.

[48][49] When, early in the history of personal computers, users did not find their character encoding requirements met, private or local code pages were created using terminate-and-stay-resident utilities or by re-programming BIOS EPROMs.

When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets.

In order to overcome such problems, the IBM Character Data Representation Architecture level 2 specifically reserves ranges of code page IDs for user-definable and private-use assignments.