This makes UTF-32 a simple replacement in code that steps through a string using an integer index incremented by one per location, as was commonly done for ASCII.
However, Unicode code points are rarely processed in complete isolation: combining character sequences and many emoji, for example, span multiple code points.[2]
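Python's str type, which exposes strings as sequences of code points (much as a UTF-32 array would), illustrates both points:

```python
# Indexing reaches any code point in constant time, as in a UTF-32 array...
s = "e\u0301"                 # "é" built from 'e' + COMBINING ACUTE ACCENT
print(len(s))                 # 2 -- two code points, one user-perceived character
print(s[0], hex(ord(s[1])))   # e 0x301

# ...but a single user-perceived character can still span several indices.
flag = "\U0001F1FA\U0001F1F8" # regional indicators U+1F1FA, U+1F1F8 (a flag emoji)
print(len(flag))              # 2 -- the emoji is two code points
```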
The original ISO/IEC 10646 standard defines a 32-bit encoding form called UCS-4, in which each code point in the Universal Character Set (UCS) is represented by a 31-bit value from 0 to 0x7FFFFFFF (the sign bit was unused and zero).
In November 2003, Unicode was restricted by RFC 3629 to match the constraints of the UTF-16 encoding: explicitly prohibiting code points greater than U+10FFFF (and also the high and low surrogates U+D800 through U+DFFF).
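A minimal sketch of this restriction as a validity test (the function name is illustrative, not part of any standard library):

```python
def is_unicode_scalar_value(cp: int) -> bool:
    """True if cp is a code point valid in UTF-32 after RFC 3629:
    at most U+10FFFF and not a surrogate (U+D800..U+DFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

assert is_unicode_scalar_value(0x10FFFF)
assert not is_unicode_scalar_value(0xD800)    # surrogate
assert not is_unicode_scalar_value(0x110000)  # beyond U+10FFFF
```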
For instance, in modern text rendering, it is common[citation needed] for the final step to be building a list of structures, each containing coordinates (x, y), attributes, and a single UTF-32 code point identifying the glyph to draw.
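A hypothetical sketch of such a structure in Python (the names GlyphRecord and layout are illustrative, not taken from any actual rendering library):

```python
from dataclasses import dataclass

@dataclass
class GlyphRecord:
    x: float          # horizontal pen position
    y: float          # vertical pen position (baseline)
    attributes: int   # e.g. style flags; meaning is application-defined
    codepoint: int    # a single UTF-32 code point identifying the glyph

def layout(text: str, advance: float = 10.0) -> list[GlyphRecord]:
    # One record per code point, advancing the pen by a fixed width;
    # a real renderer would consult font metrics instead.
    return [GlyphRecord(i * advance, 0.0, 0, ord(ch))
            for i, ch in enumerate(text)]

print(layout("Ab"))  # [GlyphRecord(x=0.0, ..., codepoint=65), GlyphRecord(...)]
```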
On Unix systems, UTF-32 strings are sometimes, though rarely, used internally by applications, because the type wchar_t is defined as 32 bits there.[8][9]
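One way to observe this difference is through Python's ctypes module, which exposes the platform's wchar_t as c_wchar:

```python
import ctypes

# Width of the C type wchar_t on this platform:
# 4 bytes on typical Unix systems, 2 bytes on Windows.
print(ctypes.sizeof(ctypes.c_wchar))
```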
Python versions up to 3.2 can be compiled to use UTF-32 ("wide" builds) instead of UTF-16; from version 3.3 onward, Unicode strings are stored in UTF-32 if there is at least one non-BMP character in the string; otherwise leading zero bytes are optimized away, so that each string is stored at a uniform width "depending on the [code point] with the largest Unicode ordinal (1, 2, or 4 bytes)".
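This effect can be observed with sys.getsizeof, since a string's total size grows with the storage width chosen for its largest code point (exact byte counts vary by CPython build):

```python
import sys

# Per-character storage in Python 3.3+ is 1, 2, or 4 bytes,
# chosen by the largest code point present in the string.
for s in ["aaaa", "\u0100" * 4, "\U0001F600" * 4]:
    print(f"largest code point U+{max(map(ord, s)):04X}: "
          f"{sys.getsizeof(s)} bytes total")
```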