Percent-encoding

Some characters are reserved because they have special meaning in a URI. For example, forward slash characters are used to separate different parts of a URL (or, more generally, a URI).

Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits (if there is only a single hex digit, a leading zero is added). The two digits, preceded by a percent sign (%), are then used in the URI in place of the reserved character.
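The procedure can be illustrated with a short Python sketch. The helper name percent_encode_char is hypothetical and introduced only for this example; it handles a single ASCII character and is not meant as a complete encoder. The standard library function urllib.parse.quote is shown alongside it for comparison.

```python
from urllib.parse import quote


def percent_encode_char(ch: str) -> str:
    """Percent-encode one ASCII character (hypothetical helper for illustration)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Look up the character's byte value in ASCII; non-ASCII input raises
    # UnicodeEncodeError, since this sketch only covers the ASCII case.
    byte_value = ch.encode("ascii")[0]
    # Two hexadecimal digits, zero-padded, prefixed with a percent sign.
    return f"%{byte_value:02X}"


print(percent_encode_char("/"))   # %2F
print(percent_encode_char("\n"))  # %0A (leading zero added)

# The standard library produces the same result when "/" is not treated as safe.
print(quote("/", safe=""))        # %2F
```

The newline case shows the leading zero: its byte value is 10, which is the single hex digit A, so the encoded form is %0A.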

Whether a reserved character actually has a reserved purpose in a given context depends on the rules established for reserved characters by individual URI schemes.

In the World Wide Web's formative years, data characters were drawn from the ASCII repertoire, and their corresponding ASCII bytes were the basis for determining percent-encoded sequences. Applying percent-encoding to character data was therefore relatively harmless; it was simply assumed that characters and bytes mapped one-to-one and were interchangeable.

As the need to represent characters outside the ASCII range grew, Web applications began using different multi-byte, stateful, and other non-ASCII-compatible encodings as the basis for percent-encoding, leading to ambiguities and difficulty interpreting URIs reliably.
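The ambiguity can be seen in a small Python sketch using urllib.parse. The choice of the character "é" and of the two encodings (UTF-8 and Latin-1) is an assumption made purely for illustration; the same effect occurs with any pair of encodings that assign different bytes to a character.

```python
from urllib.parse import quote, unquote

text = "é"

# The same character yields different percent-encoded sequences
# depending on which character encoding supplies the bytes.
utf8_form = quote(text, encoding="utf-8")
latin1_form = quote(text, encoding="latin-1")
print(utf8_form)    # %C3%A9
print(latin1_form)  # %E9

# A receiver that guesses the wrong encoding recovers the wrong text.
print(unquote(utf8_form, encoding="latin-1"))  # Ã©
print(unquote(latin1_form, encoding="utf-8"))  # U+FFFD replacement character
```

Nothing in the percent-encoded sequence itself records which encoding produced it, so the receiver has to know or guess.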

Arbitrary character data is sometimes percent-encoded and used in non-URI situations, such as for password-obfuscation programs or other system-specific translation protocols.

Presumably, it is up to the URI scheme specifications to account for this possibility and require one or the other, but in practice, few, if any, actually do.

There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a UTF-16 code unit represented as four hexadecimal digits.
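A minimal Python sketch of this form follows. The function name encode_percent_u is hypothetical, and the sketch simplifies by encoding every UTF-16 code unit, including those for ASCII characters; it is meant only to show how code units map to %uxxxx sequences.

```python
def encode_percent_u(text: str) -> str:
    """Encode each UTF-16 code unit of text as %uXXXX (hypothetical helper)."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    # Every two bytes form one 16-bit code unit.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"%u{unit:04X}" for unit in units)


print(encode_percent_u("\u20ac"))      # %u20AC (euro sign, one code unit)
print(encode_percent_u("\U0001F600"))  # %uD83D%uDE00 (surrogate pair)
```

A character outside the Basic Multilingual Plane occupies two UTF-16 code units, so it produces two %uxxxx sequences rather than one.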

[4] The encoding used by default is based on an early version of the general URI percent-encoding rules,[5] with a number of modifications such as newline normalization and replacing spaces with + instead of %20.

The media type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined in the HTML and XForms specifications.
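The difference between the two conventions for spaces can be shown with Python's urllib.parse module. The field names and values below are made up for illustration. Note that urlencode does not perform newline normalization, so this sketch covers only the space and reserved-character handling.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical form fields, used only for this example.
fields = {"name": "Ada Lovelace", "note": "a+b & c"}

# Form-style encoding: spaces become "+", a literal "+" becomes %2B.
body = urlencode(fields)
print(body)  # name=Ada+Lovelace&note=a%2Bb+%26+c

# General URI percent-encoding: spaces become %20.
print(quote("Ada Lovelace"))  # Ada%20Lovelace

# Decoding the form body recovers the original values.
print(parse_qs(body))  # {'name': ['Ada Lovelace'], 'note': ['a+b & c']}
```

Because + stands for a space in this format, a literal plus sign in the data must itself be percent-encoded as %2B.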

In addition, the CGI specification contains rules for how web servers decode data of this type and make it available to applications.