Lexical grammar

A program is written using characters that are defined by the lexical structure of the language in which it is written.

The character set plays the same role as the alphabet of a written language.

The lexical grammar specifies how a character sequence is divided into subsequences of characters, each of which represents an individual token.

[1] For instance, the lexical grammar of many programming languages specifies that a string literal starts with a " character and continues until a matching " is found (escape sequences complicate this), that an identifier is a sequence of letters and digits (usually also allowing underscores, but not beginning with a digit), and that an integer literal is a sequence of digits.
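The three rules just described can be sketched as a small regular-expression tokenizer. This is only an illustration under simplifying assumptions, not the lexical grammar of any particular language: the token names, the `TOKEN_SPEC` table, and the `tokenize` function are hypothetical, strings are assumed to use backslash escapes, and identifiers are restricted to ASCII letters, digits, and underscores.

```python
import re

# Hypothetical token names and patterns, chosen for illustration only.
TOKEN_SPEC = [
    # A " character, then escaped characters or anything except " and \, then a closing ".
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    # Letters, digits, and underscores, not starting with a digit.
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    # A sequence of digits.
    ("INTEGER", r"[0-9]+"),
    # Whitespace separates tokens and is discarded below.
    ("WHITESPACE", r"\s+"),
]

# One alternation with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize('"abc" xyz1 23'))
# [('STRING', '"abc"'), ('IDENTIFIER', 'xyz1'), ('INTEGER', '23')]
```

In this sketch the space after `xyz1` ends the identifier, so `23` is recognized as a separate integer literal. An input such as `1abc` would be split into an integer followed by an identifier here; a real lexer may instead treat it as an error.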

[2] Regular expressions for common lexical rules (for example, those of C) follow.