Shift-reduce parser

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar.

All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

The parser builds up the parse tree incrementally, bottom up, and left to right, without guessing or backtracking.

At every point in this pass, the parser has accumulated a list of subtrees or phrases of the input text that have been already parsed.

The action is read from a table containing all syntactically valid combinations of stack and lookahead symbols.

It doesn't cover all language rules, such as the size of numbers, or the consistent use of names and their definitions in the context of the whole program.

Grammars for complete languages use recursive rules to handle lists, parenthesized expressions and nested statements.

The parser's program code is a simple generic loop that applies unchanged to many grammars and languages.

For LR methods, the complex tables are mechanically derived from a grammar by some parser generator tool like Bison.

Its total execution time scales linearly with the length of the input and the size of the complete parse tree.

Reductions reorganize the most recently parsed things, that is, those immediately to the left of the lookahead symbol.

The base or bottom of the stack is on the left and holds the leftmost, oldest parse fragment.

When a grammar rule such as is applied, the stack top holds the parse trees "... Products * Value".

[4] The parser keeps applying reductions to the top of the parse stack for as long as it keeps finding newly completed examples of grammar rules there.

In the example above, int and id belong to inner grammar levels compared to the statement delimiter ;.

There are different varieties of precedence parsers, each with different ways of finding the handle's left end and choosing the correct rule to apply:

[9][10] Recent research has identified methods by which canonical LR parsers may be implemented with dramatically reduced table requirements over Knuth's table-building algorithm.

Given a specific stack state and lookahead symbol, there are precisely four possible actions, ERROR, SHIFT, REDUCE, and STOP (hereinafter referred to as configurations).

For practical reasons, including higher performance, the tables are usually extended by a somewhat large, auxiliary array of two-bit symbols, obviously compressed into four two-bit symbols, a byte, for efficient accessing on byte-oriented machines, often encoded as: (STOP being a special case of SHIFT).

In programming systems which support the specification of values in quaternary numeral system (base 4, two bits per quaternary digit), such as XPL, these are coded as, for example: The SHIFT and the REDUCE tables are implemented separately from the array.

This presents an opportunity to invoke an error recovery procedure, perhaps, in its most simplistic form, to discard the lookahead terminal symbol and to read the next terminal symbol, but many other programmed actions are possible, including pruning the stack, or discarding the lookahead terminal symbol and pruning the stack (and in a pathological case, it is usually possible to obtain where consists only of a "null statement").