LL parser

It parses the input from Left to right, performing Leftmost derivation of the sentence.

The class of grammars parsable by the LL(*) strategy encompasses some context-sensitive languages due to the use of syntactic and semantic predicates and has not been identified.

[7] Against the popular misconception, LL(*) parsers are not LLR in general, and are guaranteed by construction to perform worse on average (super-linear against linear time) and far worse in the worst-case (exponential against linear time).

According to Waite and Goos (1984),[9] LL(k) grammars were introduced by Stearns and Lewis (1969).

[10] For a given context-free grammar, the parser attempts to find the leftmost derivation.

is: Generally, there are multiple possibilities when selecting a rule to expand the leftmost non-terminal.

, if and only if for any two leftmost derivations: the following condition holds: the prefix of the string

parser is a deterministic pushdown automaton with the ability to peek on the next

This peek capability can be emulated by storing the lookahead buffer contents in the finite state space, since both buffer and input alphabet are finite in size.

, where: The parser stack initially contains the starting symbol above the EOI:

The states and the transition function are not explicitly given; they are specified (generated) using a more convenient parse table instead.

The table provides the following mapping: If the parser cannot perform a valid transition, the input is rejected (empty cells).

To make the table more compact, only the non-terminal rows are commonly displayed, since the action is the same for terminals.

To explain an LL(1) parser's workings we will consider the following small LL(1) grammar: and parse the following input: An LL(1) parsing table for a grammar has a row for each of the non-terminals and a column for each terminal (including the special terminal, represented here as $, that is used to indicate the end of the input stream).

Each cell of the table may point to at most one rule of the grammar (identified by its number).

The parsing table instruction comes from the column headed by the input symbol '(' and the row headed by the stack-top symbol 'S'; this cell contains '2', which instructs the parser to apply rule (2).

Because they are the same, it removes it from the input stream and pops it from the top of the stack.

In this case the parser will report that it has accepted the input string and write the following list of rule numbers to the output stream: This is indeed a list of rules for a leftmost derivation of the input string, which is: Below follows a C++ implementation of a table-based LL parser for the example language: As can be seen from the example, the parser performs three types of steps depending on whether the top of the stack is a nonterminal, a terminal or the special symbol $: These steps are repeated until the parser stops, and then it will have either completely parsed the input and written a leftmost derivation to the output stream or it will have reported an error.

In order to fill the parsing table, we have to establish what grammar rule the parser should choose if it sees a nonterminal A on the top of its stack and a symbol a on its input stream.

It is easy to see that such a rule should be of the form A → w and that the language corresponding to w should have at least one string starting with a.

For this purpose we define the First-set of w, written here as Fi(w), as the set of terminals that can be found at the start of some string in w, plus ε if the empty string also belongs to w. Given a grammar with the rules A1 → w1, …, An → wn, we can compute the Fi(wi) and Fi(Ai) for every rule as follows: The result is the least fixed point solution to the following system: where, for sets of words U and V, the truncated product is defined by

This is because a right-hand side w of a rule might ultimately be rewritten to the empty string.

So the parser should also use the rule A → w if ε is in Fi(w) and it sees on the input stream a symbol that could follow A.

We use $ as a special terminal indicating end of input stream, and S as start symbol.

If T[A, a] denotes the entry in the table for nonterminal A and terminal a, then Equivalently: T[A, a] contains the rule A → w for each a ∈ Fi(w)·Fo(A).

Until the mid-1990s, it was widely believed that LL(k) parsing[clarify] (for k > 1) was impractical,[11]: 263–265 since the parser table would have exponential size in k in the worst case.

This perception changed gradually after the release of the Purdue Compiler Construction Tool Set around 1992, when it was demonstrated that many programming languages can be parsed efficiently by an LL(k) parser without triggering the worst-case behavior of the parser.

FOLLOW(A) is the union over:[12] There are two main types of LL(1) conflicts: The FIRST sets of two different grammar rules for the same non-terminal intersect.

With an empty string (ε) in the FIRST set, it is unknown which alternative to select.

[13] For a general method, see removing left recursion.