LL grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser that constructs a rightmost derivation).

These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively.

LL grammars can alternatively be characterized as precisely those that can be parsed by a predictive parser – a recursive descent parser without backtracking – and these can be readily written by hand.

This article is about the formal properties of LL grammars; for parsing, see LL parser or recursive descent parser.

is an LL(k) grammar if there is at most one production rule

, An alternative, but equivalent, formal definition is the following:

is an LL(k) grammar if, for arbitrary derivations when the first

of the current input, the parser can identify with certainty the production rule

When rule identification is possible even without considering the past input

[5] In the formal definition of a strong LL(k) grammar, the universal quantifier for

For every LL(k) grammar, a structurally equivalent strong LL(k) grammar can be constructed.

An LL(1) grammar with symbols that have only the empty derivation may or may not be LALR(1).

[9] LL grammars cannot have rules containing left recursion.

[10] Each LL(k) grammar that is ε-free can be transformed into an equivalent LL(k) grammar in Greibach normal form (which by definition does not have rules with left recursion).

[12] A grammar G is said to be LL-regular (LLR) if there exists a regular partition of

LLR grammars are unambiguous and cannot be left-recursive.

[13] Hence the class of LLR grammars is strictly larger than the union of LL(k) for each k. It is decidable whether, given a regular partition

It is, however, not decidable whether an arbitrary grammar G is LLR.

This is due to the fact that deciding whether a grammar G generates a regular language, which would be necessary to find a regular partition for G, can be reduced to the Post correspondence problem.

Every LLR grammar is LR-regular (LRR, the corresponding[clarify] equivalent for LR(k) grammars), but there exists an LR(1) grammar that is not LLR.

Given a regular partition a Moore machine can be constructed to transduce the parsing from right to left, identifying instances of regular productions.

Once that has been done, an LL(1) parser is sufficient to handle the transduced input in linear time.

Thus, LLR parsers can handle a class of grammars strictly larger than LL(k) parsers while being equally efficient.

Despite that the theory of LLR does not have any major applications.

One possible and very plausible reason is that while there are generative algorithms for LL(k) and LR(k) parsers, the problem of generating an LLR/LRR parser is undecidable unless one has constructed a regular partition upfront.

But even the problem of constructing a suitable regular partition given grammar is undecidable.

The class of languages having an ε-free LL(1) grammar in Greibach normal form equals the class of simple deterministic languages.

[16] This language class includes the regular sets not containing ε.

[14] LL grammars, particularly LL(1) grammars, are of great practical interest, as they are easy to parse, either by LL parsers or by recursive descent parsers, and many computer languages[clarify] are designed to be LL(1) for this reason.

Languages based on grammars with a high value of k have traditionally been considered [citation needed] to be difficult to parse, although this is less true now given the availability and widespread use [citation needed] of parser generators supporting LL(k) grammars for arbitrary k.

The C grammar [ 1 ] is not LL(1): The bottom part shows a parser that has digested the tokens " int v;main(){ " and is about to choose a rule to derive the nonterminal " Stmt ". Looking only at the first lookahead token " v ", it cannot decide which of both alternatives for " Stmt " to choose, since two input continuations are possible. They can be discriminated by peeking at the second lookahead token (yellow background).