Ambiguous grammar

In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree.

[1][2] Every non-empty context-free language admits an ambiguous grammar by introducing e.g. a duplicate rule.

For computer programming languages, the reference grammar is often ambiguous, due to issues such as the dangling else problem.

If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.

[4] The simplest example is the following ambiguous grammar (with start symbol A) for the trivial language that consists of only the empty string: …meaning that the nonterminal A can be derived to either itself again, or to the empty string.

This language also has an unambiguous grammar, consisting of a single production rule: …meaning that the unique production can produce only the empty string, which is the unique string in the language.

In the same way, any grammar for a non-empty language can be made ambiguous by adding duplicates.

The regular language of unary strings of a given character, say 'a' (the regular expression a*), has the unambiguous grammar: …but also has the ambiguous grammar: These correspond to producing a right-associative tree (for the unambiguous grammar) or allowing both left- and right- association.

The context free grammar is ambiguous since there are two leftmost derivations for the string a + a + a: As another example, the grammar is ambiguous since there are two parse trees for the string a + a − a: The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous grammar generating the same language: A common example of ambiguity in computer programming languages is the dangling else problem.

In many languages, the else in an If–then(–else) statement is optional, which results in nested conditionals having multiple ways of being recognized in terms of the context-free grammar.

In a grammar containing the rules[a] some ambiguous phrase structures can appear.

Sometimes the grammar is modified so that it is unambiguous, such as by requiring an endif statement or making else mandatory.

In other cases the grammar is left ambiguous, but the ambiguity is resolved by making the overall phrase grammar context-sensitive, such as by associating an else with the nearest if.

The decision problem of whether an arbitrary grammar is ambiguous is undecidable because it can be shown that it is equivalent to the Post correspondence problem.

[5] At least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars.

[6] The efficiency of parsing a context-free grammar is determined by the automaton that accepts it.

Deterministic context-free grammars are accepted by deterministic pushdown automata and can be parsed in linear time, for example by an LR parser.

[7] They are a strict subset of the context-free grammars, which are accepted by pushdown automata and can be parsed in polynomial time, for example by the CYK algorithm.

For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε.

An arbitrary string of this language cannot be parsed without reading all its symbols first, which means that a pushdown automaton has to try alternative state transitions to accommodate for the different possible lengths of a semi-parsed string.

Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as by using the precedence and associativity constraints.

While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist.

[9][10] The existence of inherently ambiguous context-free languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report.

[12] Ogden's lemma[13] can be used to prove that certain context-free languages, such as

But Hopcroft & Ullman (1979) give a proof that no context-free grammar for this union language can unambiguously parse strings of form

[14] More examples, and a general review of techniques for proving inherent ambiguity of context-free languages, are found given by Bassino and Nicaud (2011).