Context-free language

In formal language theory, a context-free language (CFL), also called a Chomsky type-2 language, is a language generated by a context-free grammar (CFG).

Context-free languages have many applications in programming languages; in particular, most arithmetic expressions are generated by context-free grammars.

Intrinsic properties of the language can be distinguished from extrinsic properties of a particular grammar by comparing multiple grammars that describe the language.

The set of all context-free languages is identical to the set of languages accepted by pushdown automata, which makes these languages amenable to parsing.

Further, for a given CFG, there is a direct way to produce a pushdown automaton for the grammar (and thereby the corresponding language), though going the other way (producing a grammar given an automaton) is not as direct.

An archetypal context-free language is $L = \{a^n b^n : n \ge 1\}$, the language of all non-empty even-length strings whose first halves consist entirely of a's and whose second halves consist entirely of b's.
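A grammar commonly used to generate $L$ is $S \to aSb \mid ab$. That grammar is not spelled out above, so treat it as an assumption. The minimal Python sketch below replays derivations in it. At each step, the hypothetical helper `derive` either finishes with $S \to ab$ or expands once more with $S \to aSb$.

```python
def derive(max_steps: int) -> list[str]:
    """Return the words reached by derivations of S -> aSb | ab using at most max_steps rule applications."""
    words = []
    form = "S"  # current sentential form; it always contains exactly one S
    for _ in range(max_steps):
        # Option 1: finish the derivation with S -> ab.
        words.append(form.replace("S", "ab"))
        # Option 2: continue the derivation with S -> aSb.
        form = form.replace("S", "aSb")
    return words


print(derive(3))  # ['ab', 'aabb', 'aaabbb']
```

Each application of $S \to aSb$ adds exactly one a on the left and one b on the right. This is why every derived word has equal-length blocks of a's and b's.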

It is accepted by a pushdown automaton that pushes a symbol onto its stack for every a it reads and pops one for every b.
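The following is a minimal sketch of one such deterministic pushdown automaton, simulated in Python. The state names `q0` and `q1`, the stack symbols `Z` (bottom marker) and `A`, and the function name `pda_accepts` are illustrative assumptions, not a specific textbook construction.

```python
def pda_accepts(word: str) -> bool:
    """Simulate a deterministic pushdown automaton for {a^n b^n : n >= 1}."""
    stack = ["Z"]  # Z marks the bottom of the stack
    state = "q0"   # q0: still reading a's; q1: reading b's
    for symbol in word:
        if state == "q0" and symbol == "a":
            stack.append("A")  # push one A for every a
        elif symbol == "b" and stack[-1] == "A":
            stack.pop()        # pop one A for every b
            state = "q1"
        else:
            return False       # no transition defined: reject
    # Accept only after at least one b, with just the bottom marker left.
    return state == "q1" and stack == ["Z"]


for w in ["ab", "aabb", "aab", "abab", ""]:
    print(repr(w), pda_accepts(w))
```

Reading an a after any b has no transition, so inputs such as `abab` are rejected. A leftover `A` on the stack at the end (as in `aab`) also causes rejection.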

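An example of an inherently ambiguous CFL (one for which every generating grammar is ambiguous) is the union of $\{a^n b^m c^m d^n : n, m > 0\}$ and $\{a^n b^n c^m d^m : n, m > 0\}$. This set is context-free, since the union of two context-free languages is always context-free.

But there is no way to unambiguously parse strings in the (non-context-free) subset $\{a^n b^n c^n d^n : n > 0\}$, which is the intersection of these two languages.[1]

The language of all properly matched parentheses is generated by the grammar $S \to SS \mid (S) \mid \varepsilon$.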

The context-free nature of the language makes it simple to parse with a pushdown automaton.
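Here is a minimal Python sketch of that idea, assuming the input alphabet consists only of `(` and `)`. The list `stack` plays the role of the pushdown store: it pushes on `(` and pops on `)`. The function name `balanced` is hypothetical.

```python
def balanced(s: str) -> bool:
    """Accept exactly the properly matched strings over the alphabet {'(', ')'}."""
    stack = []
    for ch in s:
        if ch == "(":
            stack.append(ch)  # remember an unmatched opening parenthesis
        elif ch == ")":
            if not stack:
                return False  # closing parenthesis with nothing to match
            stack.pop()       # match the most recent opening parenthesis
        else:
            return False      # symbol outside the assumed alphabet
    return not stack          # every '(' must have been matched


print(balanced("(()())"), balanced("(()"), balanced(")("))
```

With a single kind of bracket, the stack only ever holds one symbol, so a counter would suffice. The explicit stack is kept here to mirror the automaton, and it extends directly to several bracket types.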

Determining an instance of the membership problem, i.e. given a string $w$, deciding whether $w \in L(G)$, where $L(G)$ is the language generated by a given grammar $G$, is also known as recognition.
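A standard way to solve the membership problem for a grammar in Chomsky normal form is the Cocke-Younger-Kasami (CYK) algorithm. This article does not name it, so it is given here only as an illustration. The Python sketch below assumes a dictionary encoding of grammars, and its example grammar for $\{a^n b^n : n \ge 1\}$ is also an assumption. It fills a table of the nonterminals deriving each substring in $O(n^3 \cdot |G|)$ time for input length $n$.

```python
# Assumed encoding of a Chomsky normal form grammar: each nonterminal maps to its
# right-hand sides, either a 1-tuple holding a terminal or a 2-tuple of nonterminals.
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(grammar: Grammar, start: str, word: str) -> bool:
    n = len(word)
    if n == 0:
        return False  # this encoding has no S -> epsilon rule
    # table[i][l - 1] holds the nonterminals that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in grammar.items():
            if (ch,) in bodies:
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]                    # word[i:i + split]
                right = table[i + split][length - split - 1]  # word[i + split:i + length]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]


# Assumed example grammar for {a^n b^n : n >= 1}: S -> AB | AT, T -> SB, A -> a, B -> b
ANBN: Grammar = {
    "S": [("A", "B"), ("A", "T")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

print(cyk(ANBN, "S", "aabb"), cyk(ANBN, "S", "abab"))
```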

Context-free recognition for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to Boolean matrix multiplication, thus inheriting its complexity upper bound of $O(n^{2.3728596})$.[2][note 2]

Conversely, Lillian Lee has shown $O(n^{3-\varepsilon})$ Boolean matrix multiplication to be reducible to $O(n^{3-3\varepsilon})$ CFG parsing. This establishes a kind of lower bound for the latter: a sufficiently fast general CFG parser would yield a faster Boolean matrix multiplication algorithm.[3]

Practical uses of context-free languages also require producing a derivation tree that exhibits the structure that the grammar associates with the given string.

The process of producing this tree is called parsing.
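As an illustration, here is a minimal recursive-descent parser in Python that builds a derivation tree for matched parentheses. The grammar $S \to SS \mid (S) \mid \varepsilon$ is ambiguous, so this sketch swaps in the equivalent unambiguous grammar $S \to (S)S \mid \varepsilon$, which generates the same language. The class `Node` and the functions `parse` and `yield_of` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # "S", "(" or ")"
    children: list["Node"] = field(default_factory=list)


def parse(s: str) -> Node:
    """Build a derivation tree for s using S -> ( S ) S | epsilon."""
    pos = 0

    def parse_S() -> Node:
        nonlocal pos
        node = Node("S")
        if pos < len(s) and s[pos] == "(":
            # Apply S -> ( S ) S
            node.children.append(Node("("))
            pos += 1
            node.children.append(parse_S())
            if pos >= len(s) or s[pos] != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            node.children.append(Node(")"))
            pos += 1
            node.children.append(parse_S())
        # Otherwise apply S -> epsilon: the node keeps no children.
        return node

    tree = parse_S()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


def yield_of(node: Node) -> str:
    """Concatenate the leaves of the tree, reading back the parsed string."""
    if not node.children:
        return "" if node.symbol == "S" else node.symbol
    return "".join(yield_of(child) for child in node.children)


tree = parse("(()())")
print([child.symbol for child in tree.children], yield_of(tree))
```

Because the recursion follows the input's nesting depth, very deeply nested strings can hit Python's default recursion limit. An explicit stack would avoid that.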

Well-known general parsing algorithms, such as the CYK and Earley algorithms, have a worst-case time complexity that is cubic in the length of the string being parsed.

Formally, the set of all context-free languages is identical to the set of languages accepted by pushdown automata (PDA).

A special subclass of context-free languages is the class of deterministic context-free languages, defined as the languages accepted by deterministic pushdown automata; they can be parsed by an LR(k) parser.[4]

See also parsing expression grammars, an alternative approach to grammars and parsing.

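The class of context-free languages is closed under, among others, the following operations: if $L$ and $P$ are context-free languages, then so are their union $L \cup P$, their concatenation $LP$, the Kleene star $L^*$, the reversal of $L$, the image of $L$ under a homomorphism or an inverse homomorphism, and the intersection of $L$ with any regular language.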

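Context-free languages are, however, not closed under intersection. For example, $A = \{a^n b^n c^m : m, n \ge 0\}$ and $B = \{a^m b^n c^n : m, n \ge 0\}$ are both context-free, but their intersection $A \cap B = \{a^n b^n c^n : n \ge 0\}$ is not, which can be shown by the pumping lemma for context-free languages.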

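As a consequence, context-free languages are not closed under complementation. For any languages $A$ and $B$, the intersection can be expressed by union and complement as $A \cap B = \overline{\overline{A} \cup \overline{B}}$. Closure under complementation together with closure under union would therefore imply closure under intersection.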

In particular, context-free languages are not closed under difference either, since complement can be expressed by difference: $\overline{L} = \Sigma^* \setminus L$, where $\Sigma^*$ is itself context-free.

Given a context-free grammar, it is decidable whether the language it generates is finite. It is not decidable whether that language contains every possible string, is regular, is unambiguous, or is equal to the language generated by a different grammar.

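The following problems are undecidable for arbitrarily given context-free grammars $A$ and $B$:

- equivalence (is $L(A) = L(B)$?)
- disjointness (is $L(A) \cap L(B) = \emptyset$?)
- containment (is $L(A) \subseteq L(B)$?)
- universality (is $L(A) = \Sigma^*$?)
- regularity (is $L(A)$ regular?)
- ambiguity (is every grammar for $L(A)$ ambiguous?)

The following problems are decidable for arbitrary context-free languages:

- emptiness (given a context-free grammar $A$, is $L(A) = \emptyset$?)
- finiteness (is $L(A)$ finite?)
- membership (given a context-free grammar $G$ and a word $w$, is $w \in L(G)$?)

According to Hopcroft, Motwani, Ullman (2003),[25] many of the fundamental closure and (un)decidability properties of context-free languages were shown in the 1961 paper of Bar-Hillel, Perles, and Shamir.[26]

The set $\{a^n b^n c^n d^n : n > 0\}$ is a context-sensitive language, but there does not exist a context-free grammar generating this language.[27] So there exist context-sensitive languages which are not context-free.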

To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages[26] or a number of other methods, such as Ogden's lemma or Parikh's theorem.
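For reference, below is the standard statement of the pumping lemma for context-free languages. It is followed by the usual argument that $\{a^n b^n c^n : n \ge 0\}$ is not context-free. This is a textbook-style sketch, and the letters $p, u, v, w, x, y$ are the conventional names rather than notation taken from this article.

```latex
\textbf{Pumping lemma.} If $L$ is context-free, there is an integer $p \ge 1$ such that
every $s \in L$ with $|s| \ge p$ can be written as $s = uvwxy$ with
\[
  |vwx| \le p, \qquad |vx| \ge 1, \qquad u v^i w x^i y \in L \ \text{for all } i \ge 0.
\]
\textbf{Application to} $L = \{a^n b^n c^n : n \ge 0\}$. Suppose $L$ were context-free
with constant $p$, and take $s = a^p b^p c^p$. Since $|vwx| \le p$, the factor $vwx$
cannot contain both an $a$ and a $c$, so $vx$ contains at most two of the three letters.
Because $|vx| \ge 1$, the word $u v^2 w x^2 y$ has more copies of at least one letter,
but still exactly $p$ copies of a letter absent from $vx$. Its letter counts are therefore
unequal, so $u v^2 w x^2 y \notin L$, a contradiction.
```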