Context-free grammar

Nonterminal symbols are used during the derivation process, but do not appear in its final result string.

In a newer application, they are used in an essential part of the Extensible Markup Language (XML) called the document type definition.

In computer science, a popular notation for context-free grammars is Backus–Naur form, or BNF.

Since at least the time of the ancient Indian scholar Pāṇini, linguists have described the grammars of languages in terms of their block structure, describing how sentences are recursively built up from smaller phrases and, eventually, from individual words or word elements.

For example, a sentence of English can be logically parenthesized (with the logical metasymbols [ ]) into nested phrases and sub-phrases. A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way.

Important features of natural language syntax such as agreement and reference are not part of the context-free grammar, but the basic recursive structure of sentences, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are swallowed by nouns and verbs, are described exactly.

Formal constraints not captured by the grammar are then considered to be part of the "semantics" of the language.

A production rule is a pair (α, β), where α is a nonterminal symbol and β is a string of variables and/or terminals; rather than using ordered pair notation, production rules are usually written using an arrow operator with α as its left-hand side and β as its right-hand side: α → β.

It is allowed for β to be the empty string, and in this case it is customary to denote it by ε.

For example, the grammar with productions S → aSa, S → bSb, S → ε generates the even-length palindromes over { a, b }; if the productions S → a and S → b are added, a context-free grammar for the set of all palindromes over the alphabet { a, b } is obtained.[8]
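
To make the example concrete, the following short Python sketch (an illustration, not part of the article) decides membership in this palindrome language by following the five productions directly:

def derivable(w: str) -> bool:
    """Return True if w can be derived from S using
    S -> aSa | bSb | a | b | ε (palindromes over {a, b})."""
    if w == "" or w in ("a", "b"):            # S -> ε, S -> a, S -> b
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return derivable(w[1:-1])             # S -> aSa or S -> bSb
    return False

assert derivable("abba") and derivable("ababa")
assert not derivable("ab")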

The canonical example of a context-free grammar is parenthesis matching, which is representative of the general case.[9]

A second canonical example is two different kinds of matching nested parentheses, described by the productions S → SS, S → (), S → (S), S → [] and S → [S], with terminal symbols [ ] ( ) and nonterminal S. For instance, the sequence ([[()]]) can be derived in that grammar. More generally, in a context-free grammar we can pair up characters the way we do with brackets.
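
The matching can be checked mechanically; a minimal Python sketch (an illustration, not from the article) recognizes the strings generated by this grammar with a stack:

def well_nested(s: str) -> bool:
    """Recognize non-empty strings derivable from
    S -> SS | () | (S) | [] | [S] using a stack."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in ")]":
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            return False          # character outside the terminal alphabet
    return not stack and s != ""  # everything matched; empty string excluded

assert well_nested("([[()]])")
assert not well_nested("([)]")    # separately balanced but not nested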

In contrast to well-formed nested parentheses and square brackets in the previous section, there is no context-free grammar for generating all sequences of two different types of parentheses, each separately balanced disregarding the other, where the two types need not nest inside one another, for example [ ( ] ). The fact that this language is not context-free can be proven using the pumping lemma for context-free languages and a proof by contradiction, observing that all words of the form (^n [^n )^n ]^n should belong to the language.

A derivation is fully determined by giving, for each step, the rule applied in that step and the occurrence of its left-hand side to which it is applied. For clarity, the intermediate string is usually given as well.

In this case the presented leftmost and rightmost derivations define the same parse tree; however, there is another rightmost derivation of the same string which defines a parse tree with a different structure.
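
The distinction can be made concrete with a small Python sketch; the ambiguous grammar S → S + S | a used here is an illustrative stand-in, not the article's example. Two structurally different parse trees yield the same string:

# Parse trees as nested tuples ("S", children...), terminals as plain strings.
# Illustrative grammar: S -> S "+" S | "a"
left_tree  = ("S", ("S", ("S", "a"), "+", ("S", "a")), "+", ("S", "a"))
right_tree = ("S", ("S", "a"), "+", ("S", ("S", "a"), "+", ("S", "a")))

def yield_of(tree) -> str:
    """Concatenate the leaves of a parse tree, i.e. its derived string."""
    if isinstance(tree, str):
        return tree
    return "".join(yield_of(child) for child in tree[1:])

# Different trees, same derived string: the grammar is ambiguous.
assert left_tree != right_tree
assert yield_of(left_tree) == yield_of(right_tree) == "a+a+a"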

The parsing problem, checking whether a given word belongs to the language given by a context-free grammar, is decidable, using one of the general-purpose parsing algorithms such as the CYK algorithm (for grammars in Chomsky normal form) or Earley's algorithm. Context-free parsing for Chomsky normal form grammars was shown by Leslie G. Valiant to be reducible to Boolean matrix multiplication, thus inheriting its complexity upper bound of O(n^2.3728639).[16][17][b]

Conversely, Lillian Lee has shown O(n^(3−ε)) Boolean matrix multiplication to be reducible to O(n^(3−3ε)) CFG parsing, thus establishing some kind of lower bound for the latter.
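
As one concrete instance of such a general-purpose algorithm, the following Python sketch implements the CYK membership test for a grammar already in Chomsky normal form; the grammar for { a^n b^n : n ≥ 1 } at the end is illustrative, not taken from the article:

def cyk(word: str, rules: list, start: str) -> bool:
    """CYK membership test. `rules` holds CNF productions:
    (A, (B, C)) for A -> B C and (A, (a,)) for A -> a."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = set of nonterminals deriving word[i : i+l+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, rhs in rules:
            if rhs == (ch,):
                table[i][0].add(lhs)
    for length in range(2, n + 1):               # span length
        for i in range(n - length + 1):          # span start
            for split in range(1, length):       # split point
                for lhs, rhs in rules:
                    if (len(rhs) == 2
                            and rhs[0] in table[i][split - 1]
                            and rhs[1] in table[i + split][length - split - 1]):
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Illustrative CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A X | A B,  X -> S B,  A -> a,  B -> b
g = [("S", ("A", "X")), ("S", ("A", "B")), ("X", ("S", "B")),
     ("A", ("a",)), ("B", ("b",))]
assert cyk("aabb", g, "S") and not cyk("abab", g, "S")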

Algorithms are known to eliminate from a given grammar, without changing its generated language, unreachable symbols, unproductive symbols, ε-productions (with one possible exception when the grammar generates ε), and cycles. In particular, an alternative containing a useless nonterminal symbol can be deleted from the right-hand side of a rule.

Hence, the rules for useless nonterminals can be omitted without changing the language generated by the grammar, as can the alternatives mentioning them on the right-hand sides of the remaining rules. A context-free grammar is said to be proper if it has neither useless symbols nor ε-productions nor cycles.[25]

Combining the above algorithms, every context-free grammar not generating ε can be transformed into a weakly equivalent proper one.
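
A minimal Python sketch of two of these clean-up steps, assuming a grammar given as a dictionary from nonterminals to lists of alternatives (uppercase nonterminals, lowercase terminals; the names are illustrative):

def remove_useless(rules: dict, start: str) -> dict:
    """Drop unproductive nonterminals, then unreachable ones,
    without changing the generated language."""
    # 1. Productive nonterminals: those deriving some terminal string.
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            if lhs not in productive and any(
                    all(s.islower() or s in productive for s in alt)
                    for alt in alts):
                productive.add(lhs)
                changed = True
    # Keep only alternatives made of terminals and productive symbols.
    pruned = {lhs: [alt for alt in alts
                    if all(s.islower() or s in productive for s in alt)]
              for lhs, alts in rules.items() if lhs in productive}
    # 2. Reachable nonterminals: those reachable from the start symbol.
    reachable, todo = {start}, [start]
    while todo:
        for alt in pruned.get(todo.pop(), []):
            for s in alt:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    todo.append(s)
    return {lhs: alts for lhs, alts in pruned.items() if lhs in reachable}

# S -> AB | a;  A -> A;  B -> b;  C -> c
# A is unproductive, so the alternative AB disappears; B and C then
# become unreachable from S, and only S -> a remains.
g = {"S": [["A", "B"], ["a"]], "A": [["A"]], "B": [["b"]], "C": [["c"]]}
print(remove_useless(g, "S"))   # {'S': [['a']]}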

Given a CFG, does it generate the language of all strings over the alphabet of terminal symbols used in its rules? This universality problem is undecidable.

In the undecidability proof by reduction from the Post correspondence problem, the left side of each string generated by the constructed grammar represents the top of a solution for the PCP instance, while the right side represents the bottom in reverse.

An obvious way to extend the context-free grammar formalism is to allow nonterminals to have arguments, the values of which are passed along within the rules.

E.g. we can now easily express that in English sentences, the subject and verb must agree in number.[34]
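
For instance, subject–verb agreement can be sketched by letting nonterminals carry a number feature; the following Python fragment is purely illustrative (the lexical entries and rule shape are assumptions, not the formalism of the cited work):

# Nonterminals parameterized by a number feature n:
#   S -> NP(n) VP(n), with the same n on both sides.
np = {"sg": ["the dog"], "pl": ["the dogs"]}
vp = {"sg": ["barks"],   "pl": ["bark"]}

def sentences():
    """Generate the sentences licensed by S -> NP(n) VP(n)."""
    for n in ("sg", "pl"):                 # the argument n is passed along
        for subject in np[n]:
            for verb in vp[n]:
                yield f"{subject} {verb}"

print(list(sentences()))
# ['the dog barks', 'the dogs bark'] -- 'the dog bark' is never generated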

Another extension is to allow additional terminal symbols to appear on the left-hand side of rules, constraining their application.

Chomsky initially hoped to overcome the limitations of context-free grammars by adding transformation rules.[4]

Such rules are another standard device in traditional linguistics; e.g. passivization in English.

Much of generative grammar has been devoted to finding ways of refining the descriptive mechanisms of phrase-structure grammar and transformation rules such that exactly the kinds of things that natural language actually allows can be expressed.

Chomsky's general position regarding the non-context-freeness of natural language has held up since then,[35] although his specific examples regarding the inadequacy of context-free grammars in terms of their weak generative capacity were later disproved.[36]

Gerald Gazdar and Geoffrey Pullum have argued that despite a few non-context-free constructions in natural language (such as cross-serial dependencies in Swiss German[35] and reduplication in Bambara[37]), the vast majority of forms in natural language are indeed context-free.

Figure caption: Simplified excerpt of the formal grammar[1] for the C programming language (left), and a derivation of a piece of C code (right) from a nonterminal symbol. Nonterminal symbols are blue and terminal symbols are red.