Parikh's theorem

Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular language.

[1] It is useful for deciding that strings with a given number of terminals are not accepted by a context-free grammar.

[2] It was first proved by Rohit Parikh in 1961[3] and republished in 1966.

The Parikh vector of a word is defined as the function

denotes the number of occurrences of the symbol

is said to be semi-linear if it is a union of finitely many linear subsets.

be the set of Parikh vectors of words in

is any semi-linear set, then there exists a regular language (which a fortiori is context-free) whose Parikh vectors is

of context-free languages and of regular languages is the same, and it is equal to the set of semilinear sets.

Two languages are said to be commutatively equivalent if they have the same set of Parikh vectors.

The second part is easy to prove.

, to construct a regular language whose set of Parikh vectors is

is a union of 0 or more linear sets.

Since the empty language is regular, and union of regular languages is regular, it suffices to prove that any linear set is the set of Parikh vectors of a regular language.

The following proof is credited to Goldstine.

[5] First we need a small strengthening of the pumping lemma for context-free languages: Lemma — If

is generated by a Chomsky normal form grammar, then

The proof is essentially the same as the standard pumping lemma: use the pigeonhole principle to find

copies of some nonterminal symbol

in the longest path in the shortest derivation tree.

Now we prove the first part of Parikh's theorem, making use of the above lemma.

First, construct a Chomsky normal form grammar for

For each finite nonempty subset of nonterminals

such that there exists a derivation that uses every nonterminal in

We induct on the minimal number of factors from

Ginsburg and Spanier [6] gave a necessary and sufficient condition, similar to Parikh's theorem, for bounded languages.

Call a linear set stratified, if in its definition for each

A semi-linear set is stratified if it is a union of finitely many stratified linear subsets.

It shows that a context-free language over a singleton alphabet must be a regular language and that some context-free languages can only have ambiguous grammars[further explanation needed].

From a formal grammar perspective, this means that some ambiguous context-free grammars cannot be converted to equivalent unambiguous context-free grammars.