Chomsky normal form

To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this is described in most textbooks on automata theory.

[4]: 87–94 [5][6][7] The presentation here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß (2009).

[8][note 2] Each of the following transformations establishes one of the properties required for Chomsky normal form.

This does not change the grammar's produced language, and S0 will not occur on any rule's right-hand side.

To eliminate all rules of this form, first determine the set of all nonterminals that derive ε. Hopcroft and Ullman (1979) call such nonterminals nullable, and compute them as follows: Obtain an intermediate grammar by replacing each rule by all versions with some nullable Xi omitted.

{ab,aba,abaa,abab,abac,abb,abc,b,ba,baa,bab,bac,bb,bc,c}, but has no ε-rules.

Moreover, the worst-case bloat in grammar size[note 5] depends on the transformation order.

[8]: 7 The blow-up in grammar size depends on the order between DEL and BIN.

The following grammar, with start symbol Expr, describes a simplified version of the set of all syntactical valid arithmetic expressions in programming languages like C or Algol60.

In step "START" of the above conversion algorithm, just a rule S0→Expr is added to the grammar.

Only those context-free grammars which do not generate the empty string can be transformed into Chomsky reduced form.

is a terminal symbol, because Robert W. Floyd found any BNF syntax can be converted to the above one in 1961.

[11] But he withdrew this term, "since doubtless many people have independently used this simple fact in their own work, and the point is only incidental to the main considerations of Floyd's note.

"[12] While Floyd's note cites Chomsky's original 1959 article, Knuth's letter does not.