The Chomsky Normal Form is a notation of context-free grammars. Notation may sound a bit strange, especially if you know, what the CNF is (like rp). Indeed there isn't the CNF of a grammar, but of a language. But as you often have a grammar and want to transform it into a CNF, thinking of the CNF as a different notation may be easier. Normally the only restriction for a context-free grammar G = {N, T, P, o} (short description of what the symbols mean: N is the set of non-terminals, T the set of terminals, P is the set of derivation rules and o is the start symbol) is that for a derivation rule a→b a ∈(e.o.) N . The same grammar in CNF (the C is for Chomsky, there is also a CNF in boolean logic, where the C stands for conjunctive) has some more rules: b has to be formed by two non-terminal or one terminal symbol.
Formally this looks like:
A,B,C ∈(e.o.) N
a ∈(e.o.) T
A →(->) BC
A →(->) a
Every derivation rule in P has to look like this. Two additional facts about CNFs:
Every context-free grammar has one CNF. But the CNF is not unequivocal. IF you have for example a regular grammar with A →(->) abc ∈(->) P, this could be reformed into A →(->) Bc with B →(->) ab or into A →(->) aB with B →(->) bc (thanks to ariels for this explanation)
And the derivation tree (you form a tree from the derivation rules, where the right parts of the rules are the leafs of the left parts) of a CNF is a binary tree.
Chomsky Normal Form (CNF) is a restriction on the form of context-free grammars: the right hand side of every rule must consist of either one terminal or two nonterminals.

It is indeed a normal form: algorithms exist to convert any context-free grammar into an equivalent grammar in CNF. But it is not possible to speak of "the Chomsky Normal Form of a context-free grammar": different normalization algorithms produce different results.

Also, many different grammars can produce the same normal form. (For these reasons, it is incorrect to say that a CNF grammar "describes" a context-free grammar, and I've /msged PeterPan asking to correct this.)

There is one caveat: grammars in CNF cannot generate the empty string, so they can only represent arbitrary grammars up to the empty string.

An obvious normalization algorithm:

  1. recursively expand all empty-generating rules
  2. for each terminal, introduce a nonterminal to generate it, and replace all terminals occurring as part of existing right hand sides with those nonterminals
  3. recursively replace all rules A -> BX, where A, B are nonterminals and X is a string of 2 or more nonterminals, with the rules A -> BC and C -> X, where C is a new nonterminal

For example: in grammar


  S -> a
  S -> aAS
  A -> a
  A -> b
  A -> SS
  A ->

we eliminate the empty rules (all one of them) to get

  S -> a
  S -> aAS
  S -> aS
  A -> a
  A -> b
  A -> SS

then eliminate non-isolated terminals (all one of them) to get

  S -> a
  S -> CAS
  S -> CS
  A -> a
  A -> b
  A -> SS
  C -> a

and finally "chain" the long right hand sides (all one of them) to get a grammar in CNF:

  S -> a
  S -> CD
  S -> CS
  A -> a
  A -> b
  A -> SS
  C -> a
  D -> AS

Log in or register to write something here or to contact authors.