Chomsky Normal Form (

CNF) is a restriction on the form of

context-free grammars: the

right hand side of every

rule must consist of either one

terminal or two

nonterminals.

It is indeed a normal form: algorithms exist to convert any context-free grammar into an equivalent grammar in CNF. But it is not possible to speak of "the Chomsky Normal Form of a context-free grammar": different normalization algorithms produce different results.

Also, many different grammars can produce the same normal form.
(For these reasons, it is incorrect to say that a CNF grammar "describes" a context-free grammar, and I've /msged PeterPan asking to correct this.)

There is one caveat: grammars in CNF cannot generate the empty string, so they can only represent arbitrary grammars up to the empty string.

An obvious normalization algorithm:

- recursively expand all empty-generating rules
- for each terminal, introduce a nonterminal to generate it, and replace all terminals occurring as part of existing right hand sides with those nonterminals
- recursively replace all rules A -> BX, where A, B are nonterminals and X is a string of 2 or more nonterminals, with the rules A -> BC and C -> X, where C is a new nonterminal

For example: in grammar

```
S -> a
S -> aAS
A -> a
A -> b
A -> SS
A ->
```

we eliminate the empty rules (all one of them) to get

```
S -> a
S -> aAS
S -> aS
A -> a
A -> b
A -> SS
```

then eliminate non-isolated terminals (all one of them) to get

```
S -> a
S -> CAS
S -> CS
A -> a
A -> b
A -> SS
C -> a
```

and finally "chain" the long right hand sides (all one of them) to get a grammar in CNF:

```
S -> a
S -> CD
S -> CS
A -> a
A -> b
A -> SS
C -> a
D -> AS
```