Also known as type-0 grammars, phrase structure grammars, or unrestricted grammars, this is the largest family of formal grammars in the Chomsky hierarchy. A semi-Thue (by the way, it's pronounced TOO-ay) grammar G is a four-tuple (V, T, P, S), where V is a set of variables, T is a set of terminal symbols, where V and T are disjoint (actually, this condition is not strictly necessary, heck, there is no real distinction between a variable and a terminal in a semi-Thue grammar, we just keep it this way for harmony with other types of grammars). P is a set of productions α -> β, where α and β are in (V ∪ T)*, but α is not empty. S is an element of V that is considered the start symbol. These grammars are called semi-Thue because the productions are not necessarily reversible: a Thue grammar would require that all productions be reversible, i.e. α -> β would also require β -> α.

We say that γαδ => γβδ whenever there is a production α -> β. The formal language accepted by the semi-Thue grammar G is thus:

L(G) = {w|w e T* and S *=> w},

(where *=> is the reflexive and transitive closure of =>), just as for a context-free grammar.

It can be shown that semi-Thue grammars characterize the recursively enumerable languages. If, for instance a formal language L is L(G) for some semi-Thue grammar G = (V, T, P, S), it can be shown that L is a recursively enumerable language. This is the same as saying that there exists a Turing machine that accepts the same language as G. That Turing machine is simple enough to construct, as a two-tape nondeterministic machine. The first tape contains the input word w to be tested. The second tape is used to produce sentential forms from G. To begin, the Turing machine initializes the second tape with S. Then M will repeatedly do the following:

  1. Start at the left of the second tape and repeatedly choose to move right or select the current position on the tape.
  2. Nondeterministically choose a production β -> γ in G
  3. If β appears at some position in the second tape, replace β by γ there, possibly shifting the following symbols left or right depending on the relative lengths of β and γ (e.g. if γ is shorter than &beta, shift the tape symbols left).
  4. Compare the resulting sentential form with w on the first tape. If they match, accept, w is in L(G). If not, go back to step 1.

It is easy to see that the Turing machine will generate all of and only the sentential forms of G on its second tape after the last step is executed an arbitrary number of times. Thus the language accepted by the Turing machine is L(G), so L is recursively enumerable.

The converse of the above is also true: If L is a recursively enumerable language, then L is the language accepted by some semi-Thue grammar G. The proof is somewhat more involved, but the basic idea is we create the grammar by generating two copies of a representation of some word made up of symbols from the tape alphabet of the Turing machine M that accepts L. Then simulate M on this word. If M accepts the word, then G converts the second copy into a terminal string. If M doesn't accept it, then the derivation doesn't result in a terminal string.

These grammars are named for Axel Thue, but they were also apparently independently discovered by Noam Chomsky, who used the name phrase structure grammars for them. It is of course possible to create a whole programming language around this concept, as it is certainly possible to write a semi-Thue grammar that can be used to simulate a universal Turing Machine, but as far as I know no one has created anything like a practical general-purpose language out of it.


J. Hopcroft and J. Ullman, Introduction to Automata Theory, Languages, and Computation, (yes, the Cinderella Book)

Thue Reference Manual at

R. McNaughton, "Semi-Thue Systems With an Inhibitor",