Let L be a regular language. That is, L is a set of words which is recognised by a finite state automaton (or, equivalently, by some regular expression). Consider a DFA recognising L, and suppose L is infinite. Suppose a DFA D recognising L has k states. We may write the steps D takes when recognising a word w=w1w2...wn above the letters:
This just means that after reading w1
, D is in state
Then for a word w in L of length >k, due to the pigeonhole principle, some state s must repeat itself: s=si=sj, i < j. Then D must also accept
and so on. In fact, if we write x=w1
(x and y may both be empty
), then w=xyz, and for all nonnegative integers
m, the word xym
z is also in L.
We've just proved
[The pumping] lemma:
Let L be an infinite regular language. Then there exists some k such that if w∈L has more than k letters, then w=xyz where
For all j≥0, xykz∈L.
The pumping lemma is useful for proving certain languages are not regular. This approach is usually taken in introductory computation courses (see e..g pumping lemma proof that the balanced braces language is not regular). However, in practice the Myhill / Nerode theorem is much more useful for proving that languages are not regular.
Pumping lemmas also exist for CFGs (context free grammars); this is noded below.