Within the Chomsky hierarchy
of languages you have:
The regular language is recognized by a regular expression
or finite state automaton
(they are ultimately the same thing). A context free
language is recognized by a pushdown automaton which is a finite state automaton with a stack. In this respect, these two machines are similar. Likewise, the machines that recognize the context sensitive and unrestricted languages are similar - the unrestricted language is recognized by a Turing Machine
with an infinite tape (the classic machine). A context sensitive language is recognized by a Turing Machine with a tape of a fixed length and is known as a linear-bounded automaton.
The linear-bounded automaton (lba) is a Turing Machine with a tape of some finite length. The length is a function of the length of the initial input string and a constant for that particular machine. Some definitions of the lba have it that this constant is 1 for all cases and compensating for a shorter tape by having a larger alphabet. A different definition is "a Turing Machine where the amount of tape needed can be predicted and is a linear function of the length of the input."
Consider a context sensitive language with the input alphabet to be (0,1). If the machine needs more memory (say 3x) to recognize this language than the input string, this can be accommodated by having the usable tape be longer (3x), or the alphabet could be extended (0,1,2,3,4,5). This can be thought of as:
original string: '010101'
more memory: '010101____________'
more memory: '010101'
(as multiple '______'
One can easily imagine encoding the multiple tapes as a single value with a larger alphabet.
So, what is a context-sensitive language? In a context-free language, rules are of the form A -> BC or A -> x. There may be only one non-terminal (often shown as an uppercase letter) and no terminals (often shown as a lower case letter) on the left hand side of the transformation. This is not the case with a context-sensitive language where more than one terminator or non-terminal token may appear on the left hand side. However, a terminal character may not change once it has been written ('aA -> bB' is not valid).
The classic example of a context-sensitive language is
which describes a language with a run of 'a' followed by a run of 'b' followed by a run of 'c' where each run is the same length. This language is context sensitive - and not context-free (this can be proved using the pumping lemma)
A grammar for this language is:
S -> aBC
S -> SABC
CA -> AC
BA -> AB
CB -> BC
aA -> aa
aB -> ab
bB -> bb
bC -> bc
cC -> cc
And from this, we get:
So, what does a linear-bound automaton look like? The best way to draw this is as a chart for a turing machine.
Given the tape '<aabbcc>' where the '<' represents the start of the tape and the '>' represents the end of the tape, the following lba recognizes the language
< a b c > A B C
0 <R0 AR1 XXX XXX +++ AR0 BR5 XXX
1 XXX aR1 BR2 XXX XXX XXX BR1 XXX
2 XXX XXX bR2 CL3 XXX XXX XXX CR2
3 XXX XXX XXX cL3 >L4 XXX XXX XXX
4 <R0 aL4 bL4 cl4 XXX AL4 BL4 CL4
5 XXX XXX XXX XXX XXX XXX BR5 CR6
6 XXX XXX XXX XXX +++ XXX XXX CR6
Ok, so thats a big block of text that represents a Turing Machine with 7 states. It works by starting off in state 0. Each set of 3 characters represents:
- What to write back to the tape
- Which direction to move the tape head
- What the next state is
The combination of 'XXX' means: halt, don't accept
The combination of '+++' means: halt, accepted
Thus state 0 can be read as:
Read 'A's (writing 'A' and moving to the right) until you hit an 'a' (write an 'A' and change to state 1 and move to the right), or a 'B' (write a 'B', change to state 5 and move to the right).
It is fairly easy to recognize that the tape never goes further than the end of the tape (shown as a '>') and thus making this Turing Machine linearly bounded.
It is also possible to draw this out in a state table similar to that found in finite state automaton.
| | | |
v | v |
| 1 | -- bBR --> | 2 | -- cCR --> ...
^ | ^ |
| | | |
The above is an excerpt for states 1 and 2 Each transition has three parts again, this time the style is:
- Match character
- Write character
- Read head move
It should be clear that for the more complex states with several loops back to them, that rapidly grows too difficult for the meager abilities of ASCII art
Something of some intrest to people is that the halting problem is solveable for linear bounded automata. This is because there is a finite number of possible states that an lba can be in. This value is k*n*(s^n) - where s is the size of the alphabet, n is the length of the tape, and k is the number of states. One can then observe an lba for this number of iterations. If the lba has not halted by that time, it must be in a loop and will never halt.