The description above suggests that finite state machines are always hardware devices. But they are a mathematical specification of systems behaving in a certain way; such systems can be found not only in digital circuits, but just as well in software, and in the natural world around us.

There is a lot of (mathematical) theory on them. Some of this theory is essential to many courses in computing science, and usually calls them finite state automata.

What this theory considers is the use of these machines as language acceptors. That is, we disregard output, control signals, and clocks entirely, regard one state as the initial state, and another as the final state. We then consider the set of possible input sequences that bring the machine from initial to final state, calling this set the language associated with the machine.

We then consider the question: which languages can be described in this way?

Why is this so important to the construction of software systems? Well, many software systems operate on a finite amount of input. The input must be presented in some kind of structured format, to be read and interpreted by the program. The structured format is the "language". The question really asks whether a task, in order to be performed by a program, requires an input format that can be scanned and interpreted using no more than a fixed amount of memory. If it can, we not only know something about the program - namely, that it is, essentially, a finite state machine - but also something about the input format required - namely, that it is a so-called regular language.

Finite state machines are also used very commonly as a method to describe discrete processes, for instance, the progression of customers through a store. The statecharts formalism, now part of UML, is based on them. Another very common application is to attach probabilities to the state transitions and use them to estimate long-term properties of the system; the most common way to do this is Markov modelling and calls the finite state machines Markov chains.