Speech recognition in consumer electronics generally takes one of two forms: speaker-independent or speaker-dependent recognition.

Devices that use speaker-independent recognition can be used by any consumer, "right out of the box", so to speak. Speaker-independent patterns are generated by taking a large number of samples across the device's target demographic (young girls, yuppies, or even dogs!) and then averaging the patterns to create a somewhat "universal" template. Speaker-independent devices tend to have a limited vocabulary because of storage constraints and the time and resources required to gather a good representative sample for each word. Usually, recordings of about 500 different people saying the same word are required to produce a worthwhile speaker-independent sample.
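The averaging step can be pictured with a minimal sketch. It assumes each recording has already been reduced to a feature vector of the same length, which is a simplification made for illustration; the function name `build_template` and the sample values are hypothetical, and a real device would average hundreds of recordings rather than three.

```python
from statistics import fmean

def build_template(samples):
    """Average equal-length feature vectors element by element."""
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same length")
    # zip(*samples) groups the first value of every sample, then the second, etc.
    return [fmean(column) for column in zip(*samples)]

# Three hypothetical speakers saying the same word (made-up feature values).
samples = [
    [1.0, 4.0, 2.0],
    [3.0, 6.0, 2.0],
    [2.0, 5.0, 2.0],
]
template = build_template(samples)
print(template)
```

Averaging smooths out the quirks of any one speaker, which is why the result works as a rough "universal" pattern for the whole group.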

Speaker-dependent recognition has the advantage of a versatile vocabulary. For example, a "password" journal utilizing speech recognition ought to be able to learn any password that the consumer wants to use. Speaker-dependent templates are generally compressed on the fly and stored in flash memory. Speaker-dependent technology is ideal for applications where security is of primary concern, such as the aforementioned password journal. A journal might open when Little Susie says the password, "bacon fat", but if her brother Billy tries to say "bacon fat" into the microphone, the journal won't open--unless, of course, he is adept enough at impersonating his sister!
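As a rough sketch of how a learned template might be compressed and stored, the example below quantizes a feature vector to bytes and compresses it. Everything here is an assumption for illustration: the `flash` dictionary stands in for real flash memory, `zlib` stands in for whatever codec a given device actually uses, and the function names and feature values are hypothetical.

```python
import zlib

flash = {}  # hypothetical stand-in for the device's flash memory

def enroll(slot, features):
    """Quantize features in [0.0, 1.0] to one byte each, compress, and store."""
    quantized = bytes(round(min(max(f, 0.0), 1.0) * 255) for f in features)
    flash[slot] = zlib.compress(quantized)

def load_template(slot):
    """Decompress a stored template back to floats in [0.0, 1.0]."""
    return [b / 255 for b in zlib.decompress(flash[slot])]

# Susie teaches the journal her password (made-up feature values).
enroll("journal", [0.0, 0.5, 1.0, 0.25])
restored = load_template("journal")
```

The restored values differ slightly from the originals because one byte per value cannot hold full precision; that loss is the price of fitting templates into a small memory.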

For both speaker-dependent and speaker-independent devices, a "threshold" is often coded into the firmware. This threshold sets how closely an utterance must match the stored pattern, which lets the device allow for such variables as ambient room noise and variations in the human voice (due to such things as colds). A speech recognition device is said to have robust performance if it is able to pick out the pattern it is looking for despite the presence of noise.
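A threshold check might look something like the following sketch. It assumes the utterance and the stored pattern are equal-length feature vectors and uses plain Euclidean distance as the measure of similarity; actual firmware may use a different measure, and the `THRESHOLD` value and all sample numbers are hypothetical.

```python
import math

THRESHOLD = 0.5  # hypothetical value; real firmware would tune this by testing

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matches(utterance, template, threshold=THRESHOLD):
    """Accept the utterance if it is close enough to the stored template."""
    return distance(utterance, template) <= threshold

template = [2.0, 5.0, 2.0]
noisy_password = [2.1, 4.9, 2.2]   # right word, with some room noise
wrong_word = [5.0, 1.0, 4.0]       # a different word entirely

print(matches(noisy_password, template))
print(matches(wrong_word, template))
```

A looser threshold tolerates more noise and hoarse voices but also makes it easier for the wrong speaker or the wrong word to be accepted, so the value is a trade-off rather than a single correct number.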