Mechanical speech synthesis

Everybody is familiar with the sound of digital speech synthesis, those half-creepy monotone voices reading us our weather reports and bus schedules. Perhaps a few of us have even heard analog speech synthesis, done with formant modeling on some closet-sized Moog or k-rad Cubase plugin. There is another type of speech synthesis, played on what amounts to a musical instrument that produces speech rather than music. Such mechanical speech synthesis is orders of magnitude more rare than even analog synthesis, to the point that only a handful of people on earth today have heard it, and not that many more know it ever existed. This has not always been so...

One researcher, David Lindsay, estimates that experiments with mechanical speech were going on as early as the thirteenth century. He's found evidence that Albertus Magnus, once an instructor of St. Thomas Aquinas, created an automated head that could produce speech. As the story has it, this work was later destroyed by Aquinas as an abomination. At around the same time, the great optician and alchemist Roger Bacon is said to have produced one as well. Whether or not these were fakes (or even existed, for that matter), there was interest in speaking devices up to and through the Renaissance, and beyond. Cervantes' Don Quixote mentions one as a common sight in the very late 1500s, faked by a tube running from the machine's mouth to an agent under the floorboards.

The first modern schematic and description for a working speech machine was published in 1791, by one Wolfgang von Kempelen of Hungary. As first published, it contained a detailed description of the machine and its use, as well as six page-sized schematic drawings of the device's mechanical structure and function. A true scientist, von Kempelen submitted his work to the public to be studied and improved upon. Even as designed by von Kempelen the machine worked well; some operators could produce whole sentences with only three weeks of practice.

Almost half of von Kempelen's machine is taken up by a large bellows, which acts as the lungs when squeezed shut by the operator. Attached to the bellows over a pulley is a counterweight, which quickly pulls the bellows full of air to simulate inhalation. Air moves from the bellows through a tube, where it is fed past an ivory reed that imparts a tone probably something like a clarinet's. This tube then splits two ways, one way to the "mouth" and the other to two "nostrils," all conical bells to be covered by a palm or fingers respectively. Besides this main pathway, air also feeds directly from the bellows into the mouth to allow a pressure build-up for plosive phonemes (as in pork or bat). There are also two noise whistles that the airflow can be routed through for voiced and unvoiced fricatives (think zap and snake), as well as a weight that can be dropped on the reed to produce a retroflex continuant (as in rent). Later, von Kempelen added a simulation of intonation by varying the length of the reed with a handle, though the earlier versions spoke in monotone. Other innovations were a flexible "mouth" cone for more realistic sounds, as well as mechanical teeth and lips to do the same.
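The pairing of sound classes with parts of the machine can be summarized as a small lookup table. What follows is only an illustrative sketch in Python: the names MECHANISM, EXAMPLE_WORDS and mechanism_for are invented here for the purpose, and the table records nothing beyond what the description above states, using its own example words.

```python
# Illustrative only: these names are hypothetical, not von Kempelen's terms.
# Each sound class maps to the part of the machine the description ties it to.
MECHANISM = {
    "plosive": "direct bellows-to-mouth air path (pressure build-up)",
    "fricative": "noise whistle",
    "retroflex continuant": "weight dropped on the reed",
}

# The example words from the description, keyed to the class of their first sound.
EXAMPLE_WORDS = {
    "pork": "plosive",
    "bat": "plosive",
    "zap": "fricative",
    "snake": "fricative",
    "rent": "retroflex continuant",
}


def mechanism_for(word: str) -> str:
    """Return the machine part used for the opening sound of an example word."""
    sound_class = EXAMPLE_WORDS[word.lower()]
    return MECHANISM[sound_class]


if __name__ == "__main__":
    for word in EXAMPLE_WORDS:
        print(f"{word}: {mechanism_for(word)}")
```

The table covers only the special-purpose parts; ordinary voiced sounds take the main path past the reed and out through the mouth or nostrils.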

Some time between 1830 and 1840, a man named Joseph Faber came upon von Kempelen's work, and was inspired to design a speaking machine of his own. He had a university education and was proficient in music and mathematics, but while recovering from an illness he had taken up woodcarving as therapy (against doctors' orders), which put him in the perfect position to build the machine himself. He named the finished machine the Euphonia, and it was able to produce a greater variety of phonemes more easily than any of its predecessors. Notably, it also had a flexible face-shaped attachment which produced different faces in accordance with what sounds were produced. The Euphonia was fed by a large bellows, which was powered by the operator's foot and pushed air through a single organ pipe. Past the pipe was a sixteen-stage (!) mechanical "mouth," which was operated by sixteen keys that also controlled the aforementioned face.

When skillfully operated the Euphonia could produce the sounds of any European language in a whispered voice, and with a separate keyboard for pitch variation could even sing God Save the Queen. Faber's playing of the machine in English was said to be more easily understandable than his own speech, as he was a native Austrian. Unfortunately, for all of its innovations the machine proved to be fairly uninteresting to the crowds in Vienna, so he left for London around 1845. Finding no interest there, either, he left for the United States after a few years. He looked for investors there, and one of the men who would later help Bell invent the telephone turned him down for a loan. In 1850, frustrated with his life and his lack of achievement, Faber destroyed the machine and took his own life. His blueprints survived, and were used by a son-in-law to make a second Euphonia, but it too was met with widespread apathy.
