Google employees Natalie Hammel and Lorraine Yurshansky, who go by Nat & Lo in their series of informative Google tour videos, are at it again. This time the pair are demonstrating recent improvements to Google's Text-to-Speech (TTS) engine, which many of our readers have already heard for themselves. Since synthesized, human-sounding voices are at the center of one of the biggest current trends in gadgets and usability, it's kind of a big deal.
If you're interested in the new voice and a brief history of Google's work on similar projects, just stop reading this and watch the video above. If you're religiously opposed to moving pictures (or you're just in a situation where you can't watch it), allow me to summarize. Essentially, the newest synthesized computer voices are built by breaking thousands of recorded words and phrases (usually from a single voice actor, to keep the voice consistent) into their individual phoneme pieces, then using complex software to stitch those pieces back together into recognizable speech. The biggest hurdle now, at least on the playback side of the process, is making that voice sound natural and human.
Google's solution combines the old chopped-up phonemes with more targeted phrases recorded for common voice command functions. The video shows the surprisingly intense and nuanced process of planning and recording the audio for the new TTS voice. Neat.