Google's new Cloud Text-to-Speech platform incorporates DeepMind's WaveNet technology for more natural-sounding voices

Android Police

By Scott Scrivens

Published Mar 28, 2018

Back in December last year, we looked in-depth at the work Google has been doing to improve text-to-speech and other artificial language use cases. Artificial voice synthesis can be much more powerful and impressive thanks to WaveNet neural network technology, developed by Alphabet subsidiary DeepMind. It's been used to make the Google Assistant sound more natural, and now makes up part of a whole new product: Cloud Text-to-Speech.

According to Google's blog post, the new service can be used to bring advanced artificial voices to a variety of areas, such as voice response systems for call centers, conversations with IoT devices, and converting text-based media to audio. There are 32 basic voices to choose from, across languages like English, Spanish, French, German, Japanese, and more. Some languages even have a range of male and female voices available.

Only American English makes use of WaveNet tech, with six enhanced voice options (three male, three female). The updated version of WaveNet used is said to generate audio 1,000 times faster than before. Its output fidelity has also been increased to 24,000 samples per second, and the resolution of each sample has been bumped up from 8 to 16 bits, all of which should add up to a more human sound.
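To put the 24,000 samples per second and the jump from 8 to 16 bits in perspective, here is a quick back-of-the-envelope sketch in Python. The helper name `pcm_stats` is made up for illustration, and it assumes uncompressed, single-channel (mono) audio, which the article does not specify.

```python
def pcm_stats(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> dict:
    """Return amplitude levels and raw data rate for uncompressed PCM audio.

    Hypothetical helper for illustration; mono is an assumption.
    """
    # Each extra bit doubles the number of distinct amplitude values per sample.
    levels = 2 ** bits_per_sample
    # Raw data rate: samples/s * bits/sample * channels, converted to bytes.
    bytes_per_second = sample_rate_hz * bits_per_sample * channels // 8
    return {"levels": levels, "bytes_per_second": bytes_per_second}


# Same 24,000 samples per second, at the two resolutions mentioned above.
print(pcm_stats(24_000, 8))   # {'levels': 256, 'bytes_per_second': 24000}
print(pcm_stats(24_000, 16))  # {'levels': 65536, 'bytes_per_second': 48000}
```

In other words, going from 8 to 16 bits takes each sample from 256 possible values to 65,536, at the cost of twice the raw data.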

Complex text-to-speech tasks like pronouncing names, addresses, and times are handled easily by Google's platform, and you can also change the pitch, speed, and volume gain of the output voice. Both MP3 and WAV formats are supported.
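For the curious, the pitch, speed, and volume controls map onto a small set of request options. The sketch below only builds a request body and sends nothing. The field names (`audioConfig`, `speakingRate`, `pitch`, `volumeGainDb`) and the voice name `en-US-Wavenet-A` are assumptions based on Google's public REST documentation rather than anything stated in this article, and `build_synthesis_request` is a hypothetical helper, so check the official docs before relying on it.

```python
import json


def build_synthesis_request(
    text: str,
    voice_name: str = "en-US-Wavenet-A",  # assumed voice name
    language_code: str = "en-US",
    audio_encoding: str = "MP3",          # "LINEAR16" is assumed to be the WAV-style option
    speaking_rate: float = 1.0,           # 1.0 = normal speed
    pitch: float = 0.0,                   # 0.0 = default pitch
    volume_gain_db: float = 0.0,          # 0.0 = no gain change
) -> dict:
    """Build a JSON-serializable request body (hypothetical helper, no network call)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "pitch": pitch,
            "volumeGainDb": volume_gain_db,
        },
    }


# A slightly slower, lower-pitched reading, requested as MP3.
request_body = build_synthesis_request(
    "Your table is booked for 7:30 PM.", speaking_rate=0.9, pitch=-2.0
)
print(json.dumps(request_body, indent=2))
```

Keeping the request as a plain dictionary makes it easy to inspect or tweak before handing it to whichever HTTP client or official library you prefer.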

Cloud Text-to-Speech is already being used by companies like Cisco and Dolphin ONE, and other interested businesses can check out the documentation and pricing for more information. For the rest of us, well, it's just fun to play with the sampler. I'm really enjoying copying various song lyrics into it right now.

Source: Google