Have you noticed an improvement in your phone's voice search or voice dictation in the last few days, especially in noisy environments? You can thank the Google Speech Team. They've rolled out new acoustic models for automated speech recognition. Adding recurrent neural networks to the system lets it take the surrounding sounds into account, so it identifies complete words more accurately instead of judging individual snippets of sound in isolation. From the Google Research Blog:
Our improved acoustic models rely on Recurrent Neural Networks (RNN). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound and from an /m/ sound before. Try saying it out loud - “museum” - it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN which, through memory cells and a sophisticated gating mechanism, memorizes information better than other RNNs. Adopting such models already improved the quality of our recognizer significantly.
The next step was to train the models to recognize phonemes in an utterance without requiring them to make a prediction for each time instant. With Connectionist Temporal Classification, the models are trained to output a sequence of “spikes” that reveals the sequence of sounds in the waveform. They can do this in any way as long as the sequence is correct.
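For the curious, the "spikes" idea can be illustrated with a toy example. This is only a sketch of the standard collapsing rule used with Connectionist Temporal Classification (merge consecutive repeats, then drop the "blank" frames), not Google's actual code. The `ctc_collapse` function name, the `_` blank symbol, and the per-frame labels are all made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol for "no sound emitted in this frame"


def ctc_collapse(frame_labels, blank=BLANK):
    """Turn per-frame labels into a phoneme sequence.

    Step 1: merge runs of consecutive identical labels.
    Step 2: drop the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Two different (made-up) frame-by-frame outputs for the start of "museum".
# The spikes land at different times, but the sound sequence is the same.
frames_a = ["_", "m", "_", "_", "j", "_", "u", "_", "_"]
frames_b = ["m", "m", "_", "j", "j", "j", "u", "_", "_"]

print(ctc_collapse(frames_a))
print(ctc_collapse(frames_b))
```

Both inputs collapse to the same three sounds, which is the point of the quote above: the model is free to place its spikes wherever it likes, as long as the resulting sequence is right.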
If your head is spinning like Colonel O'Neill after an explanation of temporal wormhole physics, you're not alone... and there's a lot more where that came from. The take-away is that Google's voice search and related functions on Android and iOS are now better at recognizing the more nuanced patterns in speech, and at returning correct results more quickly. If you've ever struggled to get voice commands across in a noisy car on the highway, you should be able to appreciate the work put into it.
If you understand acoustic modeling and computer science better than I do, be sure to check out the full walkthrough at the source link. These changes are live in the Google search app and voice dictation on Android, but not yet on Chrome OS or Chrome desktop browsers.
- Google Research Blog