The way we can talk to a smartphone or smart speaker and expect an answer would have sounded like science fiction just a few decades ago, but that Star Trek-like fantasy has been realized in the pockets and on the desks and counters of millions (if not billions) of people. Even more fantastically, you can carry out near-conversations with follow-up questions thanks to tools like Continued Conversation on the Google Assistant, which takes the context of an earlier question into account when answering a follow-up. And, like many of Google's machine-learning-based software features, how it works is pretty clever.

We’re all so used to the power of context that it rarely enters our minds unless it’s conspicuously missing. Whether you’re talking to a friend about some new gadget or reading a book, information can be omitted without obscuring meaning. For example, if you’re talking about the specs for some upcoming phone with a friend, and they ask, “How much does it cost?” you don’t need to reiterate the phone as the subject. It’s implied. The same goes for fast-paced description in a novel or a discussion of cause and effect in an economics textbook — you don’t have to repeat everything that came before to understand what's being said. Even “no” can be a complete sentence in the face of context.

This is one of those things that is easy for a human to grasp but genuinely hard for a machine. Our minds infer context automatically, but it isn’t intrinsic to the way a computer processes language, especially when a conversation is broken up by answers in between questions. Systems like Google Assistant's Continued Conversation work best when the context of a new question can be tied to subjects or topics from the previous one, saving you from repeating yourself more explicitly and completely.

Google published a detailed explanation on its AI Blog last week of precisely how the Assistant learned to understand context, and the solution it arrived at is surprisingly simple: rephrase the follow-up question so it includes the missing context. This approach has one big advantage: Google can bolt another system on top of what the Assistant already does, rather than rework how it processes questions from the ground up to accommodate missing information.
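To make the "bolt-on" idea concrete, here is a minimal sketch, not Google's actual code, of where a rewriting step would slot in front of an unchanged answering pipeline. The function names (`answer`, `rewrite_followup`, `handle_turn`) are hypothetical and exist only for illustration.

```python
def answer(query: str) -> str:
    """Stand-in for the Assistant's existing question-answering pipeline."""
    return f"<answer to: {query}>"


def rewrite_followup(previous_query: str, previous_answer: str, followup: str) -> str:
    """Hypothetical rewriter: returns a self-contained version of `followup`,
    or the original text if no rewrite is needed."""
    # A real system would use the candidate generators and scoring described
    # below; this placeholder only shows where the rewrite happens.
    return followup  # e.g. "When?" -> "When was the Eiffel Tower built?"


def handle_turn(previous_query: str, previous_answer: str, followup: str) -> str:
    """Rewrite the follow-up, then hand it to the unchanged pipeline."""
    full_query = rewrite_followup(previous_query, previous_answer, followup)
    return answer(full_query)
```

The point of the structure is that the downstream question-answering code never has to know the original query was contextually abridged.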

The “how” behind all this is fascinating because there are a lot of ways you can programmatically add context to a new query. In fact, Google uses several methods to do this, grouped into three general categories:

  • Shoehorning bits and phrases from prior queries and answers into follow-up questions, accounting for grammar and linguistic structure so the results aren’t gibberish.
  • Matching key terms it recognizes in prior queries against popular or common questions to see if any apply to the situation.
  • Using machine-learning-based generators, trained on sample data, to dynamically produce candidates.
Context takes a lot of forms.

The mechanisms differ, but the effect is the same. The Assistant runs all of them, generating a handful of candidates for what the “full sentence” version of your follow-up question might be, a version it is much more likely to understand. With these tricks, a question like “When?” can become “When was the Eiffel Tower built?” when asked after “Who built the Eiffel Tower?”
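A rough illustration of that candidate generation, using the Eiffel Tower example, might look like the sketch below. This is an assumption-laden toy, not Google's implementation: each function stands in for one of the three categories above, and the heuristics are hardcoded purely to show the shape of the process.

```python
def splice_candidates(previous_query: str, followup: str) -> list[str]:
    """Graft a phrase from the previous query onto the follow-up.
    The "noun phrase extraction" here is a hardcoded toy; a real system
    would rely on a parser and grammatical rules."""
    topic = previous_query.removeprefix("Who built ").rstrip("?")  # "the Eiffel Tower"
    return [f"{followup.rstrip('?')} was {topic} built?"]


def template_candidates(previous_query: str, followup: str) -> list[str]:
    """Match the follow-up against common question patterns."""
    templates = {"When?": "When was {topic} built?"}
    topic = "the Eiffel Tower"  # would come from entity detection in practice
    if followup in templates:
        return [templates[followup].format(topic=topic)]
    return []


def neural_candidates(previous_query: str, followup: str) -> list[str]:
    """Placeholder for a trained, learned rewriter that proposes candidates."""
    return []


def generate_candidates(previous_query: str, followup: str) -> list[str]:
    """Pool candidates from all three generator families."""
    return (splice_candidates(previous_query, followup)
            + template_candidates(previous_query, followup)
            + neural_candidates(previous_query, followup))


print(generate_candidates("Who built the Eiffel Tower?", "When?"))
# Both toy generators happen to produce "When was the Eiffel Tower built?"
```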

These various “candidates” are then scored, because they may not all be accurate. Google grades each one on its topical similarity to the previous query and on whether it looks incomplete or ungrammatical. Google also tries to weed out false positives: new queries that might look like follow-up questions to some of these models but are actually fully standalone.
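Continuing the sketch above under the same caveats, the scoring stage could be imagined like this. The signals (topical overlap with the previous query, a completeness check, and a guard against rewriting queries that already stand on their own) follow the description above, but the weights, thresholds, and helper names are invented for illustration.

```python
from typing import Optional


def topical_similarity(candidate: str, previous_query: str) -> float:
    """Crude word-overlap proxy for topical similarity."""
    a, b = set(candidate.lower().split()), set(previous_query.lower().split())
    return len(a & b) / max(len(a | b), 1)


def looks_complete(candidate: str) -> bool:
    """Placeholder for a grammaticality/completeness check."""
    return len(candidate.split()) >= 4 and candidate.endswith("?")


def is_standalone(followup: str) -> bool:
    """Rough guard against rewriting queries that need no added context."""
    return len(followup.split()) >= 5


def best_candidate(candidates: list[str], previous_query: str,
                   followup: str) -> Optional[str]:
    """Pick the highest-scoring rewrite, or None to leave the query as-is."""
    if is_standalone(followup):
        return None  # the new query already looks self-contained
    scored = [(topical_similarity(c, previous_query)
               + (1.0 if looks_complete(c) else 0.0), c)
              for c in candidates]
    score, winner = max(scored, default=(0.0, None))
    return winner if score > 0.5 else None
```

Returning `None` here corresponds to the Assistant treating the query as a fresh, standalone question rather than a follow-up.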


In the end, the best candidate is chosen (if one is deemed applicable), and the Assistant parses the full-sentence, rephrased version of the question in place of the contextually abridged one. Google says this system works for “most” queries, and while I do remember having to rephrase a question or two myself to make the context of a follow-up clearer, it’s surprisingly functional. Best of all, if you’re using Continued Conversation, it makes for a much more natural exchange. Rather than trying to optimize each question like a search engine query, you can think and speak more naturally when seeking more information, and the Assistant itself adds the context needed to understand that the question relates to what’s already been said.

If Google’s vision of ambient computing is the future, tools like this, which make computers even easier to interact with, are what it will build on.