Curious what the Pixel 4's Soli radar system "sees" when you wave your hand over it to skip tracks, or when you reach for your phone as it automatically wakes? Turns out, the image it forms is surprisingly undetailed, with "no distinguishable images of a person's body or face" generated. It's all about detecting motion with finely tuned machine learning models, and the abstract picture it paints is pretty blurry.
Google detailed some of the technical inner workings of Soli's radar-based technology in a blog post earlier today. Engineers comfortable talking about superposition and frequency modulation will find it a good read, but the short version is that Soli doesn't really "see" in the traditional sense at all. It isn't using radar to take some kind of photo, even if Google was able to generate the examples you see just below. Still, they help us visualize how it works a bit better.
The "image" that the Pixel 4's Soli radar "sees."
In all the images above, distance increases from top (closer) to bottom (further), with the strength of the received signal mapped to brightness. The left animation shows a body walking toward the phone, the middle shows an arm reaching for it, and the right shows a person swiping over it — like the Motion Sense gesture for moving between tracks, we assume. As you can see, its resolution isn't fine enough to make out many details at all, though it can tell two things with some accuracy: object/body presence and motion. And, it turns out, that's enough.
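To get a feel for how a radar produces that kind of distance-vs-brightness image, here's a toy numpy sketch of the general FMCW (frequency-modulated continuous-wave) technique radar chips like Soli's are based on. Every parameter and function name here is illustrative — Google hasn't published Soli's actual specs or pipeline — but the core idea holds: the frequency of the "beat" signal from a reflecting object is proportional to its distance, so an FFT of each chirp gives one column of the image, with magnitude as brightness.

```python
import numpy as np

# Illustrative parameters, NOT Soli's real specs.
C = 3e8            # speed of light (m/s)
BANDWIDTH = 4e9    # chirp bandwidth (Hz), hypothetical
CHIRP_T = 1e-3     # chirp duration (s), hypothetical
N_SAMPLES = 256    # ADC samples per chirp

def simulated_chirp_return(target_range_m):
    """Beat signal for a single point target: a sinusoid whose
    frequency is proportional to the target's distance."""
    slope = BANDWIDTH / CHIRP_T          # chirp slope (Hz per second)
    delay = 2 * target_range_m / C       # round-trip travel time
    beat_freq = slope * delay            # beat frequency (Hz)
    t = np.arange(N_SAMPLES) * (CHIRP_T / N_SAMPLES)
    return np.cos(2 * np.pi * beat_freq * t)

def range_profile(beat_signal):
    """One column of the image: the FFT sorts the beat signal's
    energy into distance bins; magnitude maps to brightness."""
    windowed = beat_signal * np.hanning(len(beat_signal))
    return np.abs(np.fft.rfft(windowed))

# A hand approaching from 0.6 m down to 0.1 m over 32 chirps:
ranges = np.linspace(0.6, 0.1, 32)
image = np.stack([range_profile(simulated_chirp_return(r))
                  for r in ranges], axis=1)
# Each column is one moment in time; the bright blob drifts toward
# the near-range bins as the hand approaches, just like the animations.
```

With this setup the range resolution works out to c / (2 × bandwidth) ≈ 3.75 cm per bin, which is why the real images are blobs rather than pictures: a whole hand spans only a handful of pixels.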
Now, your Pixel 4 doesn't precisely "see" these kinds of images when trying to determine if you're switching between tracks, but it does process the same type of signal used to generate them, feeding it to a machine learning model that filters it and ultimately determines when to act. It's one thing for your phone to wake when you reach for it or skip tracks when you swipe with intent, but that wouldn't be worth much if it couldn't also distinguish those actions from the myriad ways it might be accidentally triggered, and that means Google has to use some of that machine learning it uses for everything these days.
Google trained a model with millions of gestures recorded from thousands of volunteers, allowing it to learn how different people perform the sorts of gestures it needs to detect. To fight false positives and accidental triggers, Google further trained it on hundreds of hours of "background radar" recordings of generic, unintended motion performed near the phone. The result is a model that's become surprisingly good. In fact, Soli can even filter out interference generated by the phone itself, like the vibration produced by the speakers when playing music.
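The training recipe above — deliberate gestures as positive examples, "background radar" as negatives — can be sketched with a toy classifier. Google hasn't published its actual models or features, so everything here is a made-up stand-in: we summarize each recording as three hypothetical numbers and use a simple nearest-centroid rule in place of Soli's real networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-number summary of one radar recording:
# [peak signal strength, mean speed, motion duration].
def make_examples(center, spread, n):
    return rng.normal(center, spread, size=(n, 3))

# Deliberate swipes tend to be strong, fast, and brief; background
# motion (walking past, speaker vibration) is weaker, slower, longer.
swipes = make_examples([0.9, 0.8, 0.2], 0.1, 500)      # positives
background = make_examples([0.3, 0.2, 0.7], 0.1, 500)  # negatives

X = np.vstack([swipes, background])
y = np.array([1] * 500 + [0] * 500)

# Nearest-centroid classifier: a toy stand-in for Google's models.
centroids = {label: X[y == label].mean(axis=0) for label in (0, 1)}

def is_intentional_swipe(features):
    """Act only if the recording looks more like a deliberate swipe
    than like the background-motion examples."""
    dists = {label: np.linalg.norm(features - c)
             for label, c in centroids.items()}
    return min(dists, key=dists.get) == 1

print(is_intentional_swipe(np.array([0.85, 0.75, 0.25])))  # True
print(is_intentional_swipe(np.array([0.30, 0.10, 0.80])))  # False
```

The key design point is the negative class: without those background recordings, any motion near the phone would look like a gesture, which is exactly the false-trigger problem Google describes.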
When you're detecting motion, vibrations from audio "look" like this.
Over half a decade, Google shrank the hardware that powers Soli from the size of a desktop computer down to something that fits in the Pixel 4's comparatively small forehead, cutting size and power consumption until it could live and work in a phone, all so it can wake when you reach for it or skip tracks when you fling your hand.
Even if you aren't a fan of Google's hardware, it's pretty cool to geek out on how this stuff works.