Google's Pixel 3 and 3 XL have some of the best cameras of the year (if not the best), and paired with that generally impressive performance is a handful of pretty nifty software features. One of them, called Top Shot, makes those great photos even better by helping out when you make a mistake and snap a photo just a bit too early or too late. And while we knew a little bit about how Top Shot worked, there were pretty large gaps in our knowledge. Thankfully for the inquisitive, Google has just published a more technical explanation of what makes it tick.
The full deets are up at Google's AI blog for your long-form technical reading pleasure, though the official walkthrough goes a bit out of order, doubling back on itself in a way that can make it a harder read than other excellent AI blog posts. If you prefer the simpler version, read on:
Google Clips heritage
Top Shot's functionality builds on the same tools Google created for its Google Clips camera. Although Clips doesn't seem to have had much success as a product, the constraints inherent to it required some advanced problem solving: how could you make an automatic camera that independently recognizes and saves only the best short video opportunities it sees?
For the skinny on how Google pulled that magic off, you can read the detailed explanation published earlier this year, but the (exceedingly) short version is that it used machine learning, the in-vogue way of solving just about every abstract problem these days.
Photographers and video editors provided pairwise comparisons as training data for Google Clips.
A model was trained on thousands of pre-selected source videos with the help of professional photographers and video editors, who manually picked the better clip from pairs of candidates, showing the model which clips it should learn to emulate. In fact, over fifty million of these binary comparisons were collected as training data. By combining that data with the image recognition prowess Google had already developed for Google Photos, the developers behind Google Clips built a model that predicts how "interesting" content is, rated by what it called a Moment Score, which accounts both for what is in the frame and for the qualities that make a good clip. But that model could only run on power-hungry hardware. The really ingenious part was then training a simpler model in parallel to emulate the performance of its server-based brother (i.e., using one model to train another).
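Learning a score from "this one, not that one" judgments is a classic ranking setup. Below is a minimal sketch, assuming a Bradley-Terry style logistic model over a hypothetical hand-made feature vector per clip; Google's actual Clips model is a neural network whose features and training code aren't described here, and the distillation step (training the smaller on-device model) is not shown. Each human comparison nudges the weights so the preferred clip ends up scoring higher than the rejected one.

```python
import math
import random


def score(weights, features):
    """Linear stand-in for a 'moment score': weighted sum of clip features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs, num_features, lr=0.1, epochs=200, seed=0):
    """Learn weights so that clips humans preferred get higher scores.

    pairs: list of (winner_features, loser_features), one per binary comparison.
    Model: P(winner beats loser) = sigmoid(score(winner) - score(loser)).
    """
    rng = random.Random(seed)
    weights = [0.0] * num_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for winner, loser in pairs:
            margin = score(weights, winner) - score(weights, loser)
            p = 1.0 / (1.0 + math.exp(-margin))  # model's confidence in the human's choice
            # Stochastic gradient step on the log loss -log(p).
            for i in range(num_features):
                weights[i] += lr * (1.0 - p) * (winner[i] - loser[i])
    return weights


if __name__ == "__main__":
    # Hypothetical features per clip: [sharpness, subject_smiling]
    pairs = [
        ([0.9, 1.0], [0.2, 0.0]),  # sharp, smiling clip beat a blurry, neutral one
        ([0.8, 0.0], [0.1, 0.0]),  # sharper clip won
        ([0.5, 1.0], [0.5, 0.0]),  # same sharpness, the smile won
    ]
    w = train_pairwise(pairs, num_features=2)
    print("learned weights:", w)
    print("score for an unseen clip:", score(w, [0.7, 1.0]))
```

The appeal of pairwise labels is that people are far more consistent at saying which of two clips is better than at assigning an absolute rating, and the model only needs relative scores to rank candidates.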
There's a whole lot more to it, but with all of that information combined (plus some ongoing on-device training that learns to recognize "familiar" faces and pets over time), Google Clips is able to assign a relative score to what it's seeing and then choose when and how to capture content.
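Google hasn't spelled out exactly how Clips turns those scores into a capture decision. As a rough illustration only, here is one common pattern, assuming a hypothetical stream of per-frame scores between 0 and 1: smooth them with an exponential moving average, start a clip when the smoothed score rises above one threshold, and end it when it falls below a lower one. The thresholds and smoothing factor are arbitrary.

```python
def choose_clips(frame_scores, start_threshold=0.7, stop_threshold=0.5, alpha=0.3):
    """Return (start, end) frame indices of segments worth saving; end is exclusive.

    An exponential moving average smooths noisy per-frame scores, and
    hysteresis (start threshold above stop threshold) keeps a single dip
    from splitting one moment into many short clips.
    """
    clips = []
    smoothed = None
    start = None
    for i, s in enumerate(frame_scores):
        smoothed = s if smoothed is None else alpha * s + (1 - alpha) * smoothed
        if start is None and smoothed >= start_threshold:
            start = i                    # something interesting began
        elif start is not None and smoothed < stop_threshold:
            clips.append((start, i))     # the moment has passed
            start = None
    if start is not None:                # still recording when the stream ended
        clips.append((start, len(frame_scores)))
    return clips


if __name__ == "__main__":
    # Hypothetical scores: a dull scene, then a burst of action, then dull again.
    scores = [0.1] * 5 + [1.0] * 10 + [0.0] * 10
    print(choose_clips(scores))
```

Because of the smoothing, the clip starts a few frames after the raw scores jump and ends a little after they drop, trading a short delay for far fewer spurious triggers.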
With the tiny Google Clips able to tell good-quality content from not-so-good, it was only a short hop, skip, and jump to adapting the general concept for the Pixel 3, albeit in a slightly different way.
Brought to the Pixel 3
Even before you press the metaphorical shutter on the Pixel 3, Top Shot is already working in the background in tandem with Motion Photos. If you remember, that's the feature on Google's Pixels that records a short video of the moments just before, during, and after a photo is taken. It might seem like a simple step to move from capturing before-and-after video to before-and-after photos, but there's a whole lot more to it.
Google claims that Top Shot captures up to 90 images from the 1.5 seconds before and after the shutter is pressed for comparison. When we spoke to a Google representative about Top Shot at the Pixel 3 launch event, we were told that most of the alternate photos are stills pulled from the Motion Photo, but a select few Top Shot alternates are saved before the video encoding step, at a higher resolution and quality than the Motion Photo video itself.
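Keeping frames from before the shutter press implies a rolling buffer that is always filling while the viewfinder is open. If "the 1.5 seconds before and after" means 1.5 seconds on each side, 90 images works out to 2 × 1.5 s × 30 frames per second; the 30 fps rate and the even split are our assumptions, not figures from Google. A minimal sketch using Python's `collections.deque` with a fixed `maxlen`, so the oldest pre-shutter frame falls off automatically:

```python
from collections import deque

FPS = 30              # assumed capture rate; Google only says "up to 90 images"
WINDOW_SECONDS = 1.5  # seconds kept on each side of the shutter press (assumed split)
FRAMES_PER_SIDE = int(FPS * WINDOW_SECONDS)  # 45 before + 45 after = 90


class CaptureBuffer:
    """Hypothetical rolling buffer of frames around a shutter press."""

    def __init__(self, frames_per_side=FRAMES_PER_SIDE):
        self.frames_per_side = frames_per_side
        self.before = deque(maxlen=frames_per_side)  # oldest frames drop off automatically
        self.after = []
        self.shutter_pressed = False

    def press_shutter(self):
        self.shutter_pressed = True

    def push(self, frame):
        """Add one camera frame. Returns True once the post-shutter window is full."""
        if not self.shutter_pressed:
            self.before.append(frame)
            return False
        if len(self.after) < self.frames_per_side:
            self.after.append(frame)
        return len(self.after) == self.frames_per_side

    def candidates(self):
        """All buffered frames in capture order, ready to be scored."""
        return list(self.before) + self.after


if __name__ == "__main__":
    buf = CaptureBuffer()
    frame_id = 0
    for _ in range(100):           # viewfinder is live well before the press
        buf.push(frame_id)
        frame_id += 1
    buf.press_shutter()
    while not buf.push(frame_id):  # keep recording until the window is full
        frame_id += 1
    print(len(buf.candidates()), "candidate frames")
```

A fixed-size deque means memory use stays constant no matter how long the camera app is open, which matters on a phone that is also running the viewfinder and the video encoder.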
Abstract diagram of the Top Shot capture process.
But before it can save those images, Top Shot needs to decide very quickly which ones are worth keeping, à la the previously mentioned Google Clips. Top Shot's adapted, power-efficient, on-device model was trained on sets of up to 90 sample photos so that it can sort through everything captured and save just the best. It eliminates frames that might be blurry, improperly exposed, or in which a subject's eyes are closed, and tries to recognize things like smiles or other visible emotional expressions. It even takes into account other data, like readings from the phone's gyroscope (captured for other uses), to speed up its assessment of each alternate photo's quality.
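Google's model learns these judgments rather than following hand-written rules, but a rule-based stand-in makes the listed criteria concrete. This sketch assumes hypothetical per-frame signals (a sharpness estimate, mean brightness, eye and smile detections, and gyroscope rotation speed) along with arbitrary thresholds and weights. Checking the gyroscope first mirrors the speed point above: a cheap sensor reading can rule out a shaky frame before heavier analysis would need to run.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ROTATION = 1.0    # rad/s; hypothetical cutoff for "phone was moving too much"
MIN_SHARPNESS = 0.4   # hypothetical cutoff for "too blurry"
SMILE_BONUS = 0.5     # hypothetical weight for a visible expression


@dataclass
class FrameSignals:
    sharpness: float        # 0 = very blurry, 1 = perfectly sharp
    mean_brightness: float  # 0 = black, 1 = white; 0.5 treated as well exposed
    eyes_open: bool
    smiling: bool
    rotation_rad_s: float   # gyroscope angular speed while the frame was captured


def frame_quality(sig: FrameSignals) -> Optional[float]:
    """Return a quality score, or None if the frame should be discarded."""
    # Cheap sensor check first: heavy motion almost guarantees blur, so in a
    # real pipeline this could skip the more expensive image analysis.
    if sig.rotation_rad_s > MAX_ROTATION:
        return None
    # Hard rejections for blur and closed eyes.
    if sig.sharpness < MIN_SHARPNESS or not sig.eyes_open:
        return None
    # Penalize under- or over-exposure by distance from mid-grey.
    quality = sig.sharpness - abs(sig.mean_brightness - 0.5)
    if sig.smiling:
        quality += SMILE_BONUS
    return quality
```

Returning `None` for outright rejects, rather than a very low number, keeps "unusable" separate from "usable but worse", which makes the later choice of alternates simpler.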
When it has found up to two photos it thinks could be better than the one you intended to snag, they're saved in HDR+ quality and stored in the same file as the Motion Photo. Later, when you go to review those photos, you'll be offered the option to switch to one of those intelligently captured alternates. Should you so desire, you can also manually pick one of the lower-resolution, non-HDR frames if you think it's better still.
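The final pick can be sketched as a small ranking step. Assuming each buffered frame already has a quality score, with `None` marking frames rejected outright, this hypothetical helper keeps at most two frames that beat the one you actually shot. It only chooses frame indices; the HDR+ processing and saving into the Motion Photo file are outside its scope.

```python
import heapq
from typing import List, Optional


def pick_alternates(scores: List[Optional[float]], shutter_index: int,
                    max_alternates: int = 2) -> List[int]:
    """Indices of up to `max_alternates` frames that outscore the shutter frame.

    scores: one entry per buffered frame; None means the frame was rejected.
    Results are ordered best first.
    """
    baseline = scores[shutter_index]
    better = [
        i for i, s in enumerate(scores)
        if i != shutter_index and s is not None
        and (baseline is None or s > baseline)  # a rejected shot is beaten by any kept frame
    ]
    return heapq.nlargest(max_alternates, better, key=lambda i: scores[i])


if __name__ == "__main__":
    scores = [0.2, None, 0.9, 0.5, 0.7, 0.4]
    print(pick_alternates(scores, shutter_index=3))
```

If nothing outscores your own shot, the helper returns an empty list, so no alternate would be suggested in that case.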
Two quick and easy Top Shot recommendations. Who knew such simplicity was built on such complexity?
Like every hot new feature on phones these days, Top Shot's photo magic comes courtesy of a lot of advanced, machine learning-powered software engineering. As easy and convenient as it is to use, some complex machinery is at work behind the scenes, and now it's a little less mysterious.