The Portrait mode on the new Pixel 2 phones is impressive. With just a single camera, the Pixel 2 and Pixel 2 XL are able to mimic a convincing depth of field effect, like that created by much larger hardware. It isn't quite as good as an SLR, but it's a reasonable approximation. And if you're interested in exactly how it works, we've put together an explanation.
For the full details, you can read Google's post on its Research Blog yourself, but we've assembled a simplified synopsis of how it all works. And it's pretty cool.
Now, portrait mode isn't perfect. The only way to get SLR-style large aperture results right now is with the required camera and aperture. But, with a bit of machine learning and a few clever tricks, portrait mode does produce a reasonable approximation of the depth of field effect.
What is 'depth of field'?
One of the best things about getting a bigger camera like an SLR is getting to play with depth of field. Unlike most phone cameras or point-and-shoots, a camera with a big enough sensor and a large lens (with a wide aperture) can have a very narrow focus. It's that focus that produces the 'depth of field' effect: items not within that narrow range become blurred.
Depth of field produced by a wide aperture (30mm f3.5, left) and narrow aperture (30mm f22, right)
For example, in the two images above you can see that the bugdroid is in focus, but the sharpness of the content behind it varies based on the size of the aperture. At its simplest, a larger aperture (a smaller f-number) means a shallower zone of sharp focus, and a smaller aperture means a deeper one. And the large-aperture depth of field effect is great for drawing the eye to a particular part of the image, especially in portrait photography.
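To put rough numbers on the two settings in the caption, here's a minimal sketch using the standard thin-lens depth of field formulas. The 1 m subject distance and the 0.02 mm circle of confusion (a value commonly used for APS-C sensors) are assumptions for illustration, not details taken from the photos.

```python
import math

def depth_of_field(focal_mm, f_number, subject_mm, coc_mm=0.02):
    """Return the (near, far) limits of acceptable sharpness, in millimetres.

    coc_mm is the circle of confusion; 0.02 mm is an assumed value.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (subject_mm * (hyperfocal - focal_mm)
            / (hyperfocal + subject_mm - 2 * focal_mm))
    if subject_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far

# Assumed scene: subject 1 m away, 30mm lens, the two apertures from the caption.
for f_number in (3.5, 22):
    near, far = depth_of_field(30, f_number, 1000)
    print(f"f/{f_number}: sharp from {near:.0f} mm to {far:.0f} mm")
```

Under those assumptions, the sharp zone at f/3.5 is roughly 15 cm deep, while at f/22 it stretches to about 1.2 m, which is why the background looks so different between the two shots.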
The lens and aperture in your phone's camera can't really pull off the same effect, though. At least, not on their own.
Like (it seems) all of Google's recent advancements, portrait mode makes use of machine learning. Specifically, Google has trained a neural network to look at images and determine "which pixels are people and which aren't." Obviously, there's a lot more to how it actually works, but it is able to separate the person in an image from the background and foreground.
It all starts with an HDR+ portrait.
HDR+ image (left) and resulting neural network generated segmentation mask (right). Photo by Sam Kwestin via Google.
From that image, it's able to computationally distinguish the 'people pixels' from the 'non-people pixels,' creating a "mask" or filter for the photo that separates the person in it from the rest of the content.
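Once the network has done the hard part, the mask itself is a simple thing. As a rough sketch, suppose the network outputs a per-pixel probability that a pixel belongs to a person. That output format, the make_mask helper, and the 0.5 threshold are all assumptions for illustration, not details Google has published.

```python
def make_mask(person_prob, threshold=0.5):
    """Turn per-pixel person probabilities into a binary mask.

    1 marks a 'people pixel', 0 marks everything else.
    The threshold is an assumed value, not one from Google's pipeline.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in person_prob]

# A tiny 3x3 "image" of hypothetical network outputs.
probabilities = [
    [0.05, 0.10, 0.02],
    [0.20, 0.95, 0.15],
    [0.60, 0.99, 0.70],
]
mask = make_mask(probabilities)
for row in mask:
    print(row)
```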
It's a start, but that's not enough by itself. You could stop there and the mask containing the person could be applied, with everything else in the image blurred equally, but it wouldn't be a correct approximation of the depth of field effect. It needs a depth map to tell it what to keep in focus and precisely how much to blur other parts of the scene. With two cameras it could easily determine that stereoscopically, but both of the Pixel 2 phones only have one. Even so, there's a smart way to get that information.
A lot of phones use what's called phase-detection autofocus. It sounds complicated, and in some ways it is, but it can be explained simply. The sensor captures two separate images through two slightly different parts of the same lens, and those images are then compared. If specific features line up in both, that item is probably in focus. If they don't, a quick bit of math based on the offset between them determines in which direction the focus needs to be changed, and the lens is adjusted until the correct focus is found.
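The comparison step can be sketched in one dimension. The toy function below slides one row of brightness values against the other and reports the offset at which they match best. A result of 0 suggests the feature is in focus, and the sign of a non-zero result hints at which way to move the lens. The best_shift name, the sum-of-absolute-differences cost, and the sample data are illustrative assumptions, not how any particular camera implements it.

```python
def best_shift(left, right, max_shift=4):
    """Return the shift s for which right[i + s] best matches left[i].

    Uses the mean absolute difference over the overlapping samples.
    """
    best, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(left[i], right[i + s])
                 for i in range(len(left))
                 if 0 <= i + s < len(right)]
        if not pairs:
            continue  # no overlap at this shift
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = s, cost
    return best

# The same bright feature, seen two samples apart in the two views.
left_view = [0, 0, 1, 5, 1, 0, 0, 0]
right_view = [0, 0, 0, 0, 1, 5, 1, 0]
print(best_shift(left_view, right_view))  # offset between the two views
print(best_shift(left_view, left_view))   # identical views: no offset
```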
There's a lot more to it than that, but it's enough for you to understand that phase-detection autofocus compares images from two slightly separated positions. Add in a bit of basic geometry, and we now have a way to calculate depth.
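That "basic geometry" is the same triangulation used by any two-view setup: for two parallel viewpoints, depth is the focal length times the separation between the viewpoints, divided by how far a feature shifts between the two images. The sketch below uses that textbook relation with made-up numbers. The focal length, baseline, and disparity values are assumptions, and Google's actual computation is certainly more involved.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Textbook two-view triangulation: depth = focal * baseline / disparity.

    focal_px is the focal length expressed in pixels, baseline_mm the
    distance between the two viewpoints. Returns depth in millimetres.
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: effectively at infinity
    return focal_px * baseline_mm / disparity_px

# Assumed values: 3000 px focal length and a 1 mm baseline.
for disparity in (4.0, 2.0, 1.0):
    depth = depth_from_disparity(3000, 1.0, disparity)
    print(f"shift of {disparity} px -> {depth:.0f} mm away")
```

The thing to notice is the inverse relationship: halving the shift doubles the estimated distance. With a baseline as tiny as the one inside a single phone lens, the shifts are small, so any noise in measuring them matters a lot.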
The two slightly different phase-detection images, animated at right.
That's enough for the portrait mode machine learning systems to extract the information they need. It even leverages a bit of HDR+ tech, combining multiple different phase-detection images to compensate for any noise that might throw things off. Now we can create a depth map.
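Why combining several captures helps is easy to show: random errors tend to cancel out when you average. Here's a minimal sketch, assuming each capture yields a list of per-pixel shift measurements. The average_frames helper and the sample values are hypothetical, and HDR+ itself aligns and merges frames in a far more sophisticated way.

```python
def average_frames(frames):
    """Average several equally sized lists of measurements, element by element."""
    count = len(frames)
    return [sum(values) / count for values in zip(*frames)]

# Three noisy readings of the same two pixels (true values: 2.0 and 4.0).
noisy_frames = [
    [2.5, 3.5],
    [1.5, 4.5],
    [2.0, 4.0],
]
print(average_frames(noisy_frames))
```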
Depth map (left) and an abstract visualization of the resulting blur to apply (right). Brighter means more blur.
With the depth map, we can determine how much blur to apply to different parts of the image. Things farther from the plane of focus should be more blurred, and things closer to it less so. With that information in hand, we have everything we need to get started.
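One simple way to turn a depth map into a blur map is to make the blur radius grow with distance from the in-focus value, up to some cap. The sketch below does exactly that. The linear relationship, the scale factor, and the cap are assumptions chosen for illustration, since Google doesn't spell out its exact mapping.

```python
def blur_radius_map(depth_map, focus_value, scale=2.0, max_radius=6.0):
    """Blur radius per pixel: grows with distance from the focus plane.

    depth_map holds one depth (or shift) value per pixel, and focus_value
    is the value that should stay sharp. scale and max_radius are assumed
    tuning knobs.
    """
    return [[min(max_radius, scale * abs(d - focus_value)) for d in row]
            for row in depth_map]

# One row of pixels: slightly in front of, at, and well behind the focus plane.
depths = [[0.0, 1.0, 5.0]]
print(blur_radius_map(depths, focus_value=1.0))
```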
Combining it all together
Google's a bit mum when it comes to some of the particulars, but the "people pixel" map is combined with the depth map above in such a way that the person in the picture stays in focus, while other parts of the image are blurred selectively for a more realistic effect. Not only will the person in the portrait be in focus, but items near that same plane of focus will also be sharper, with the blur increasing realistically the farther items sit in front of or behind that plane.
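Since Google hasn't published the exact recipe, the following is only a guess at the simplest possible combination: wherever the mask says "person," force the blur to zero, and everywhere else use the depth-based blur amount. The combine function and its inputs are hypothetical.

```python
def combine(mask, blur_radii):
    """Keep 'people pixels' sharp; use the depth-based blur everywhere else.

    mask holds 1 for person pixels and 0 otherwise. blur_radii holds the
    blur amount suggested by the depth map for each pixel.
    """
    return [[0.0 if is_person else radius
             for is_person, radius in zip(mask_row, radius_row)]
            for mask_row, radius_row in zip(mask, blur_radii)]

# Middle pixel is a person, so it stays sharp even though the depth map
# alone would have blurred it slightly.
person_mask = [[0, 1, 0]]
depth_blur = [[4.0, 1.5, 0.5]]
print(combine(person_mask, depth_blur))
```

Note that the pixel on the right keeps its small blur of 0.5 even though it isn't a person. That's the behavior described above: things near the plane of focus stay fairly sharp whether or not they're part of the subject.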
The original (left) and portrait mode (right)
Google even makes an effort while blurring to approximate the bokeh effect an SLR would produce, blurring based on the shape of an artificial, ideal aperture.
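An "ideal aperture" is essentially a perfect circle, so a blur that imitates it spreads each point of light into an evenly lit disk rather than a soft smudge. Here's a minimal sketch of building such a disk-shaped blur kernel. The disk_kernel helper is illustrative and not taken from Google's implementation.

```python
def disk_kernel(radius):
    """Build a normalized disk-shaped kernel of size (2*radius+1) squared.

    Cells inside the circle share the weight equally; cells outside get 0.
    """
    size = 2 * radius + 1
    kernel = [[1.0 if (x - radius) ** 2 + (y - radius) ** 2 <= radius ** 2
               else 0.0
               for x in range(size)]
              for y in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]

for row in disk_kernel(2):
    print(" ".join(f"{value:.2f}" for value in row))
```

Blurring with a kernel like this is what turns out-of-focus highlights into crisp-edged circles, the look most people associate with SLR bokeh.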
You can see there are a few errors in the image, like the varying blur applied to the neon lights at the top. Overexposed sections of an image are likely to have those sorts of problems. Although the neural net has been trained well, it's also possible that it might fail to accurately detect the "people pixels" required to keep the right sections in focus. If a scene doesn't have enough detail, the phase-detection system might not focus correctly or create an accurate depth map, either.
There's a lot that can go wrong, but the new portrait mode still produces better results than the "Lens Blur" mode used on previous Pixel and Nexus devices. It also works faster, in just 4 seconds, and without the up/down physical motion that Lens Blur required.
So while there are drawbacks, the new portrait mode on the Pixel 2 phones still produces a convincing approximation of a larger camera's depth of field effect. And there's the added benefit of not having to carry around a full SLR. That's good, because mine never fits into my jeans quite so well as a phone.
Header image credit: Matt Jones, via Google