The robots are winning: Google's neural network OCR gives names to the nameless (streets)

Android Police

By Ryne Hager

Published May 4, 2017

Everyone uses Google Maps, but not often do we consider where, exactly, all that data comes from. When a new road goes in, or a bypass, or the name of a street changes, it isn't as if your local city reaches out to Google to make sure everything is up to date. Some of that used to come from user submissions through Google Map Maker. Now those tools are being rolled into Maps itself, but that's not the only source of information. Google's fleets of Street View cars collect an insane number of images, and nestled in with them are pictures of businesses, street signs, and addresses. Google's latest research blog post goes into some interesting details about all that potential data.

Those signs might be recognizable as text to you and me, but to a computer they're just pixels. And it wouldn't be cost-effective to have an army of people poring over all those images just to transcribe them into plain text. That's where machine learning and neural networks come in handy (warning, wiki-hole). Really, though, a neural network is just a way of processing data that learns from training rather than from direct instructions. Instead of explicitly programming a system to do something, you build a system that can be told whether its output for a given input is right or wrong, and which then adjusts the weights of the connections in its hidden layers until the output is correct. Basically, you show it things and tell it what it should see, until it does.
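To make that "show it things until it gets them right" loop concrete, here is a minimal sketch in plain Python, assuming a tiny network with one hidden layer of sigmoid units trained by backpropagation on the classic XOR toy problem. The names (`TinyNet`, `mse`) and settings (4 hidden units, learning rate 0.5, 5,000 passes over the data) are illustrative choices of mine, not anything from Google's system, which is vastly larger.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Illustrative toy: inputs -> one hidden layer of sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Start from small random weights; training will adjust them.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """Show one example, measure how wrong the output is, nudge every weight."""
        hidden, out = self.forward(x)
        # Gradient of squared error through the output sigmoid.
        d_out = (out - target) * out * (1 - out)
        # Push the error back into the hidden layer (backpropagation),
        # using the output weights *before* they are updated.
        d_hidden = [d_out * w * h * (1 - h) for w, h in zip(self.w_out, hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * d_out * h
        self.b_out -= lr * d_out
        for j, row in enumerate(self.w_hidden):
            for i, xi in enumerate(x):
                row[i] -= lr * d_hidden[j] * xi
            self.b_hidden[j] -= lr * d_hidden[j]


def mse(net, data):
    """Mean squared error of the network's outputs over a labeled dataset."""
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# XOR: a toy problem that a network with no hidden layer cannot solve.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

net = TinyNet(n_inputs=2, n_hidden=4, seed=0)
loss_before = mse(net, XOR)
for epoch in range(5000):
    for x, target in XOR:
        net.train_step(x, target)
loss_after = mse(net, XOR)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
for x, target in XOR:
    print(x, "->", round(net.forward(x)[1], 2), "expected", target)
```

Each call to `train_step` is one round of "here's an example, here's the right answer": the error at the output is pushed back through the hidden layer, and every weight moves a little in the direction that would have made the answer less wrong.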

As an aside, you can actually build your own neural network at home pretty easily and do all sorts of fun stuff with it. The only one I ever made was for my college computational systems course: a classifier that tried to determine the sex of the person in a portrait image, and I don't think I ever got it above 65% accuracy. My point is, neural networks are easier to understand than their complex name might imply.
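For a sense of scale on a figure like 65%: accuracy is just the share of test images a classifier labels correctly, and on a two-class problem with a balanced test set, always guessing the same answer already scores 50%. A small sketch of that comparison follows; the labels are made up and the helper names (`accuracy`, `majority_baseline`) are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of examples where the prediction matches the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def majority_baseline(labels):
    """Accuracy of a 'classifier' that always guesses the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)


# Made-up test set: 10 portraits, balanced between two classes.
labels = ["f", "m", "f", "m", "f", "m", "f", "m", "f", "m"]
predictions = ["f", "f", "f", "m", "m", "m", "f", "f", "f", "m"]

print("model accuracy:   ", accuracy(predictions, labels))
print("majority baseline:", majority_baseline(labels))
```

Comparing against a trivial baseline like this is a quick sanity check on whether a model has learned anything beyond the class balance of its data.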

Google is using these kinds of deep neural networks to read street names from images collected by Street View. It's basically just a method for OCR (optical character recognition), and it has gotten very good: roughly 85% accuracy reading French street names. It can tell which text in an image belongs to a business name, street name, or address, and which text is irrelevant and can safely be ignored. It doesn't even get thrown off (too badly) by differences in formatting or font.
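The post's exact scoring method isn't described here, so as an assumption, here is one common way to score a street-name reader: whole-string exact match (a sign counts as read correctly only if every character is right), reported alongside a character error rate based on Levenshtein edit distance. The street names are made-up examples and `score` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def score(predictions, truths):
    """Return (exact-match accuracy, character error rate) over a set of signs."""
    exact = sum(p == t for p, t in zip(predictions, truths)) / len(truths)
    edits = sum(levenshtein(p, t) for p, t in zip(predictions, truths))
    cer = edits / sum(len(t) for t in truths)
    return exact, cer


# Made-up transcriptions: one sign is off by a single letter.
truths = ["Rue de la Paix", "Avenue Victor Hugo", "Boulevard Saint-Michel", "Place de la Concorde"]
predictions = ["Rue de la Paix", "Avenue Victor Hugo", "Boulevard Saint-Micnel", "Place de la Concorde"]

exact, cer = score(predictions, truths)
print(f"exact match: {exact:.0%}, character error rate: {cer:.1%}")
```

One wrong letter makes that whole sign a miss under exact match (75% here), even though only about 1.4% of characters are wrong, which is why whole-name accuracy is a demanding standard for a system that has to put names on a map.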

Google's OCR tools have been used to update over 30% of all addresses in Maps, and in some locations they have produced improvements of over 90%. And, benevolently, Google has also been good about releasing its tools and models over the years for others to use. It did exactly that with this latest bit of OCR magic. Humorously, half of the mistakes made by one particular model came down to so-called "ground truth" errors, which is a fancy way of saying the system's answer was right, but the labeled data it was checked against was wrong.
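How those ground truth errors were found isn't described here. A hypothetical sketch, assuming a person re-checks every image where the model and the original label disagree and produces corrected labels: each disagreement is then sorted into a genuine model error or a bad label, and accuracy is measured against both label sets. The function name `audit` and all data are made up for illustration.

```python
def audit(predictions, labels, corrected_labels):
    """Compare a model against its original labels and a re-checked set of labels.

    `corrected_labels` is hypothetical: it stands in for a person re-examining
    the images where the model and the original labels disagree.
    """
    model_errors, label_errors = [], []
    for i, (pred, label, fixed) in enumerate(zip(predictions, labels, corrected_labels)):
        if pred == label:
            continue  # agreement: nothing to investigate
        if pred == fixed:
            label_errors.append(i)  # model was right; the "ground truth" was wrong
        else:
            model_errors.append(i)  # a genuine model mistake
    n = len(labels)
    measured = sum(p == lab for p, lab in zip(predictions, labels)) / n
    actual = sum(p == fix for p, fix in zip(predictions, corrected_labels)) / n
    return model_errors, label_errors, measured, actual


predictions = ["Rue de la Paix", "Avenue Foch", "Rue Victor Hugo", "Quai Voltair"]
labels = ["Rue de la Paix", "Avenue Fosh", "Rue Victor Hugo", "Quai Voltaire"]  # one typo in the labels
corrected = ["Rue de la Paix", "Avenue Foch", "Rue Victor Hugo", "Quai Voltaire"]

model_errors, label_errors, measured, actual = audit(predictions, labels, corrected)
print("model errors:", model_errors, "label errors:", label_errors)
print(f"measured accuracy: {measured:.0%}, accuracy after fixing labels: {actual:.0%}")
```

In this toy case, half of the apparent mistakes are label errors, so the model looks 25 percentage points worse (50% versus 75%) than it really is until the labels are fixed.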

If you're even mildly interested in any of this, I really encourage you to check out the post on Google's research blog. It's super cool stuff.

Source: Google