Welcome to the future. No, really, it's the future, right here and right now. And not just because we've got mobile processors that can calculate pi to the ten trillionth digit, or because our video games are starting to look more like movies than games. Nope, what makes me feel like I'm living in The Future(TM) more than anything else is how all that pie-in-the-sky Moore's Law tech gets applied to solving very human problems, like figuring out where the exit is in the Jakarta airport.
Case in point: Google's Translate app is extending Word Lens, the visual translation tool that lets you point your phone's camera at a sign or piece of paper and see the text in your native language, to 20 new languages. And also La Bamba. When you're done trying to get Ritchie Valens out of your head, come back here to marvel at how that stuff actually works.
Google let us see the man behind the curtain in this blog post. It's a detailed explanation of the computer science behind near-instantaneous translation. Basically, Google has to run optical character recognition on rotating video frames with an astonishing degree of accuracy, then use a massive cross-language dictionary to translate the words on the screen, then place them back into the video in a position, size, and style that matches the original text... all while transmitting as little data as possible.
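To make the translate-and-put-it-back steps above a little more concrete, here's a toy sketch in Python. To be clear, this is not Google's code: the `TextRegion` type, the three-word Spanish-to-English `DICTIONARY`, and the `fitted_font_size` helper are all made up for illustration, and the OCR step is assumed to have already produced the text and its bounding box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """Hypothetical OCR result: recognized text plus where it sat in the frame."""
    text: str
    x: int
    y: int
    width: int
    height: int


# Made-up stand-in for the "massive cross-language dictionary".
DICTIONARY = {"salida": "exit", "entrada": "entrance", "peligro": "danger"}


def translate_region(region: TextRegion, dictionary: dict) -> TextRegion:
    """Translate word by word, keeping the original position and size."""
    words = region.text.lower().split()
    # Words missing from the dictionary pass through unchanged.
    translated = " ".join(dictionary.get(word, word) for word in words)
    # Crude nod to "matching the original style": keep all-caps signs all-caps.
    if region.text.isupper():
        translated = translated.upper()
    return TextRegion(translated, region.x, region.y, region.width, region.height)


def fitted_font_size(region: TextRegion, char_aspect: float = 0.6) -> float:
    """Largest font height that keeps the text inside the original box.

    char_aspect is an assumed average glyph width-to-height ratio.
    """
    if not region.text:
        return float(region.height)
    max_by_width = region.width / (len(region.text) * char_aspect)
    return min(float(region.height), max_by_width)


sign = TextRegion("SALIDA", x=10, y=20, width=120, height=40)
overlay = translate_region(sign, DICTIONARY)
print(overlay.text, fitted_font_size(overlay))
```

The translated region keeps the same box as the original, so whatever draws the overlay only has to pick a font size that fits it.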
It's harder than it seems, because the live translation aspect of the tool needs to be able to recognize small, blurry text almost instantly: Google's computers are basically solving CAPTCHA puzzles several times a second. To "train" the system, engineers actually had to create algorithms that add digital dirt and smudges to letters, so that it works with as wide a variety of input as possible. That's even more difficult than it sounds, because Google also had to make sure not to strain the system with false positives or attempt to translate things the user might not intend, like vertical relative text. It also offloads at least some of the work to the phone's local hardware in order to optimize the connection.
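The "digital dirt and smudges" trick is easy to picture with a small example. What follows is a minimal sketch, assuming a letter is just a tiny grid of pixel values; the `LETTER_T` grid, the function names, and the noise settings are all invented here, and Google's real generator is certainly more elaborate.

```python
import random

Glyph = list[list[float]]  # 0.0 = background, 1.0 = ink

# Made-up 5x5 bitmap of the letter "T".
LETTER_T: Glyph = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def add_speckles(glyph: Glyph, rng: random.Random, probability: float = 0.1) -> Glyph:
    """Imitate dirt by flipping each pixel with the given probability."""
    return [
        [1.0 - p if rng.random() < probability else float(p) for p in row]
        for row in glyph
    ]


def smudge(glyph: Glyph) -> Glyph:
    """Imitate a smudge with a 3x3 box blur over the in-bounds neighbours."""
    height, width = len(glyph), len(glyph[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                glyph[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        blurred.append(row)
    return blurred


def make_training_samples(glyph: Glyph, label: str, count: int, seed: int = 0):
    """Produce `count` dirtied copies of one clean glyph, each paired with its label."""
    rng = random.Random(seed)  # seeded so the same samples can be regenerated
    return [(smudge(add_speckles(glyph, rng)), label) for _ in range(count)]


samples = make_training_samples(LETTER_T, "T", count=3)
```

Every sample is still labeled "T", so a recognizer trained on them sees many messy versions of the same letter instead of one pristine one.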
This is the kind of Star Trek magic that makes you wonder what's coming next. Check out Google's Research Blog post for a more complete breakdown.
- Google Research Blog