Today Google announced the release of MobileNets, a family of TensorFlow vision models built for comparatively low-power, lower-performance platforms like mobile devices. In a cross-post on both the Open Source and Research blogs, Google shared details about the new visual recognition software. Now even more useful machine learning tools can run natively on your phone's hardware, quickly and accurately. And future tools like Google Lens will be able to perform more functions locally, with less need for mobile data and less waiting.

It's one thing to run a machine learning network on a system with a ton of hardware power, where you don't have to worry about battery life or sharing resources with other pesky apps and services. But pulling off the same feat on a mobile device is a different thing entirely: battery life is a real concern, any operation has to share hardware with basic requirements like the UI, and you still want to maintain a smooth 60fps experience.

The new MobileNets vision models will be useful for existing tools like Google Photos, which could use them to pre-process the images you take and determine their content. But the biggest use is likely to be with things like the upcoming Google Lens.

For those who might not be familiar, Google Lens is a new addition to the Assistant that was revealed during this year's I/O keynote. Basically, it's an image recognition model that can provide you with information about whatever you point it at, and then let you act on that information in a useful way. So not only can you point it at an object and have it identified, the content of the image can also be put to work. One example that demonstrated the latter well was pointing your phone's camera at the label on the bottom of a router and having your phone automatically enter the printed SSID and password and connect, all in just a few steps. No tapping out long strings of characters the next time you visit grandma's house.

With MobileNets, some of that workload (the image recognition neural network that actually identifies content) can take place on your device. Landmarks, faces, objects: all of these may be identifiable without having to wait for a remote server to process the information for you. Local image recognition also improves a user's privacy, since less information (like the image being recognized) has to leave the phone.

Google has released several versions of MobileNets tuned for different latency and size budgets, with a corresponding range of accuracy. As you would expect, the more latency you can tolerate and the more space you can give the model, the more accurate it will be.
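The released checkpoints cover a range of width multipliers and input resolutions, so picking a variant is mostly a matter of swapping in a different file. As a rough illustration, here is a minimal sketch of classifying a single image with one of the frozen MobileNet graphs using the TensorFlow 1.x Python API; the file name, tensor names, and preprocessing constants are assumptions based on how the released graphs are typically packaged, so check the details for the variant you actually download.

```python
# Minimal sketch: classify one image with a pre-trained, frozen MobileNet graph
# using the TensorFlow 1.x Python API. File and tensor names below are assumed.
import numpy as np
import tensorflow as tf
from PIL import Image

# The variant name encodes the trade-off: e.g. "mobilenet_v1_1.0_224" is the
# largest and most accurate, "mobilenet_v1_0.25_128" the smallest and fastest.
GRAPH_PATH = "mobilenet_v1_1.0_224_frozen.pb"  # assumed file name
INPUT_SIZE = 224

# Load the frozen graph definition.
with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

# Preprocess: resize and scale pixel values to [-1, 1], as MobileNet expects.
img = Image.open("photo.jpg").resize((INPUT_SIZE, INPUT_SIZE))
x = (np.asarray(img, dtype=np.float32) / 127.5) - 1.0
x = x[np.newaxis, ...]  # add a batch dimension

with tf.Session(graph=graph) as sess:
    # Input/output tensor names are assumptions; inspect the graph if they differ.
    probs = sess.run("MobilenetV1/Predictions/Reshape_1:0",
                     feed_dict={"input:0": x})
    print("Top class index:", int(np.argmax(probs[0])))
```

On a phone itself, the same sort of graph would be run through TensorFlow's Android or iOS support rather than the Python API, but the inference step is the same idea.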

The new software is open source, and if you are a developer and would like to read more, Google has a paper on the subject available here. You can also learn how to run these models at Google’s TensorFlow site, which has links to models for different platforms that you can play with. And, of course, you can read up on it in greater detail at the source links below.