[Hands-On] Google Docs Photo Upload OCR: It Kind Of Sucks At This Point

Quick Links

Conclusion

We were all very excited to hear about the Google Docs for Android announcement this morning, and even more so when we learned it came with a special surprise feature: the ability to upload photos of physical documents from your Android phone and have them transcribed by Google Docs into editable text.

So, the first thing I was curious about, naturally, is just how well this new feature works in the real world. As you may have guessed from the title, not very. Let me show you the photos I tasked Google Docs for Android with transcribing.

Document 1: Printed handout:

Document 1 results:

OF CONSTITUTIONAL ANALYSIS ON DEFAMATÍON
PUBLIC OFFICIALS GENERAL PURPOSE PUBLIC FIGURES
LIMITED PUBLIC FIGURES WHERE MATTER OF PUBLIC CONCERN
LIMITED PUBLIC FIGURE WHERE NOT MATTER OF PUBLIC CONCERN
PRIVATE PERSONS WHERE MATTER OF PUBLIC CONCERN
PRIVATE PERSONS WHERE NOT MATTER OF PUBLIC CONCERN
ACTUAL MALICE. FOR LIAB.
YES
YES
Up to states as to negl. or actual malice
Up to states as to negl. or actual malice (or strict liabf?)
STRICT LIAB. AVAIL. TO STATES
NO
NO
NO
NO
NO
YES
PUNITIVE DAMAGES AVAIL. ACTUAL
MALICE
NO
NO
YES NO

Admittedly, a table on a crinkled page isn't the easiest thing in the world to start with. Still, Docs missed several words and punctuation points, and added an accent to one character for no apparent reason. This is what I'd call "mediocre," not by any means a failure, but not fantastic. Unfortunately, this was the best result I was able to achieve out of any document I snapped.

Document 2: Recipe from cookbook:

Document 2 results:

gnocchi alla Romana
in advance and even hold them un the oven for an extra ten minutes if youre
having difficulty with guests hanging around the cocktails too long, .icups MILK iteaspoon SALT (1 stick) unsalted BUTTER
lcupSEMOLINA flour (see sidebar, page 81) lcup freshly grated PARMIGIANO-REGGIANO cheese

Not even close. The major issue here is that the OCR failed to insert proper line breaks. And it missed the entire first sentence. It also couldn't distinguish between "I" and "1" (difficult for a computer, for sure), and missed several words. I tried different recipes from the same cookbook and achieved similar results - the errors made the text generally worthless as a reference, and recipe-snaps are bound to be one of the most common uses for the OCR feature (well, for those of us that cook, at least!)

I tried different lighting, angles, and levels of zoom, and straightening out the page, but to no avail - I achieved consistently crappy (or totally blank) results once the upload had been transcribed.

Document 3: Textbook:

Document 3 results:

„_ »M only plân. Xample, in Malley V. Hanna, 101 A.D.2d 1019, 476 N.Y.S.2d 700, F0” 9 984) affirmed 65 N.Y.2d 289, 491 N.Y.s.2d zas, 480 Nm@ was 701 (1 h J ovenant declared, “No double house shall ever be built.” (1985), t e lt this was sufficient to indicate the intention to benefit the The land which was three lots away from the restricted grantor s 1" f grantor’s retained land, which was three lots away from the restricted land. Likewise, in Friedlander v. Hiram Rieker & Sons, 485 A.2d 965 (Me.l984), one tract was carved out of a larger tract and restricted te singlefamily use, although the retained land was not so restricted. The intention to benefit the retained land was held to have been shown by
the afñdavit of an ofñcer of the corporate promisee to that effect.

This one was probably the closest, but at the same time, it missed so much in the beginning of the text that again the transcription is nearly useless - there's little point to the feature if it can't upload a single paragraph in its entirety. The text here is clear and crisp, and aside from this textbook's odd placement of the dot on the letter "i" in certain contexts, it should be pretty easy to read. The curvature of the page seems to present Docs with the biggest difficulty, because it results in a two-dimensional skewing of characters.

Small text also is a big no-no, I attempted to capture an entire page from this textbook several times and each attempt ended in a blank transcription.

Conclusion

Before you start saying my phone's camera is crap, or the lighting is bad, or the bends on the pages are causing the OCR to become confused, remember one thing: these are real-world conditions. I am not going to sit down, double-check my light levels, hold out a page in a book so it appears straight, and take 20 photos until I get one that looks good. That's not the purpose of this feature. I could type out the text I'm trying to snap in the amount of time it would take me to get it exactly right, and at that point, why do I even need OCR?

I've conducted a test that mimics the way a person would use the Google Docs photo upload capability in reality, I even adjusted my lighting and took multiple photos of each document type for this test just in case - and these were the best results I achieved.

I'm not harping on Google here - this OCR technology is probably the best in the business, but it still has a long way to go before a camera phone will be an adequate replacement for a flatbed or feed scanner. I'm sure we'll see improvements in Docs OCR as time goes on, but for now, it's little more than a party trick for making your iPhone-toting friends jealous.