27 Apr
Last Updated: July 24th, 2011

We were all very excited to hear about the Google Docs for Android announcement this morning, and even more so when we learned it came with a special surprise feature: the ability to upload photos of physical documents from your Android phone and have them transcribed by Google Docs into editable text.

So, the first thing I was curious about, naturally, was just how well this new feature works in the real world. As you may have guessed from the title: not very. Let me show you the photos I tasked Google Docs for Android with transcribing.

  • Document 1: Printed handout:


Document 1 results:

OF CONSTITUTIONAL ANALYSIS ON DEFAMATÍON
PUBLIC OFFICIALS GENERAL PURPOSE PUBLIC FIGURES
LIMITED PUBLIC FIGURES WHERE MATTER OF PUBLIC CONCERN
LIMITED PUBLIC FIGURE WHERE NOT MATTER OF PUBLIC CONCERN
PRIVATE PERSONS WHERE MATTER OF PUBLIC CONCERN
PRIVATE PERSONS WHERE NOT MATTER OF PUBLIC CONCERN
ACTUAL MALICE. FOR LIAB.
YES
YES
Up to states as to negl. or actual malice
Up to states as to negl. or actual malice (or strict liabf?)
STRICT LIAB. AVAIL. TO STATES
NO
NO
NO
NO
NO
YES
PUNITIVE DAMAGES AVAIL. ACTUAL
MALICE
NO
NO
YES NO

Admittedly, a table on a crinkled page isn't the easiest thing in the world to start with. Still, Docs missed several words and punctuation marks, and added an accent to one character for no apparent reason. This is what I'd call "mediocre": by no means a failure, but not fantastic either. Unfortunately, this was the best result I was able to achieve out of any document I snapped.

  • Document 2: Recipe from cookbook:


Document 2 results:

gnocchi alla Romana
in advance and even hold them un the oven for an extra ten minutes if youre
having difficulty with guests hanging around the cocktails too long, .icups MILK iteaspoon SALT (1 stick) unsalted BUTTER
lcupSEMOLINA flour (see sidebar, page 81) lcup freshly grated PARMIGIANO-REGGIANO cheese

Not even close. The major issue here is that the OCR failed to insert proper line breaks, and it missed the entire first sentence. It also couldn't distinguish between "I" and "1" (difficult for a computer, for sure), and missed several words. I tried different recipes from the same cookbook and got similar results: the errors made the text generally worthless as a reference, and recipe snaps are bound to be one of the most common uses for the OCR feature (well, for those of us who cook, at least!).

I tried different lighting, angles, and levels of zoom, and straightened out the page, but to no avail: I got consistently crappy (or totally blank) results once the upload had been transcribed.

  • Document 3: Textbook:


Document 3 results:

„_ »M only plân. Xample, in Malley V. Hanna, 101 A.D.2d 1019, 476 N.Y.S.2d 700, F0” 9 984) affirmed 65 N.Y.2d 289, 491 N.Y.s.2d zas, 480 Nm@ was 701 (1 h J ovenant declared, “No double house shall ever be built.” (1985), t e lt this was sufficient to indicate the intention to benefit the The land which was three lots away from the restricted grantor s 1" f grantor’s retained land, which was three lots away from the restricted land. Likewise, in Friedlander v. Hiram Rieker & Sons, 485 A.2d 965 (Me.l984), one tract was carved out of a larger tract and restricted te single­family use, although the retained land was not so restricted. The intention to benefit the retained land was held to have been shown by
the afñdavit of an ofñcer of the corporate promisee to that effect.

This one was probably the closest, but it still missed so much at the beginning of the text that, again, the transcription is nearly useless; there's little point to the feature if it can't transcribe a single paragraph in its entirety. The text here is clear and crisp, and aside from this textbook's odd placement of the dot on the letter "i" in certain contexts, it should be pretty easy to read. The curvature of the page seems to give Docs the most trouble, because it skews the characters in two dimensions.

Small text is also a big no-no: I attempted to capture an entire page from this textbook several times, and each attempt ended in a blank transcription.

Conclusion

Before you start saying my phone's camera is crap, or the lighting is bad, or the bends on the pages are causing the OCR to become confused, remember one thing: these are real-world conditions. I am not going to sit down, double-check my light levels, hold out a page in a book so it appears straight, and take 20 photos until I get one that looks good. That's not the purpose of this feature. I could type out the text I'm trying to snap in the amount of time it would take me to get it exactly right, and at that point, why do I even need OCR?

I've conducted a test that mimics the way a person would use the Google Docs photo upload capability in reality. I even adjusted my lighting and took multiple photos of each document type just in case, and these were the best results I achieved.

I'm not harping on Google here: this OCR technology is probably the best in the business, but it still has a long way to go before a camera phone will be an adequate replacement for a flatbed or feed scanner. I'm sure we'll see improvements in Docs OCR as time goes on, but for now, it's little more than a party trick for making your iPhone-toting friends jealous.

David Ruddock
David's phone is whatever is currently sitting on his desk. He is an avid writer, and enjoys playing devil's advocate in editorials, and reviewing the latest phones and gadgets. He also doesn't usually write such boring sentences.

  • http://danielkvist.net Daniel Kvist

    I agree that it would be fantastic if the app could interpret such low-quality shots but I don't think it can be expected. An example with a straight up text document with a nice angle in good lighting would be nice to see as well.

    • David Ruddock

      I imagine if you're using it in a fluorescent-lit office on a perfectly straight piece of white paper with size 12 font and an 8MP camera held perfectly still, you'd get near-perfect results, for sure.

      But the fact is, this feature is for quick-and-dirty grabs by its very nature (being on a phone, that is), and I imagine 95% of the time people will want to use it, it's going to be in much less than ideal conditions.

      • BillW

        I tried it at the airport on a book my wife was reading and was pretty amazed with the results. Droid X.

  • bk w/ bloody sauce

    Well, your phone's camera is crap, or the lighting is bad, or the bends on the pages are causing the OCR to become confused... oh wait, you said DON'T... right, right lol

  • Vert

    Most PC-based OCR that I've tested couldn't handle those images either, so I disagree with your conclusion.

    • David Ruddock

      The point is, no OCR can handle cameraphone images (at least at the resolution G-docs uploads them at), and as such, there isn't much point to the feature right now.

      I never said it was any worse than any available solutions out there, just that it still isn't very useful at this point.

  • Skillit

    I think you are expecting too much from OCR using a phone camera; even with professional software and a flatbed scanner, the results are not much better than what you have there.

    Of all the document scanning solutions available on a smartphone platform, this may very well be the best by a long shot.

  • Dennis

    I've tried it with decent results so far. Not as good as PDF scanner (with the add-in), but Google Docs is free and very convenient from the widget.

  • http://www.wisetrend.com Ilya E

    It takes one second to hold the page of the book flat and consequently achieve a near-perfect OCR result. I cannot believe the author is suggesting he can type in the entire page by hand in one second.
    "I could type out the text I'm trying to snap in the amount of time"
    Proper input produces proper output. I cannot fill my car's gas tank with diesel and expect good performance. Even further, I cannot fuel my Porsche with low-quality gas (hey, gas is gas) and expect high performance.
    Yes, there are still some limitations, but OCR as it is now can easily handle cell phone cameras' distortions, shadows, uneven shading, and poor lighting. But the input has to be reasonable. For example, the iPhone app FotoNote (http://itunes.apple.com/us/app/fotonote/id403100177?mt=8) with WiseTREND's OCR gets high reviews from users, and the sole purpose of the app is to process and preserve mobile photos. Android can use the same underlying OCR API to achieve exactly the same good OCR result.

  • Zijyfe Duufop

    Yeah, any phone camera would not give the best quality in the first place. I'm pretty certain that this sort of feature was meant for professional (or even retail) quality scanners, and not even for cameras in general, let alone a situation in which the best quality you can find is a few megapixels. I suppose that if anyone had a phone that could shoot professional quality movies, then this would be more reasonably accurate, but until then (or until someone figures out a way to attach a higher quality camera to a phone properly), this is all we've got. And hey, better that it has a bad one than nothing at all; at least people can test it and leave feedback.

  • spurious@server.com

    No one seems to want to admit it, but OCR is still in its infancy. Sure, it's been a long, long infancy, but the results are still so sub-amazing, they're hard to believe. Writing the definitive OCR API (even for one target language!) is certainly a mega geek-challenge in waiting...

  • giantslor

    Google's OCR isn't "probably the best in the business," it's one of the worst. There are several commercial solutions that produce excellent results. Google just uses a very old open-source engine.

    • Think

      The best way to evaluate Google Docs OCR relative to the competition is to upload one photo taken in mediocre conditions to Evernote, then search for most of the words in the Evernote upload. See the difference? Google Docs OCR is crap compared to today's industry standards.