Extracting the text from an image

For years Microsoft Office has installed a component called OneNote.


When it first showed up I tried it and couldn't see the point of it. So I never went near it again.


Fast forward to a recent need to get the text out of an image, and I found out that OneNote actually does that! (In the desktop version, right-click a picture on a page and choose "Copy Text from Picture".)


Finally a use for this thing.


So what do I mean by "extract the text from an image"?



Well, every now and then someone will send you an image of a document rather than the actual source word processor file.


It may be a photo of a very old document that existed before word processing, or something that, for some other reason, only exists in hard copy form, i.e. printed on paper.


Optical Character Recognition (OCR) has been around for a very long time. It finds the words within an image and gives them back to us as ordinary, editable text.


In a previous job I worked with it to translate textbooks into braille. The books would be read with OCR and printed anew using a braille printer. It was also used to read out the text using a synthesized voice. Very cool and very useful.


OCR isn't perfect. In fact, the worse the image, the less success you get from it.


But it can still be better than typing it all out from scratch.


OCR combined with the additional spell-checking and linguistic help built into modern word processors can even do a pretty good job on bad images. Sometimes even handwriting can be "read" without too much drama.
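
For the curious, here is a rough sketch in Python of how spell-checking can tidy up OCR mistakes. It is only an illustration of the idea, not how OneNote or any word processor actually does it. The word list and the function names are made up for the example, and a real spell checker would use a far larger dictionary.

```python
import difflib

# Hypothetical mini-dictionary, invented for this example only.
KNOWN_WORDS = ["the", "constitution", "of", "club", "members", "shall", "meet"]


def correct_word(word, vocabulary=KNOWN_WORDS, cutoff=0.6):
    """Return the closest known word, or the word unchanged if nothing is close."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    # get_close_matches ranks candidates by similarity and drops any below the cutoff.
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Correct each whitespace-separated word of the OCR output in turn."""
    return " ".join(correct_word(w) for w in text.split())


# Typical OCR slips: "l" read as "1", "i" read as "l", "m" read as "rn".
print(correct_text("constitutlon of the c1ub rnembers"))
```

Words that are nowhere near anything in the word list are left alone, which is why a bigger dictionary gives better results.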


Another free and quick way to recognize text in a document is with Google Keep, Google's free online note-taking tool.


This works on your mobile and on your desktop.

Text recognition seems to be off most people's radar, let alone recognition from within images, but it is a feature that is worth filing away in your mind for when you need it.


In the old days you had to buy OCR software, or maybe it came bundled with your printer as an extra.


So maybe that's why people avoid it or are unaware of it.


Having quick access to free OCR can be very handy though.


And, much like speech recognition, it is actually very good these days.


Most recently I used Google Keep to turn a 20-page constitution, which only existed as a PDF full of images, into editable text.


OCR from those images saved me a HUGE amount of time.


Enjoy.


David








