Wednesday, June 18, 2014

Trying tesseract-ocr for Optical Character Recognition

Tesseract is an OCR tool originally developed at HP Labs. It is one of the most accurate open-source OCR engines around, so I decided to give it a try.
I had two options:

  1. Directly installing (what's the fun in that?)
  2. Compile from the source code
So I chose the second option. I downloaded the source code of the latest version, 3.03, from Google Drive. The build has two parts: compiling and installing the Tesseract engine, and adding the training data for the language you want to recognize.
So, first the compiling. It needs quite a few dependencies.
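As a rough sketch, on a Debian/Ubuntu machine the dependencies could be installed like this. The package names are assumptions and differ between distributions and releases; the helper name install_deps is made up for this example. The one library Tesseract definitely needs is Leptonica.

```shell
#!/bin/sh
# Build tools plus the image libraries Tesseract relies on.
# Package names are assumptions for Debian/Ubuntu; adjust for your distribution.
DEPS="autoconf automake libtool build-essential pkg-config libpng-dev libjpeg-dev libtiff-dev zlib1g-dev libleptonica-dev"

install_deps() {
    # $DEPS is deliberately unquoted so each package becomes its own argument.
    sudo apt-get install -y $DEPS
}
```

Calling install_deps then pulls everything in one go.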



After installing all the dependencies, I extracted the source code into a folder. Now it is compile time. :)
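The build itself follows the usual autotools routine. A minimal sketch, assuming you are inside the extracted source folder (the function name build_tesseract is only for illustration):

```shell
#!/bin/sh
# Typical autotools build; run from inside the extracted source folder.
build_tesseract() {
    # Generate the configure script.
    ./autogen.sh || return 1
    # Check for dependencies and create the Makefiles.
    ./configure || return 1
    # Compile (this is the slow part).
    make || return 1
    # Install under /usr/local by default.
    sudo make install || return 1
    # Refresh the shared library cache so the new library is found.
    sudo ldconfig
}
```

Each step stops the function if it fails, so a missing dependency shows up at the configure stage instead of halfway through make.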


The make step may take some time. After compiling, we need to add the language data file, which goes into /usr/local/share/tessdata. Don't forget to give it proper permissions, otherwise tesseract cannot read the language file.
After all that, just run tesseract on an image.
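Copying the language file and fixing its permissions might look like the sketch below. The file name eng.traineddata (English) is just an example, and DEST_DIR and install_language are names invented for this snippet, not anything Tesseract defines.

```shell
#!/bin/sh
# Copy a language file into Tesseract's data directory and make it readable by everyone.
# DEST_DIR is a variable of this sketch; it defaults to the path used in this post.
DEST_DIR="${DEST_DIR:-/usr/local/share/tessdata}"

install_language() {
    # $1: path to a .traineddata file, e.g. eng.traineddata
    sudo mkdir -p "$DEST_DIR" || return 1
    sudo cp "$1" "$DEST_DIR/" || return 1
    # 644 = owner can write, everyone can read.
    sudo chmod 644 "$DEST_DIR/$(basename "$1")"
}
```

For example: install_language eng.traineddata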
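A minimal sketch of running the recognizer: tesseract takes an input image, an output base name (it appends .txt itself) and a language selected with -l. The wrapper name ocr_image and the file names in the usage line are placeholders.

```shell
#!/bin/sh
# Recognize the text in an image and print the result.
ocr_image() {
    # $1: input image
    # $2: output base name (tesseract writes "$2.txt")
    # $3: language code, defaults to eng
    tesseract "$1" "$2" -l "${3:-eng}" || return 1
    cat "$2.txt"
}
```

For example: ocr_image image.png result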

















Wow, the accuracy is unbelievable!



