
Optical Character Recognition

Author(s):

Aakash Hotwani, Vishwakarma Institute of Technology, Pune; Ankush Deshmukh, Vishwakarma Institute of Technology, Pune; Bharat Karamchandani, Vishwakarma Institute of Technology, Pune; Vaishnavi Zade, Vishwakarma Institute of Technology, Pune

Keywords:

RNN, Tesseract, LSTM, Binarization, Segmentation

Abstract

Optical character recognition, usually abbreviated "OCR", is the process of producing editable text from non-editable text. Put simply, it extracts text from images of printed or handwritten text. The image can be a document scanned with a scanner, a photo of a document containing text, or even subtitles in a video. The editable text produced by an OCR engine allows a user to edit, copy, and search the content of a document. This project uses the LSTM (Long Short-Term Memory) based recognition engine provided by Tesseract; LSTM is a popular form of RNN (Recurrent Neural Network). Current research addresses the challenges posed by the complexity of English handwriting and its rapid conversion into editable form. Hence, a system is required that can handle all levels of English text and recognize the most appropriate characters at each of these levels.
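The paper's own code is not reproduced on this page. As a minimal sketch only, assuming the third-party `pytesseract` wrapper, the `Pillow` imaging library, and a Tesseract 4 or later installation with English language data (none of which the abstract names), Tesseract's LSTM engine can be selected with the `--oem 1` option. The helper names `build_tesseract_config` and `ocr_image` are hypothetical and used here for illustration.

```python
"""Minimal sketch: run Tesseract's LSTM engine on an image via pytesseract.

Assumptions (not from the paper): the `pytesseract` and `Pillow` packages
are installed, along with a Tesseract 4+ binary and English traineddata.
The helper names below are hypothetical.
"""


def build_tesseract_config(oem: int = 1, psm: int = 3) -> str:
    # --oem 1 selects the neural-network (LSTM) engine only (Tesseract 4+).
    # --psm 3 requests fully automatic page segmentation (Tesseract's default).
    return f"--oem {oem} --psm {psm}"


def ocr_image(path: str, lang: str = "eng") -> str:
    # Imported lazily so build_tesseract_config works without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        # Convert to 8-bit grayscale as a simple preprocessing step.
        gray = img.convert("L")
        # Returns the recognized text as a plain string.
        return pytesseract.image_to_string(
            gray, lang=lang, config=build_tesseract_config()
        )


if __name__ == "__main__":
    import sys

    # Only run OCR when an image path is given on the command line.
    if len(sys.argv) > 1:
        print(ocr_image(sys.argv[1]))
```

For example, running the script as `python ocr_sketch.py scan.png` would print the recognized text; the script and image names are placeholders. Keeping the configuration in a separate function makes it easy to try other page segmentation modes (for instance `--psm 6` for a single uniform block of text) without touching the OCR call.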

Other Details

Paper ID: IJSRDV7I21066
Published in: Volume: 7, Issue: 2
Publication Date: 01/05/2019
Page(s): 1338-1340
