Automatically Extracting References from PDF Documents

Author(s):

Ganesh D Gourshete, Information Technology, PIIT New Panvel, India; Prof. Sharvari S Govilkar, Computer Science and Engineering, PIIT New Panvel, India

Keywords:

HMM, DVHMM, CRF

Abstract

The number of citations an author receives is becoming more important than the length of his or her list of publications. The automatic extraction of bibliographic references from scientific articles is still a difficult problem in Document Engineering, even when the document is originally in digital form. The number of citations a given article receives may indicate its importance in a given area. Thus, the task of collecting citation index information is becoming increasingly important. Digital documents can easily be converted to text using any PDF-to-text converter. References can then be extracted using different Information Extraction techniques. Statistical, probabilistic, and machine learning methods, combined with Knowledge Engineering, can increase the accuracy of the analysis. OCR engines are used to convert image-based PDFs to text. Automatic metadata generation has sometimes been put forward as a solution to the ‘metadata bottleneck’ that repositories and portals face as they struggle to provide resource discovery metadata for a rapidly growing number of new digital resources. Automated metadata extraction saves time and effort for both resource uploaders and repository managers. Tool support could be fully automated or semi-automated; in the latter case, the user can check and correct the suggested values to make them more precise. In both cases, tool support avoids expensive manual creation and allows the number of collected metadata records to grow. Techniques such as regular expressions, HMM, DVHMM, CRF, and SVM are also used to improve attribute extraction results.
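
As a rough illustration of the simplest technique named in the abstract, regular expressions, the following Python sketch splits a reference section out of text that has already been converted from PDF. It is not the authors' method. The function name `extract_references`, the pattern names, and the assumptions that the section starts with a "References" or "Bibliography" heading and that entries are numbered like "[1]" are all hypothetical choices made for this example.

```python
import re

# Assumption: the reference section starts at a line that contains only
# "References" or "Bibliography" (case-insensitive).
HEADING = re.compile(r"^\s*(references|bibliography)\s*$",
                     re.IGNORECASE | re.MULTILINE)
# Assumption: every entry starts on its own line with a marker like "[12]".
ENTRY_START = re.compile(r"^\s*\[(\d+)\]\s*", re.MULTILINE)
# A four-digit number beginning with 19 or 20 is treated as the year.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_references(text):
    """Return one dict per reference entry found after the last heading.

    Each dict holds the entry number, the whitespace-normalized raw text,
    and the first year-like token (or None if there is none).
    """
    heading = None
    for heading in HEADING.finditer(text):
        pass  # keep only the last matching heading
    if heading is None:
        return []

    section = text[heading.end():]
    starts = list(ENTRY_START.finditer(section))
    references = []
    for i, match in enumerate(starts):
        # An entry runs until the next entry marker or the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(section)
        raw = " ".join(section[match.end():end].split())
        year = YEAR.search(raw)
        references.append({
            "number": int(match.group(1)),
            "raw": raw,
            "year": year.group(0) if year else None,
        })
    return references
```

Calling `extract_references(text)` on the output of a PDF-to-text converter returns a list of entries that a later stage, for example an HMM or CRF tagger, could split into author, title, and venue fields. A pattern-based splitter like this fails on unnumbered or author-year reference styles, which is one reason the statistical models named in the abstract are used.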

Other Details

Paper ID: IJSRDV3I30019
Published in: Volume : 3, Issue : 3
Publication Date: 01/06/2015
Page(s): 304-308
