
HTPI: Hadoop Text Processing Interface


Disha Kangar, Kurukshetra University, Kurukshetra; Dr. Kanwal Garg, Kurukshetra University, Kurukshetra


Document Similarity, Hadoop, Information Retrieval, Jaccard Coefficient, Map-Reduce, Skipping, Stemming.


Text mining is regarded as one of the supporting pillars of Information Retrieval. This paper is devoted to text mining, with a primary focus on mining academic papers. The authors propose an architecture named HTPI (Hadoop Text Processing Interface), a framework built in Java, using Eclipse, on top of Apache Hadoop. The problem considered is reference metamorphosis: re-ordering the entries in the references section of a scientific paper according to the similarity score computed between each referenced paper and the paper whose reference list is being re-ordered. The paper draws on stemming, skipping, and similarity calculation using the Jaccard Coefficient.
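
To make the similarity step concrete, the following is a minimal sketch of Jaccard-based reference re-ordering, not the authors' HTPI implementation. It assumes the standard definition J(A, B) = |A ∩ B| / |A ∪ B| over sets of word tokens. It is written in plain Python rather than as a Hadoop Map-Reduce job, and it leaves out stemming and skipping. The names `tokens`, `jaccard`, and `reorder_references` are hypothetical and chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| of two token sets."""
    union = a | b
    if not union:
        # Two empty documents: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def reorder_references(paper_text, references):
    """Return (score, title) pairs sorted by descending similarity to paper_text.

    `references` is a list of (title, text) pairs (an assumed input shape).
    """
    paper = tokens(paper_text)
    scored = [(jaccard(paper, tokens(text)), title) for title, text in references]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    paper = "text mining of academic papers with hadoop"
    refs = [
        ("Ref A", "image compression with wavelets"),
        ("Ref B", "mining text of academic papers"),
    ]
    for score, title in reorder_references(paper, refs):
        print(f"{title}: {score:.3f}")
```

In this toy input, "Ref B" shares five of the seven distinct tokens in the union with the paper, so it is ranked ahead of "Ref A", which shares only one token out of ten.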

Other Details

Paper ID: IJSRDV2I4032
Published in: Volume : 2, Issue : 4
Publication Date: 01/07/2014
Page(s): 53-55
