
Performance Analysis: Stemming Algorithm for the English Language

Author(s):

Vairaprakash Gurusamy, Madurai Kamaraj University; Dr. S. Kannan, Madurai Kamaraj University; K. Nandhini, Concentrix India Pvt. Ltd

Keywords:

Stemming, Suffix Stripping, Information Retrieval, Text Preprocessing, Morphology

Abstract

Information retrieval (IR) is the process of retrieving documents that satisfy a user's need for information. The user's information need is represented by a query; the retrieval decision is made by comparing the terms of the query with the terms in the document itself, or by estimating the degree of relevance that the document has to the query. Words in a document may have many morphological variants. These variants have similar semantic interpretations and can be treated as equivalent for the purpose of IR applications. For this reason, a number of so-called stemming algorithms, which reduce a word to its stem or root form, have been developed, so that the key terms of a query or document are represented by stems rather than by the original words. Stemming reduces the size of the index files and also improves retrieval effectiveness. A stemming algorithm is a computational procedure that reduces all words with the same root (or, if prefixes are left untouched, the same stem) to a common form, usually by stripping each word of its derivational and inflectional suffixes. This study analyzes the performance of three basic suffix-removal stemming algorithms for English, namely Lovins, Porter, and Paice/Husk, in terms of accuracy and algorithm strength.
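To make the suffix-stripping idea concrete, here is a minimal Python sketch. It is not Lovins, Porter, or Paice/Husk. The suffix list `SUFFIXES` and the minimum stem length `MIN_STEM` are hypothetical values chosen only for illustration. The code strips the longest matching suffix, which is the longest-match principle Lovins uses, but it omits Lovins' context conditions and recoding rules. It then groups words into conflation classes to show how several index terms collapse into one. The ratio of distinct words to conflation classes (mean class size) is used here as one simple way to express stemmer strength. Whether this matches the strength measure used in the paper is an assumption.

```python
# Hypothetical suffix list (not taken from any published stemmer).
# Sorted longest first so the longest matching suffix is tried first.
SUFFIXES = sorted(
    ["ational", "ization", "fulness", "ness", "ment", "ions",
     "ing", "ion", "ers", "er", "ed", "es", "ly", "s"],
    key=len,
    reverse=True,
)
MIN_STEM = 3  # assumed minimum number of letters left after stripping


def stem(word: str) -> str:
    """Remove the longest listed suffix that leaves at least MIN_STEM letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w  # no removable suffix: the word is its own stem


def conflate(words: list[str]) -> dict[str, set[str]]:
    """Group distinct words into conflation classes keyed by their stem."""
    classes: dict[str, set[str]] = {}
    for w in {x.lower() for x in words}:
        classes.setdefault(stem(w), set()).add(w)
    return classes


if __name__ == "__main__":
    words = ["connect", "connected", "connecting",
             "connection", "connections", "connects"]
    classes = conflate(words)
    for s, members in classes.items():
        print(s, sorted(members))
    # Mean words per conflation class: a simple (assumed) strength measure.
    print("mean class size:", len(set(words)) / len(classes))
```

Running it prints a single class, `connect`, holding all six variants, followed by `mean class size: 6.0`. Six index terms therefore become one. The `MIN_STEM` guard stops short words such as `sing` from being cut down to `s`. Published stemmers use richer conditions for the same purpose. A suffix list that is too aggressive causes over-stemming, and one that is too timid causes under-stemming.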

Other Details

Paper ID: IJSRDV5I50991
Published in: Volume : 5, Issue : 5
Publication Date: 01/08/2017
Page(s): 1933-1938
