Performance Analysis: Stemming Algorithm for the English Language
Author(s):
Vairaprakash Gurusamy, Madurai Kamaraj University; Dr. S. Kannan, Madurai Kamaraj University; K. Nandhini, Concentrix India Pvt. Ltd
Keywords:
Stemming, Suffix Stripping, Information Retrieval, Text Preprocessing, Morphology
Abstract
Information retrieval (IR) is the process of retrieving documents that satisfy a user's need for information. The information need is represented by a query, and the retrieval decision is made either by comparing the terms of the query with the terms in the document itself or by estimating the degree of relevance that the document has to the query. Words in a document may have many morphological variants. These variants have similar semantic interpretations and can be considered equivalent for the purposes of IR applications. For this reason, a number of stemming algorithms, which reduce a word to its stem or root form, have been developed. The key terms of a query or document are then represented by stems rather than by the original words. Stemming reduces the size of the index files and also improves retrieval effectiveness. A stemming algorithm is a computational procedure that reduces all words with the same root (or, if prefixes are left untouched, the same stem) to a common form, usually by stripping each word of its derivational and inflectional suffixes. This study evaluates the performance of three basic suffix-removal stemming algorithms for the English language, namely Lovins, Porter and Paice/Husk, in terms of accuracy and stemmer strength.
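The idea of suffix stripping, and of measuring how strongly a stemmer conflates words, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule list `SUFFIXES` and the function `toy_stem` are hypothetical and are not the Lovins, Porter, or Paice/Husk rule sets. The index compression factor, (N - S) / N for N distinct words and S distinct stems, is shown as one commonly used measure of stemmer strength; the abstract does not state which strength measure the study uses.

```python
# Illustrative only: a toy suffix stripper, NOT the Lovins, Porter,
# or Paice/Husk algorithm. The rule list below is hypothetical.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")  # tried in this order


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix if enough characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def index_compression_factor(words, stemmer):
    """Return (N - S) / N, with N distinct words and S distinct stems."""
    distinct_words = {w.lower() for w in words}
    distinct_stems = {stemmer(w) for w in distinct_words}
    return (len(distinct_words) - len(distinct_stems)) / len(distinct_words)


words = [
    "connect", "connected", "connecting", "connection", "connections",
    "retrieve", "retrieves",
]

stems = {w: toy_stem(w) for w in words}
icf = index_compression_factor(words, toy_stem)
```

In this sketch the seven sample words collapse to two stems, so fewer distinct terms would need to be indexed. A higher index compression factor indicates a stronger (more aggressive) stemmer, although a strong stemmer is not necessarily an accurate one.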
Other Details
Paper ID: IJSRDV5I50991
Published in: Volume: 5, Issue: 5
Publication Date: 01/08/2017
Page(s): 1933-1938
