Stop Word Removal of English Text Documents Based on Finite Automata

Author(s):

Shraddha Kishor Bhirud, SSBT COET, Bambhori, Jalgaon; Shraddha K. Bhirud, SSBT COET, Bambhori, Jalgaon; Komal D. Bhagvat, SSBT COET, Bambhori, Jalgaon; Atul P. Marathe, SSBT COET, Bambhori, Jalgaon; Jaypal A. Rajput, SSBT COET, Bambhori, Jalgaon

Keywords:

Information Retrieval (IR), Natural Language Processing (NLP), English, Stopword, Tokenization

Abstract

In Information Retrieval (IR), Web Mining, Artificial Intelligence, Natural Language Processing, Text Summarization, and Text and Data Analytic systems, optimization of text data is very important. One of the preprocessing steps is stop word removal. Extremely common words that appear to be of little value in selecting documents that match a user's need are excluded; these words are called stop words. To achieve accuracy, such redundant words with little or no semantic meaning must be filtered out. Stop word lists have been developed for languages such as Sanskrit, Chinese, Arabic, and Hindi, and a stop word list is also available for English. A large number of the available stop word removal techniques are based on manually compiled stop word lists, so an efficient stop word removal technique is required. In this paper, we propose a stop word removal algorithm for the English language that uses the concept of a deterministic finite automaton (DFA). A pattern matching technique is then applied, and each matched pattern, which is a stop word, is removed from the document.
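
The idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the DFA is built as a trie over a small, hypothetical stop word list, that tokens are runs of letters and apostrophes, and that matching is case-insensitive. The names build_dfa, accepts, and remove_stop_words are illustrative only.

```python
import re


def build_dfa(stop_words):
    """Build a trie-shaped DFA from a stop word list (assumed construction).

    States are integers, with 0 as the start state. Transitions are stored
    in a dict keyed by (state, character).
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for word in stop_words:
        state = 0
        for ch in word.lower():
            key = (state, ch)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the state reached at the end of a stop word
    return transitions, accepting


def accepts(dfa, token):
    """Return True if the DFA ends in an accepting state after reading token."""
    transitions, accepting = dfa
    state = 0
    for ch in token.lower():
        state = transitions.get((state, ch))
        if state is None:  # no transition defined: the token is rejected
            return False
    return state in accepting


def remove_stop_words(text, dfa):
    """Tokenize text and keep only the tokens the DFA does not accept."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if not accepts(dfa, t)]


# Hypothetical stop word list, for illustration only.
dfa = build_dfa(["the", "is", "a", "of", "in"])
print(remove_stop_words("The cat is in the garden of a friend", dfa))
```

Each token is read one character at a time, so the check costs time proportional to the token's length regardless of how many stop words the list contains. A token that is only a prefix of a stop word (for example "i"), or that extends one (for example "then"), is not removed, because the automaton does not finish in an accepting state.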

Other Details

Paper ID: IJSRDV7I40004
Published in: Volume : 7, Issue : 4
Publication Date: 01/07/2019
Page(s): 39-42
