Stop Word Removal of English Text Documents Based on Finite Automata |
Author(s): |
Shraddha Kishor Bhirud, SSBT COET, Bambhori, Jalgaon; Shraddha K. Bhirud, SSBT COET, Bambhori, Jalgaon; Komal D. Bhagvat, SSBT COET, Bambhori, Jalgaon; Atul P. Marathe, SSBT COET, Bambhori, Jalgaon; Jaypal A. Rajput, SSBT COET, Bambhori, Jalgaon |
Keywords: |
Information Retrieval (IR), Natural Language Processing (NLP), English, Stopword, Tokenization |
Abstract |
In the information era, optimizing text data is very important for information retrieval (IR) systems, web mining, artificial intelligence, natural language processing, text summarization, and text and data analytics systems. One of the preprocessing steps is stop word removal: extremely common words that appear to be of little value in selecting documents that match a user's need are excluded. These words are called stop words. To improve accuracy, such redundant words with little or no semantic meaning must be filtered out. Stop word lists have been developed for languages such as Sanskrit, Chinese, Arabic, and Hindi, and a stop word list is also available for English. A large number of existing works on stop word removal rely on manual stop word lists, so an efficient stop word removal technique is required. In this paper, we propose a stop word removal algorithm for English that uses the concept of a deterministic finite automaton (DFA). A pattern matching technique is then applied, and each matched pattern, which is a stop word, is removed from the document. |
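The abstract does not show the automaton itself, so the following is only a minimal sketch of how DFA-based stop word matching could look, not the authors' implementation. It assumes a trie-shaped DFA built from a deliberately tiny, hypothetical stop word list; the names `STOP_WORDS`, `build_dfa`, `accepts`, and `remove_stop_words` are illustrative, and the regular-expression tokenizer is likewise an assumption.

```python
import re

# Hypothetical, deliberately tiny stop word list (not the paper's list).
STOP_WORDS = ["a", "an", "and", "in", "is", "of", "the", "to"]


def build_dfa(words):
    """Build a trie-shaped DFA: state 0 is the start state, one edge per character."""
    transitions = [{}]   # transitions[state][char] -> next state
    accepting = set()    # states reached at the end of a complete stop word
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def accepts(dfa, token):
    """Run the DFA over a lowercased token; True only if it ends in an accepting state."""
    transitions, accepting = dfa
    state = 0
    for ch in token.lower():
        state = transitions[state].get(ch)
        if state is None:  # no transition defined: the token is not a stop word
            return False
    return state in accepting


def remove_stop_words(text, dfa):
    """Tokenize on runs of letters/apostrophes and drop tokens the DFA accepts."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if not accepts(dfa, t)]


if __name__ == "__main__":
    dfa = build_dfa(STOP_WORDS)
    print(remove_stop_words("The cat is in the hat", dfa))
```

With this assumed list, the example prints `['cat', 'hat']`. Each token is checked in time proportional to its length, independent of how many stop words the automaton encodes, which is the usual motivation for an automaton over a linear scan of a list.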
Other Details |
Paper ID: IJSRDV7I40004; Published in: Volume 7, Issue 4; Publication Date: 01/07/2019; Page(s): 39-42 |