High Impact Factor: 4.396

A Survey on Different Techniques of Text Categorization

Author(s):

Bhavna Rani, D.C.S.A., Computer Science & Applications, Kurukshetra University, Kurukshetra

Keywords:

Support Vector Machine, KNN (K-Nearest Neighbor), Tokens, Lemmatization or Stemming, Stop Words, Zipf's Law, Bayes Classifier, K-Neighbor Classifier, Decision Tree, Precision (p), Recall (r), F-Measure

Abstract

The Constitution of India allows each Indian state to choose its own official language for official communication at the state level. The amount of textual data in various Indian regional languages available in electronic form is growing rapidly, so classifying text documents by language has become essential. The objective of this work is the representation and categorization of Indian-language text documents using text mining techniques. Several text mining techniques, such as the naive Bayes classifier, the k-nearest-neighbor classifier, and decision trees, have been used for text categorization. This paper describes various techniques used for semantic text classification. Text classification (also called text categorization) is one of the important research issues in the field of text mining. Because of the rapid growth in the number of text documents on the web, text classification has become a serious issue for retrieving the desired text from the huge amount of unstructured data on the internet. Categorization is the process by which objects and ideas are recognized, differentiated, and understood; for a specific purpose, it groups objects into categories. Text classification is a key function for organizing and handling millions of documents. This paper covers different classification techniques along with their advantages and limitations.
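The abstract names the naive Bayes and k-nearest-neighbor classifiers, and the keywords list stop-word removal and precision, recall, and F-measure. As a minimal sketch only, and not the author's implementation, the Python below shows how these pieces fit together on a bag-of-words representation. The stop-word list, the toy English training documents, the topic labels, and all function and class names are hypothetical; stemming and decision trees are omitted, and the naive Bayes variant (multinomial, add-one smoothing) is an assumption.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(c) + sum over tokens of log P(w | c)
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_docs, train_labels, doc, k=3):
    """Majority vote among the k training documents most similar to doc."""
    query = Counter(tokenize(doc))
    scored = sorted(
        ((cosine(query, Counter(tokenize(d))), label)
         for d, label in zip(train_docs, train_labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy training set (English topic labels, not the paper's data).
TRAIN_DOCS = [
    "the match was won by the home team",
    "the team scored a late goal",
    "the player hit a century in the match",
    "the minister announced a new policy",
    "parliament passed the policy bill",
    "the election results favoured the ruling party",
]
TRAIN_LABELS = ["sports"] * 3 + ["politics"] * 3

if __name__ == "__main__":
    nb = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
    for text in ["the team won the match", "parliament debated the new policy"]:
        print(text, "->", nb.predict(text),
              knn_predict(TRAIN_DOCS, TRAIN_LABELS, text))
```

With this toy data both classifiers label "the team won the match" as sports and "parliament debated the new policy" as politics. Cosine similarity is used for k-NN because it ignores document length, which is a common choice for bag-of-words vectors but not the only one.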

Other Details

Paper ID: IJSRDV4I50607
Published in: Volume: 4, Issue: 5
Publication Date: 01/08/2016
Page(s): 1042-1045
