
On the Use of Side Information

Author(s):

T Saleem, Tudi Ram Reddy Institute of Technology and Science, JNTU Hyderabad; Shiva Kumar, Tudi Ram Reddy Institute of Technology and Science, JNTU Hyderabad

Keywords:

Data mining, text mining, text clustering, classification, meta-data

Abstract

Text mining has long been used to extract latent information from textual documents. Such documents, however, often carry associated meta-data: provenance information, links between documents, and user-access data. This meta-data plays a vital role in understanding the documents and their usage dynamics, and it can be used alongside the text itself to improve clustering. However, determining which meta-data matters and how it helps clustering is non-trivial. Some of it is noisy, and that noise must be identified and removed to obtain good-quality clusters, so only meta-data that is both relevant and reliable should be used. In this paper we propose and implement a mechanism for effective clustering of textual documents using their meta-data. We built a prototype application to demonstrate the proof of concept. The empirical results show that the proposed mechanism is effective for clustering textual data based on the associated meta-data.

Other Details

Paper ID: IJSRDV4I20969
Published in: Volume 4, Issue 2
Publication Date: 01/05/2016
Page(s): 748-751
