An appropriate similarity measure for k-means algorithm in clustering web documents

S JAIGANESH; Dr.P.JAGANTHAN


Author(s):

S JAIGANESH, PSNA COLLEGE OF ENGINEERING AND TECHNOLOGY, DINDIGUL, TAMILNADU; Dr.P.JAGANTHAN, PSNA COLLEGE OF ENGINEERING AND TECHNOLOGY, DINDIGUL, TAMILNADU

Keywords:

Partitional Clustering, Cosine Similarity, Euclidean, Jaccard, Pearson, KLD

Abstract

Organizing a large volume of documents into categories through clustering makes searching for and finding relevant information on the web easier and quicker. Hence, more efficient clustering algorithms are needed for organizing large document collections. Clustering of large text datasets can be done effectively with partitional clustering algorithms, and the K-means algorithm is the most suitable partitional approach for handling large volumes of data. The K-means algorithm uses a similarity metric to determine the distance from a document to the point that represents a cluster (its centroid). This similarity metric plays a vital role in cluster analysis, and choosing a suitable one improves the clustering results. A variety of similarity metrics is available for measuring the similarity between two documents. In this paper, we analyse the performance and effectiveness of these similarity measures specifically for K-means partitional clustering of text document datasets. We use seven text document datasets and five similarity measures, namely Euclidean distance, cosine similarity, Jaccard coefficient, Pearson correlation coefficient and Kullback-Leibler divergence (KLD). Based on our experimental study, we conclude that cosine similarity is the best-suited similarity metric for the K-means clustering algorithm.
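To make the comparison concrete, the following is a minimal sketch of the five measures plugged into a basic K-means loop. It assumes documents are already represented as term-frequency vectors over a shared vocabulary. The function names, the extended (Tanimoto) form of the Jaccard coefficient, the epsilon smoothing in the KLD, and the conversion of similarities to distances as `1 - similarity` are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extended_jaccard(a: Vector, b: Vector) -> float:
    # Assumption: Tanimoto form a.b / (|a|^2 + |b|^2 - a.b) for weighted vectors.
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def pearson_correlation(a: Vector, b: Vector) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    # Assumption: normalise both vectors to distributions; eps guards zero terms in q.
    sp, sq = sum(p), sum(q)
    return sum((x / sp) * math.log((x / sp) / max(y / sq, eps))
               for x, y in zip(p, q) if x > 0)


# Hypothetical helper table: every measure expressed as a distance (smaller = closer).
DISTANCES = {
    "euclidean": euclidean_distance,
    "cosine": lambda a, b: 1.0 - cosine_similarity(a, b),
    "jaccard": lambda a, b: 1.0 - extended_jaccard(a, b),
    "pearson": lambda a, b: 1.0 - pearson_correlation(a, b),
    "kld": kl_divergence,
}


def kmeans(docs: List[List[float]], k: int,
           distance: Callable[[Vector, Vector], float],
           iters: int = 20, seed: int = 0) -> List[int]:
    """Basic K-means: assign each document to its nearest centroid, then
    recompute each centroid as the mean of its members, until stable."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: distance(d, centroids[c]))
                      for d in docs]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, with a toy vocabulary of four terms, `kmeans(docs, 2, DISTANCES["cosine"])` groups two documents dominated by the first two terms separately from two documents dominated by the last two. Keeping the measure as a pluggable function is what allows the same K-means loop to be rerun with each metric, which is the kind of comparison the paper describes.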

Other Details

Paper ID: IJSRDV3I2393
Published in: Volume : 3, Issue : 2
Publication Date: 01/05/2015
Page(s): 408-412
