Generating better initial Centroids over Hadoop for K-Means Clustering

Author(s):

Vineesh Cutting, Sam Higginbottom Institute of Agriculture, Technology and Sciences; Prateek Singh, Sam Higginbottom Institute of Agriculture, Technology and Sciences

Keywords:

Data Mining, K-Means clustering, Random initial centroids, Better initial centroids, Hadoop, MapReduce

Abstract

Clustering is a traditional data mining technique for grouping various kinds of data to support better analysis, and K-Means is among the most favored clustering algorithms. With advances in technology, many domains now generate data at high rates, with volumes exceeding a petabyte. Combining Hadoop with K-Means enables faster processing of large data sets. However, the traditional K-Means algorithm must be given a set of initial centroids, which are usually chosen at random, and the convergence it reaches depends heavily on that set. This paper presents an efficient and simple technique for generating a better set of initial centroids as input to K-Means clustering over Hadoop. The experimental results show better clustering performance than with random initial centroids.
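The dependence of K-Means on its starting centroids can be illustrated with a small single-machine sketch. This is not the technique proposed in the paper, which the abstract does not describe, and it does not use Hadoop or MapReduce. The helper names (`kmeans`, `farthest_first`), the toy data, and the farthest-first seeding rule are all assumptions chosen for illustration only.

```python
# Illustrative sketch only: plain K-Means (Lloyd's algorithm) run from two
# different sets of initial centroids. Farthest-first seeding is a stand-in
# for "better initial centroids"; it is NOT the paper's method.

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, max_iter=100):
    """Run K-Means from the given centroids.

    Returns (centroids, iterations_used, sum_of_squared_errors).
    """
    centroids = [tuple(c) for c in initial_centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the algorithm has converged
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, iteration, sse

def farthest_first(points, k):
    """Pick k seeds: start at the first point, then repeatedly take the
    point farthest from all seeds chosen so far (hypothetical helper)."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in chosen)))
    return chosen

# Toy data: three well-separated groups of four points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
group_origins = [(0, 0), (10, 10), (20, 0)]
points = [(ox + dx, oy + dy) for ox, oy in group_origins for dx, dy in offsets]

# Poor start: all three centroids drawn from the same group.
_, iters_poor, sse_poor = kmeans(points, points[:3])
# Spread-out start: one seed lands in each group.
_, iters_spread, sse_spread = kmeans(points, farthest_first(points, 3))

print("poor start:  ", iters_poor, "iterations, SSE =", sse_poor)
print("spread start:", iters_spread, "iterations, SSE =", sse_spread)
```

On this toy data the spread-out seeds place one centroid in each group, so the run settles almost immediately, while the run seeded from a single group needs at least as many iterations and cannot end with a lower sum of squared errors. A distributed version would split the assignment step across mappers and the centroid update across reducers, but that layout is not shown here.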

Other Details

Paper ID: IJSRDV4I20359
Published in: Volume : 4, Issue : 2
Publication Date: 01/05/2016
Page(s): 228-231
