Generating better initial Centroids over Hadoop for K-Means Clustering |
Author(s): |
Vineesh Cutting, Sam Higginbottom Institute of Agriculture, Technology and Sciences; Prateek Singh, Sam Higginbottom Institute of Agriculture, Technology and Sciences |
Keywords: |
Data Mining, K-Means clustering, Random initial centroids, Better initial centroids, Hadoop, MapReduce |
Abstract |
Clustering is a traditional data mining technique used to group various kinds of data for better analysis, and K-Means is one of the most widely preferred clustering algorithms. With advances in technology, data in many domains is generated at ever higher rates, reaching sizes greater than a petabyte. Combining Hadoop with K-Means results in faster processing of large data sets. However, the traditional K-Means algorithm must be given a set of random initial centroids, and the convergence it reaches depends heavily on that set. This paper presents an efficient and simplified technique for generating a better set of initial centroids as input to K-Means clustering over Hadoop. The experimental results show better clustering performance compared to random initial centroids. |
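To illustrate why the choice of initial centroids matters, the following is a minimal single-machine sketch of the standard K-Means (Lloyd's) iteration that accepts the initial centroids as an argument. It is not the technique proposed in the paper, which the abstract does not describe, and it does not use Hadoop or MapReduce. The function name `kmeans`, the toy data set, and both seed choices are assumptions made purely for illustration.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Run Lloyd's algorithm from the given initial centroids.

    Returns (final_centroids, iterations_used).
    """
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # A centroid whose cluster is empty stays where it is.
        updated = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved: converged
            return centroids, iteration
        centroids = updated
    return centroids, max_iter

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# One seed per group versus both seeds inside the same group.
good_centroids, good_iters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
poor_centroids, poor_iters = kmeans(points, [(1.0, 1.0), (1.0, 2.0)])
print(good_iters, poor_iters)
```

On this toy data both runs end at the same centroids, but the run seeded inside a single group needs at least as many iterations. On larger or less cleanly separated data, a poor seed can also lead to a different and worse final clustering.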
Other Details |
Paper ID: IJSRDV4I20359 Published in: Volume : 4, Issue : 2 Publication Date: 01/05/2016 Page(s): 228-231 |