
Overcoming the Defects of K-Means Clustering by using Canopy Clustering Algorithm


Ambika S, Sapthagiri College of Engineering; Kavitha G, Sapthagiri College of Engineering


High Dimensional Dataset, Data Mining, Synthetic Sampling, Parameter Estimator, K-Means Clustering Algorithm, Canopy Clustering Algorithm


High-dimensional data clustering is the study of data that contains hundreds of dimensions. This paper aims to improve the processing time of the K-means clustering algorithm on high-dimensional datasets by making use of the canopy clustering algorithm. The canopy clustering algorithm uses a synthetic sampling method as a preprocessing step, and it uses the generated T1 and T2 parameter values to create canopies and to provide the initial cluster centers. The existing K-means algorithm normally works with small datasets but does not work with high-dimensional datasets, for two reasons: it selects its initial cluster centers at random, which may yield inaccurate clusters, and the number of required clusters (the value of k) must be predefined by the user. The proposed algorithm works well with high-dimensional datasets, overcomes these limitations of the K-means clustering algorithm, and reduces the execution time of the existing algorithm.
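
The idea of using canopies to seed K-means can be illustrated with a minimal sketch. It follows the commonly described canopy procedure and is not the authors' implementation: the synthetic sampling and parameter estimation steps are omitted, the T1 and T2 values are hand-picked for the toy data, Euclidean distance is assumed, and the function names canopy_clustering and kmeans are hypothetical.

```python
import math

def canopy_clustering(points, t1, t2):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Assumption: the first remaining point becomes the next canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)          # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly close: may seed or join other canopies
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

def kmeans(points, initial_centers, max_iterations=20):
    """Plain K-means (Lloyd iterations) started from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for c, g in zip(centers, groups)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers

# Toy data with two well-separated groups; thresholds chosen by hand.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
canopies = canopy_clustering(points, t1=3.0, t2=1.5)

# The canopy centers supply both k and the starting centers for K-means.
initial_centers = [center for center, _ in canopies]
final_centers = kmeans(points, initial_centers)
```

In this sketch the number of canopies determines k, so the user does not have to supply it, and K-means starts from canopy centers rather than from randomly chosen points. The result still depends on how T1 and T2 are chosen.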

Other Details

Paper ID: IJSRDV4I50277
Published in: Volume : 4, Issue : 5
Publication Date: 01/08/2016
Page(s): 613-615
