Large Scale Data Clustering Using Various-Widths Clustering Approach |
Author(s): |
HARSHAL RUSHIRAJ AGASHE, KKWIEER; SATISH SHANKARRAO BANAIT, KKWIEER |
Keywords: |
Clustering, k-Nearest Neighbor, Tree Index, Large-Scale Data, MapReduce |
Abstract |
A widely used and powerful technique for clustering is k-nearest neighbor (k-NN). However, this approach incurs a large computational cost on high-dimensional datasets. The proposed work focuses on k-NN based on various-widths clustering of large-scale data. We propose a modified k-NN approach with a MapReduce parallel computing algorithm and cluster grouping, with the goal of improving performance in terms of clustering time, pre-processing cost, and querying cost when working with high-dimensional data. First, we present the k-NN method using various-widths clustering to efficiently extract the k nearest neighbors of an input query object from the dataset. The dataset is clustered using a global width; then each cluster that satisfies a predefined criterion, i.e., a threshold value, is recursively clustered using its local width. Earlier work used the triangle inequality to prune unlikely clusters; we instead design a tree-based approach in which the cluster centers are grouped into a tree-based index to maximize the number of clusters pruned. To reduce processing time and clustering time, we design a parallel computing algorithm based on MapReduce. |
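The clustering and pruning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the size threshold `max_size`, and the rule of halving the width at each recursion level (`shrink`) are assumptions made for illustration, since the abstract does not say how the local width or the threshold is chosen. The query step uses the earlier triangle-inequality pruning that the abstract mentions; the tree-based index and the MapReduce parallelization are not shown.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)


def width_clusters(points, width):
    """Fixed-width pass: a point joins the first center within `width`,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= width:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def various_width_clusters(points, width, max_size, shrink=0.5):
    """Cluster with `width`, then recursively re-cluster any cluster larger
    than `max_size` using a smaller local width (assumed: width * shrink)."""
    result = []
    for center, members in width_clusters(points, width):
        if len(members) > max_size and width > 1e-9:
            result.extend(
                various_width_clusters(members, width * shrink, max_size, shrink)
            )
        else:
            radius = max(dist(center, m) for m in members)
            result.append((center, radius, members))
    return result


def knn(query, clusters, k):
    """Exact k-NN search. By the triangle inequality, no member of a cluster
    can be closer to the query than dist(query, center) - radius, so clusters
    whose lower bound cannot beat the current k-th best are skipped."""
    bounds = sorted(
        (max(0.0, dist(query, c) - r), i) for i, (c, r, _) in enumerate(clusters)
    )
    best = []  # max-heap of (-distance, point), holding at most k entries
    for lower, i in bounds:
        if len(best) == k and lower >= -best[0][0]:
            break  # every remaining cluster has an even larger lower bound
        for p in clusters[i][2]:
            d = dist(query, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best)


# Usage on synthetic 2-D data (values chosen arbitrarily for the example).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
clusters = various_width_clusters(points, width=0.5, max_size=20)
query = (0.3, 0.7)
neighbours = knn(query, clusters, k=5)
```

Because only clusters that provably cannot contain a closer point are skipped, the result matches a brute-force scan; the savings come from the clusters that are never visited.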
Other Details |
Paper ID: IJSRDV5I10324 Published in: Volume : 5, Issue : 1 Publication Date: 01/04/2017 Page(s): 369-372 |