
Large Scale Data Clustering Using Various-Widths Clustering Approach

Author(s):

HARSHAL RUSHIRAJ AGASHE, KKWIEER; SATISH SHANKARRAO BANAIT, KKWIEER

Keywords:

Clustering, k-Nearest Neighbor, Tree Index, large scale data, Map Reduce

Abstract

The k-nearest neighbor (k-NN) method is one of the most widely used and powerful techniques for clustering, but it incurs a large computational cost on high-dimensional datasets. The proposed work focuses on a k-NN approach based on various clustering widths for large-scale data. We propose a modified k-NN approach that combines cluster grouping with a MapReduce parallel computing algorithm, with the goal of reducing clustering time, pre-processing cost, and querying cost when working with high-dimensional data. First, we present a k-NN method that uses various-widths clustering to efficiently extract the k nearest neighbors of an input query object from the dataset. The dataset is clustered using a global width; each cluster that satisfies a predefined criterion, i.e. a threshold value, is then recursively clustered using its local width. Earlier work pruned unlikely clusters using the triangle inequality; we designed a tree-based approach in which the cluster centers are grouped into a tree-based index to maximize the number of clusters pruned. To reduce processing and clustering time, we designed a parallel computing algorithm based on MapReduce.
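
As a rough illustration of the ideas in the abstract, the following Python sketch clusters points with a global width, recursively re-clusters oversized clusters with a smaller local width, and answers a k-NN query while skipping clusters that cannot contain a closer neighbor. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the cluster-size threshold `max_size`, and the halving rule for the local width (`shrink`) are hypothetical choices made for the example. Pruning here uses the triangle-inequality bound mentioned as the earlier technique, not the tree-based index or the MapReduce stage that the paper proposes.

```python
import heapq
import math
import random


def fixed_width_clusters(points, width):
    """Assign each point to the first centre within `width`, else open a new cluster."""
    clusters = []  # each entry is [centre, members]
    for p in points:
        for c in clusters:
            if math.dist(p, c[0]) <= width:
                c[1].append(p)
                break
        else:
            clusters.append([p, [p]])
    return clusters


def various_width_clusters(points, width, max_size, shrink=0.5, min_width=1e-9):
    """Cluster with `width`, then recursively split clusters larger than `max_size`.

    Assumption: the local width is simply `width * shrink`; the paper may
    derive it differently. `min_width` stops recursion on duplicate points.
    """
    result = []
    for centre, members in fixed_width_clusters(points, width):
        if len(members) > max_size and width * shrink > min_width:
            result.extend(
                various_width_clusters(members, width * shrink, max_size, shrink, min_width)
            )
        else:
            radius = max(math.dist(centre, m) for m in members)
            result.append((centre, radius, members))
    return result


def knn_query(clusters, q, k):
    """Return the k nearest (distance, point) pairs and the number of clusters scanned.

    By the triangle inequality, every member m of a cluster with centre c and
    radius r satisfies dist(q, m) >= dist(q, c) - r, so a cluster whose lower
    bound is not below the current k-th best distance can be skipped.
    """
    bounds = sorted(
        (max(0.0, math.dist(q, c) - r), i) for i, (c, r, _) in enumerate(clusters)
    )
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    scanned = 0
    for lower, i in bounds:
        if len(best) == k and lower >= -best[0][0]:
            break  # bounds are sorted, so every remaining cluster is pruned too
        scanned += 1
        for m in clusters[i][2]:
            d = math.dist(q, m)
            if len(best) < k:
                heapq.heappush(best, (-d, m))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, m))
    return sorted((-nd, m) for nd, m in best), scanned


if __name__ == "__main__":
    random.seed(1)
    data = [(random.random(), random.random()) for _ in range(1000)]
    index = various_width_clusters(data, width=0.3, max_size=25)
    neighbours, scanned = knn_query(index, (0.5, 0.5), k=5)
    print(f"scanned {scanned} of {len(index)} clusters")
    for d, p in neighbours:
        print(f"{d:.4f}", p)
```

Points are represented as tuples of floats so they can be stored in the heap. Sorting clusters by their lower bound lets the query stop at the first prunable cluster, since all later clusters have bounds at least as large.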

Other Details

Paper ID: IJSRDV5I10324
Published in: Volume : 5, Issue : 1
Publication Date: 01/04/2017
Page(s): 369-372
