Clustering Of Big Data Using a Novel Method in K-Means and K-Medoids Clustering Algorithms |
Author(s): |
| Subhashree K, Kongu Engineering College; P.S.Prakash, Kongu Engineering College |
Keywords: |
| Hadoop, MapReduce |
Abstract |
|
Large amounts of structured and unstructured data have been collected from various sources over several years. Such data, called Big Data, are difficult to handle on a single machine and require the work to be distributed across a large number of computers. Hadoop is a distributed framework that uses the MapReduce programming model to process data in a distributed manner. Clustering analysis is one of the most important research areas in the field of data mining, and clustering algorithms are among the most commonly used data processing techniques. Clustering is the division of data into groups such that data within the same group are similar and data in different groups are dissimilar; it aims to maximize intra-class similarity and inter-class dissimilarity. k-Means is a popular clustering algorithm because of its simplicity. As data volumes have grown, researchers have started to use MapReduce, a parallel processing framework, to obtain high performance. However, MapReduce is poorly suited to iterative algorithms, because each iteration restarts a job and re-reads and re-shuffles the large data set. To overcome this problem, a novel MapReduce processing model called the optimized k-means clustering method is introduced. It uses probability sampling and clustering, followed by merging with two algorithms, weight-based merge clustering and distribution-based merge clustering, to eliminate the iteration dependence and obtain high performance. The algorithms are also compared with the k-medoids clustering algorithm.
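The sample, cluster, and merge idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names (`kmeans`, `sample_cluster_merge`), the Bernoulli sampling rate `frac`, the number of samples, and the choice to merge by running a size-weighted k-means over the per-sample centroids are all assumptions made for illustration. In a MapReduce setting, each sample would plausibly be clustered by a mapper and the merge done by a reducer, so that no job has to be restarted per iteration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, weights=None, iters=20, seed=0):
    """Plain Lloyd's k-means with optional per-point weights.

    Returns (centers, sizes), where sizes are the total weights of the
    clusters in the last assignment step.
    """
    rng = random.Random(seed)
    weights = weights or [1.0] * len(points)
    centers = rng.sample(points, k)  # initial centers: k distinct positions
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append((p, w))
        # Update step: move each center to the weighted mean of its group.
        for j, g in enumerate(groups):
            if g:
                total = sum(w for _, w in g)
                dim = len(g[0][0])
                centers[j] = tuple(
                    sum(p[d] * w for p, w in g) / total for d in range(dim)
                )
    sizes = [sum(w for _, w in g) for g in groups]
    return centers, sizes


def sample_cluster_merge(points, k, n_samples=4, frac=0.25, seed=0):
    """Cluster several random samples independently, then merge centroids.

    Hypothetical stand-in for a weight-based merge: every per-sample
    centroid is weighted by the number of sample points it represents.
    """
    rng = random.Random(seed)
    centroids, weights = [], []
    for s in range(n_samples):
        # Bernoulli (probability) sampling: keep each point with prob. frac.
        sample = [p for p in points if rng.random() < frac]
        if len(sample) < k:
            continue  # sample too small to cluster
        cs, sizes = kmeans(sample, k, seed=seed + s)
        for c, n in zip(cs, sizes):
            if n > 0:  # skip empty clusters
                centroids.append(c)
                weights.append(n)
    merged, _ = kmeans(centroids, k, weights=weights, seed=seed)
    return merged


# Usage: two synthetic, well-separated groups of 2-D points.
gen = random.Random(42)
data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(200)]
data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(200)]
print(sample_cluster_merge(data, k=2))
```

Each sample is clustered only once and the merge operates on a handful of centroids rather than on the full data set, which is the property that avoids repeated passes over the data in this sketch.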
Other Details |
|
Paper ID: IJSRDV3I2725; Published in: Volume 3, Issue 2; Publication Date: 01/05/2015; Page(s): 1227-1230
