Study and Analysis of Distributed Document Clustering Based on Mapreduce in Hadoop |
Author(s): |
Suman Devi, Manav Rachna International University; Dr. Suresh Kumar, Manav Rachna International University |
Keywords: |
Hadoop, MapReduce, Document Clustering, Distributed Document Clustering, Large Data Sets |
Abstract |
MapReduce is a simplified programming model for distributed parallel computing. It originated at Google and is commonly used for data-intensive distributed parallel computing. Cluster analysis is one of the most important data mining methods. Efficient parallel algorithms and frameworks are the key to meeting the scalability and performance requirements of such data analysis. In this paper, we describe how document clustering for large collections can be implemented efficiently with MapReduce. The Hadoop implementation provides a convenient and flexible framework for distributed computing on clusters of commodity machines. The design and implementation of the direct K-Means and distributed K-Means algorithms on MapReduce are presented. |
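As an illustration of how K-Means can be expressed in the MapReduce model, the following is a minimal sketch and not the implementation from the paper. It runs in a single Python process and uses no Hadoop API; the function names (`map_phase`, `reduce_phase`, `kmeans`) and the representation of documents as plain numeric vectors are assumptions made for this example. The map step assigns each document vector to its nearest centroid, and the reduce step averages the vectors assigned to each centroid.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(vectors, centroids):
    """Map: emit (index of nearest centroid, vector) for each document vector."""
    for vec in vectors:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(vec, centroids[i]))
        yield nearest, vec


def reduce_phase(pairs, centroids):
    """Reduce: group vectors by centroid index and average each group.

    A centroid that received no vectors keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, vec in pairs:
        groups[idx].append(vec)
    new_centroids = list(centroids)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centroids[idx] = [sum(v[d] for v in members) / len(members)
                              for d in range(dim)]
    return new_centroids


def kmeans(vectors, centroids, max_iters=20, tol=1e-9):
    """Repeat map and reduce until the centroids stop moving (hypothetical driver)."""
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(vectors, centroids), centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Toy 2-dimensional "document vectors" and hand-picked initial centroids.
    docs = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    print(kmeans(docs, [[0.0, 0.0], [10.0, 10.0]]))
```

In a real Hadoop job, each iteration would typically be a separate MapReduce job, with the current centroids distributed to every mapper and the driver checking convergence between jobs; those details are outside this sketch.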
Other Details |
Paper ID: IJSRDV3I60210 Published in: Volume : 3, Issue : 6 Publication Date: 01/09/2015 Page(s): 290-293 |