Map Reduce based Analysis of Live Website Traffic integrated with improved Performance for Small files using Hadoop
Author(s):
Vaibhavi Shekar, New Horizon College of Engineering; Sushmitha R, New Horizon College of Engineering
Keywords:
Log Files, Small Files, Hadoop, Hadoop Distributed File System (HDFS), MapReduce, Google Visualization
Abstract
Hadoop is an open-source Java framework for big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large volumes of data reliably, and MapReduce, a programming model that processes that data in a parallel and distributed fashion. Hadoop does not perform well with small files: a large number of small files places a heavy burden on the NameNode of HDFS and increases the execution time of MapReduce jobs. Hadoop is designed to handle large files and therefore suffers a performance penalty when dealing with many small ones. This paper introduces HDFS and the small file problem, reviews existing ways of dealing with it, and proposes a new approach to handling small files. In the proposed approach, the MapReduce programming model is used to merge small files on Hadoop. The approach improves Hadoop's performance on small files by ignoring files larger than the HDFS block size and by reducing the memory the NameNode needs to store file metadata. We also propose a traffic analyzer built on the combination of Hadoop and the MapReduce paradigm. Together, Hadoop and MapReduce make it possible to provide batch analysis with minimal response time and memory requirements, so that logs can be processed in a highly available, efficient, and stable way.
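The merging idea described in the abstract can be illustrated with a small, standalone Java sketch (no Hadoop dependencies, so it runs anywhere): many small files are packed into a single container stream of (filename, length, contents) records, and any file already larger than the block size is skipped, mirroring the filtering rule the approach describes. The class name `SmallFileMerger`, the record header format, and the tiny `BLOCK_SIZE` constant are illustrative assumptions for the sketch, not the paper's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Illustrative sketch only: pack many small "files" (name -> bytes) into one
// container stream, in the spirit of merging small files before loading them
// into HDFS. In a real deployment this role is often played by Hadoop's
// SequenceFile or HAR archives.
public class SmallFileMerger {

    // Hypothetical block size, kept tiny for demonstration.
    // HDFS defaults to 128 MB in recent releases.
    static final long BLOCK_SIZE = 16; // bytes

    // Merge small files into one stream. Each record is written as a
    // "name<TAB>length<NEWLINE>" header followed by the raw contents.
    // Files larger than BLOCK_SIZE are ignored, as in the proposed approach:
    // they already fill an HDFS block on their own and gain nothing from merging.
    public static byte[] merge(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            if (e.getValue().length > BLOCK_SIZE) {
                continue; // skip files that exceed the block size
            }
            byte[] header = (e.getKey() + "\t" + e.getValue().length + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            out.writeBytes(header);
            out.writeBytes(e.getValue());
        }
        return out.toByteArray();
    }
}
```

Because the merged stream stores one header per original file instead of one NameNode metadata entry per file, the NameNode only has to track the single merged object, which is the memory saving the abstract refers to.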
Other Details
Paper ID: IJSRDV4I11007; Published in: Volume 4, Issue 1; Publication Date: 01/04/2016; Page(s): 1496-1498