
An Innovative Approach for Storing and Accessing Small Files in Hadoop Distributed File System

Author(s):

K. Kiruthika, Kongu Engineering College; E. Gothai, Kongu Engineering College

Keywords:

AVRO; Hadoop; HDFS; Map Reduce; Small File Problem; File Merging

Abstract

In recent years, internet usage has grown rapidly, and many users wish to store their data on cloud computing platforms. Users' files are often small, which gives rise to the small file problem. Hadoop is a software framework for distributed processing of large datasets across large clusters of computers. The Hadoop framework consists of the Hadoop Distributed File System (HDFS) and an execution engine called MapReduce. HDFS is designed to handle very large datasets whose sizes range from megabytes to gigabytes and terabytes, but its performance degrades when handling large numbers of small files. Massive numbers of small files impose a heavy burden on the NameNode of HDFS, and correlations between small files are not considered during storage. In this paper, an efficient approach is designed to improve the storage and access efficiency of small files: small files are classified based on file correlation features, and a file merging and prefetching technique is applied to structurally related small files. The AVRO technique is then applied to further improve storage and access efficiency. Experimental results show that the proposed technique effectively improves the storage and access efficiency of small files compared with the original HDFS.
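The file-merging idea described in the abstract can be sketched in miniature, outside HDFS: many small files are packed into one container object, and a per-file index of (offset, length) entries allows random access into it, so metadata is kept for one large object instead of thousands of small ones (this is the NameNode burden the paper targets). The sketch below is illustrative only, under assumed function names; it is not the paper's implementation, and a real system would store the container as HDFS blocks (e.g., in an Avro container file) and the index alongside it.

```python
import io

def merge_small_files(files):
    """Pack a {name: bytes} mapping into one container blob plus an
    index of (offset, length) entries. One large object replaces many
    small ones, so per-file metadata is paid only once."""
    index = {}
    buf = io.BytesIO()
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index

def read_small_file(blob, index, name):
    """Retrieve one original small file from the merged container
    by seeking to its recorded offset."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Example: two small files become one container with an index.
blob, idx = merge_small_files({"a.txt": b"alpha", "b.txt": b"beta"})
print(read_small_file(blob, idx, "b.txt"))  # prints b'beta'
```

Prefetching, as described in the abstract, would extend this by loading the index entries of correlated files into memory when one member of the group is accessed, so subsequent reads avoid extra metadata lookups.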

Other Details

Paper ID: IJSRDV3I21063
Published in: Volume : 3, Issue : 2
Publication Date: 01/05/2015
Page(s): 1997-2001
