Erasure Coding Technique for Data Replication in HDFS

Author(s):

P. Haritha, KMM Institute Of PG Studies; Mrs. C. Hemavathy, KMM Institute Of PG Studies

Keywords:

Big Data, Hadoop Distributed File System, Dynamic Data Replication

Abstract

The Hadoop Distributed File System (HDFS) component of Apache Hadoop provides distributed storage of big data on a cluster of commodity hardware. HDFS ensures data availability by replicating data across different nodes. However, the HDFS replication policy does not consider the popularity of data, and the popularity of files tends to change over time. Hence, maintaining a fixed replication factor reduces the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replication. This strategy dynamically classifies files as hot data or cold data based on their popularity; it increases the number of replicas of hot data while applying erasure coding to cold data. The experimental results show that the proposed method reduces storage utilization by up to 40% without affecting availability and fault tolerance in HDFS.
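
The hot/cold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the access-count threshold, the boosted replication factor of 4, and all names (FileStat, classify, stored_mb, storage_saving) are assumptions made for illustration. The only fixed facts used are the HDFS default replication factor of 3 and the 1.5x storage overhead of a Reed-Solomon (6, 3) erasure-coding layout.

```python
from dataclasses import dataclass

# All names and thresholds below are hypothetical, chosen for illustration only.
HOT_THRESHOLD = 10         # accesses per window at or above which a file counts as hot
DEFAULT_REPLICATION = 3    # HDFS default replication factor
HOT_REPLICATION = 4        # assumed boosted replication factor for hot files
EC_DATA, EC_PARITY = 6, 3  # Reed-Solomon (6, 3): 6 data + 3 parity blocks, 1.5x overhead


@dataclass
class FileStat:
    path: str
    size_mb: float
    accesses: int  # access count in the most recent observation window


def classify(f: FileStat) -> str:
    """Label a file as 'hot' or 'cold' from its recent access count."""
    return "hot" if f.accesses >= HOT_THRESHOLD else "cold"


def stored_mb(f: FileStat) -> float:
    """Raw storage consumed: extra replicas for hot files, erasure coding for cold."""
    if classify(f) == "hot":
        return f.size_mb * HOT_REPLICATION
    return f.size_mb * (EC_DATA + EC_PARITY) / EC_DATA


def storage_saving(files) -> float:
    """Fraction of raw storage saved relative to uniform 3x replication."""
    baseline = sum(f.size_mb * DEFAULT_REPLICATION for f in files)
    proposed = sum(stored_mb(f) for f in files)
    return 1.0 - proposed / baseline
```

The saving computed this way depends entirely on the assumed mix of hot and cold files and on the assumed parameters, so it is not expected to reproduce the 40% figure reported in the paper.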

Other Details

Paper ID: IJSRDV7I11071
Published in: Volume : 7, Issue : 1
Publication Date: 01/04/2019
Page(s): 1525-1528
