
Deduplication in Big Data

Author(s):

Onkar Jadhav, Trinity College Of Engineering & Research, Pune; Seema Kamble, Trinity College Of Engineering & Research, Pune; Dipali Devkar, Trinity College Of Engineering & Research, Pune; Komal Gadage, Trinity College Of Engineering & Research, Pune; Prof. Krushnadeo Belerao, Trinity College Of Engineering & Research, Pune

Keywords:

Hadoop, HDFS, Data Deduplication, MD5 Algorithm, Application Aware Routing Algorithm

Abstract

Deduplication has become a widely deployed technology in cloud data centers for improving the efficiency of IT resources. However, traditional techniques face a great challenge in big data deduplication: striking a reasonable tradeoff between the conflicting goals of scalable deduplication throughput and a high duplicate elimination ratio. To meet this challenge, we propose a scalable distributed deduplication framework for the cloud environment that exploits data similarity and locality to optimize distributed deduplication, using inter-node two-tiered data routing and intra-node application-aware deduplication. It first distributes data at the file level, then assigns related data to the same storage node to maintain high global deduplication efficiency while balancing the workload across nodes. Our experimental evaluation of Data Dedupe against the state of the art, driven by real-world datasets, demonstrates that Data Dedupe achieves the highest global deduplication efficiency, with higher global deduplication effectiveness than the high-overhead and poorly scalable traditional scheme, at an overhead only slightly higher than that of the scalable but low duplicate-elimination-ratio approaches.
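The routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `StorageNode` and `Router`, the use of the file extension as the application type, and the least-loaded placement rule are all assumptions made for illustration. It covers only file-level routing and whole-file MD5 fingerprints, not the second routing tier mentioned in the abstract.

```python
import hashlib
from collections import defaultdict


class StorageNode:
    """Hypothetical storage node with one fingerprint index per application type."""

    def __init__(self, name):
        self.name = name
        # application type -> {MD5 hex digest -> stored bytes}
        self.index = defaultdict(dict)
        self.bytes_stored = 0

    def store(self, app_type, data):
        """Store data unless an identical copy is already indexed.

        Returns True when the data is new, False when it is a duplicate.
        """
        digest = hashlib.md5(data).hexdigest()
        if digest in self.index[app_type]:
            return False  # duplicate: nothing new is written
        self.index[app_type][digest] = data
        self.bytes_stored += len(data)
        return True


class Router:
    """Hypothetical file-level router: same application type -> same node."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.app_to_node = {}

    def route(self, filename):
        # Assumption: the file extension stands in for the application type.
        app_type = filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"
        if app_type not in self.app_to_node:
            # Assumption: a new type goes to the currently least-loaded node.
            self.app_to_node[app_type] = min(self.nodes, key=lambda n: n.bytes_stored)
        return app_type, self.app_to_node[app_type]

    def put(self, filename, data):
        app_type, node = self.route(filename)
        return node.name, node.store(app_type, data)


nodes = [StorageNode("node-0"), StorageNode("node-1")]
router = Router(nodes)
print(router.put("a.txt", b"hello world"))       # first copy is stored
print(router.put("b.txt", b"hello world"))       # same content, same node: duplicate
print(router.put("c.jpg", b"\xff\xd8 image"))    # new type goes to the emptier node
```

Because files of one type always reach the same node, each node only needs to consult its own per-type index to detect duplicates, which is the locality property the abstract relies on. Load is balanced in this sketch only at the moment a new type is first seen.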

Other Details

Paper ID: IJSRDV6I30826
Published in: Volume : 6, Issue : 3
Publication Date: 01/06/2018
Page(s): 1747-1749
