Deduplication in Big Data |
Author(s): |
| Onkar Jadhav, Trinity College Of Engineering & Research, Pune; Seema Kamble, Trinity College Of Engineering & Research, Pune; Dipali Devkar, Trinity College Of Engineering & Research, Pune; Komal Gadage, Trinity College Of Engineering & Research, Pune; Prof. Krushnadeo Belerao, Trinity College Of Engineering & Research, Pune |
Keywords: |
| Hadoop, HDFS, Data Deduplication, MD5 Algorithm, Application Aware Routing Algorithm |
Abstract |
|
Deduplication has become a widely deployed tool in cloud data centers for improving the efficiency of IT resources. However, traditional techniques face a great challenge in big data deduplication: striking a reasonable tradeoff between the conflicting goals of scalable deduplication throughput and a high duplicate elimination ratio. To meet this challenge, we propose Data Dedupe, a scalable distributed deduplication framework for cloud environments that exploits data similarity and locality to optimize distributed deduplication with inter-node two-tiered data routing and intra-node application-aware deduplication. It first distributes data at the file level, then assigns related data to the same storage node to maintain high global deduplication efficiency while balancing the workload across nodes. Our experimental evaluation of Data Dedupe against state-of-the-art approaches, driven by real-world datasets, demonstrates that Data Dedupe achieves the highest global deduplication efficiency, with higher global deduplication effectiveness than the high-overhead and poorly scalable fixed scheme, but at an overhead only slightly higher than that of the scalable but low duplicate-elimination-ratio approaches. |
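The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StorageNode`, `route`, `fingerprints`, `app_type` and `CHUNK_SIZE` are hypothetical, the file extension is assumed to stand in for the application type, and fixed-size chunking with MD5 fingerprints is assumed only because MD5 appears in the keywords. Each file goes to the node that already holds the most of its chunk fingerprints for that application type, and ties are broken in favor of the less loaded node.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # assumption: unrealistically small so the demo is easy to follow


def fingerprints(data):
    """Split data into fixed-size chunks and return one MD5 hex digest per chunk."""
    return [hashlib.md5(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def app_type(filename):
    """Assumption: the file extension stands in for the application type."""
    return filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"


class StorageNode:
    """Hypothetical storage node with one fingerprint index per application type."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # application type -> chunk fingerprints
        self.stored_chunks = 0         # unique chunks held, used as the load measure

    def overlap(self, app, fps):
        """Count how many of the file's fingerprints this node already holds."""
        return len(set(fps) & self.index[app])

    def store(self, app, fps):
        """Keep only chunks not yet indexed for this application type."""
        new = set(fps) - self.index[app]
        self.index[app] |= new
        self.stored_chunks += len(new)
        return len(new)


def route(filename, data, nodes):
    """Send the file to the most similar node; break ties by the lowest load."""
    app = app_type(filename)
    fps = fingerprints(data)
    target = max(nodes, key=lambda n: (n.overlap(app, fps), -n.stored_chunks))
    return target, target.store(app, fps)


if __name__ == "__main__":
    nodes = [StorageNode("n0"), StorageNode("n1")]
    for name, data in [("a.txt", b"AAAABBBBCCCC"),
                       ("b.doc", b"XXXXYYYY"),
                       ("c.txt", b"AAAABBBBDDDD")]:
        node, new_chunks = route(name, data, nodes)
        print(name, "->", node.name, "new chunks stored:", new_chunks)
```

In this toy run the third file shares two chunks with the first, so it is routed to the same node and only its one new chunk is stored. A real system would compare a small sample of fingerprints rather than the full set, which this sketch does not attempt.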
Other Details |
|
Paper ID: IJSRDV6I30826; Published in: Volume 6, Issue 3; Publication Date: 01/06/2018; Page(s): 1747-1749 |
