Progressive Duplicate Detection: A Survey
Author(s):
P. Kiruthika, GOBI ARTS & SCIENCE COLLEGE; S. M. Jagatheesan, GOBI ARTS & SCIENCE COLLEGE
Keywords:
Blocking, Windowing, Duplicate Detection, Entity Resolution, EPSNM, EPB
Abstract
Data mining is also called knowledge discovery in databases (KDD). It draws on several research fields, such as statistics, database systems, machine learning, and computer processing, all of which have influenced data mining concepts. Data is among the most vital assets of any company. However, when data is changed or entered incorrectly, errors such as duplicate records arise. Duplicate detection is the process of identifying multiple representations of the same real-world entities. Because duplicate detection methods must process ever larger datasets in ever shorter time, maintaining the quality of a dataset becomes increasingly difficult. This paper presents two novel progressive duplicate detection algorithms, the progressive sorted neighborhood method (PSNM) and progressive blocking (PB), which significantly increase the efficiency of finding duplicates when execution time is limited. They maximize the gain of the overall process within the available time by reporting most results much earlier than traditional approaches. A comprehensive evaluation shows that the progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work. This survey discusses both methods: the progressive sorted neighborhood method and progressive blocking.
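The idea that makes PSNM "progressive" can be illustrated with a short sketch. This is a minimal, simplified version written for this survey, not the authors' implementation: it assumes records are plain strings, a caller-supplied sort key, and a caller-supplied pairwise duplicate test. The names `progressive_snm` and `differs_by_at_most_one` and the toy records are hypothetical, and the partitioning and look-ahead steps of the full algorithm are omitted. Classic sorted neighborhood compares each record with the next `window - 1` records in sort order; the sketch compares the same set of pairs but reorders them so that all neighbors at rank distance 1 come first, then distance 2, and so on, because closer neighbors are more likely to be duplicates.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def progressive_snm(
    records: Sequence[str],
    sort_key: Callable[[str], str],
    is_duplicate: Callable[[str, str], bool],
    window: int,
) -> Iterator[Tuple[int, int]]:
    """Yield duplicate pairs as (i, j) index tuples with i < j, closest neighbors first.

    Simplified sketch (hypothetical helper): no partitioning or look-ahead.
    """
    # Sort record indices once by the caller-supplied sorting key.
    order: List[int] = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    # Classic SNM slides a window and compares all pairs inside it. This
    # progressive ordering compares every rank-distance-1 pair first, then
    # every distance-2 pair, and so on up to window - 1.
    for dist in range(1, window):
        for rank in range(len(order) - dist):
            a, b = order[rank], order[rank + dist]
            if is_duplicate(records[a], records[b]):
                yield (min(a, b), max(a, b))


def differs_by_at_most_one(a: str, b: str) -> bool:
    """Toy similarity test: equal length and at most one mismatched character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


# Hypothetical toy data; the sort key is the string itself.
records = ["smith", "smyth", "jones", "smitt", "jonas"]
pairs = list(progressive_snm(records, lambda r: r, differs_by_at_most_one, window=3))
print(pairs)  # [(2, 4), (0, 3), (0, 1)]
```

Because the function is a generator, a caller with a time budget can stop consuming it at any point and still keep the pairs found so far, which are the ones from the closest neighbors. With `window=2` the smith/smyth pair, which sits at rank distance 2 after sorting, is never compared.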
Other Details
Paper ID: IJSRDV4I60111
Published in: Volume 4, Issue 6
Publication Date: 01/09/2016
Page(s): 172-174