An Efficient Data Cleansing by Duplication Record Detection Algorithm

Author(s):

R Janardhan Naidu, KMM Institute of PG Studies; Ms. I. Madhavi Latha, KMM Institute of PG Studies

Keywords:

Duplication Records Detection Algorithm, DCS++, Windowing, Blocking

Abstract

Many industries and businesses store huge amounts of data in different databases. Operations on these databases must be carried out smoothly and efficiently. However, to access the useful information that supports decision making, it is necessary to integrate large datasets. The existing system uses the DCS++ algorithm, which is very difficult to analyze or understand. To overcome this limitation and improve performance, we propose a system based on a Record Detection Algorithm. Record detection algorithms are classified into three types: knowledge-based techniques, probabilistic techniques, and empirical techniques. Knowledge-based algorithms require training and use that training, together with reasoning skills, to perform detection. Probabilistic algorithms are based on geometric and probability methods such as Bayesian networks, expectation maximization, and data clustering. Empirical algorithms consist of sorting, blocking, and windowing methods. The proposed system mainly uses blocking and windowing, which increases the accuracy, efficiency, and performance of the system.
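
The abstract names blocking and windowing as the empirical methods used but gives no implementation details. The following Python sketch illustrates the two general ideas only; it is not the authors' algorithm and not DCS++. All function names, the `difflib`-based similarity measure, the 0.85 threshold, and the sample records are assumptions chosen for illustration. Blocking compares only records that share a key, while windowing sorts the records and compares each one with its nearest neighbours in sorted order.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def blocking_pairs(records, block_key):
    """Blocking: group records by a key and pair only records in the same block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


def windowing_pairs(records, sort_key, window=3):
    """Windowing: sort records, then pair each one with the next window-1 records."""
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def find_duplicates(records, candidate_pairs, threshold=0.85):
    """Keep candidate pairs whose similarity reaches the (assumed) threshold."""
    return sorted(
        (i, j)
        for i, j in candidate_pairs
        if similarity(records[i], records[j]) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    records = ["John Smith", "Jon Smith", "Mary Jones", "Mary Jonse", "Alan Turing"]
    by_block = blocking_pairs(records, block_key=lambda r: r[0].lower())
    by_window = windowing_pairs(records, sort_key=str.lower, window=2)
    print(find_duplicates(records, by_block))
    print(find_duplicates(records, by_window))
```

Both strategies reduce the number of comparisons relative to checking every pair of records. The trade-off is that duplicates landing in different blocks, or too far apart in the sort order, are never compared, so the choice of key and window size affects accuracy.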

Other Details

Paper ID: IJSRDV7I10975
Published in: Volume: 7, Issue: 1
Publication Date: 01/04/2019
Page(s): 1381-1384
