High Impact Factor : 4.396 icon | Submit Manuscript Online icon |

Mining Outliers in Large Data Sets

Author(s):

S.Gayathri , Selvam College of Technology; P.Sasikumar, Selvam College of Technology

Keywords:

Outlier mining is one of the important processes in the data mining technique. It is the process of extracting or mining the irrelevant values in the large dataset. Due to this we will have the noise/ error free values in the dataset. The proposed method for outlier detection using the distance based method. We estimate the outlier based on the distance between the objects in the dataset. Distance has been calculated such as standard deviation and average. After top n outlier values has been calculated. Based on the distance and weight values, we estimate threshold value. The aim of outlier detection is based on the threshold value. If the threshold value is greater than the overall distance value then we denote it as normal value otherwise it is outlier value. It is estimated by using the algorithms such as Distributed solving set algorithm and Lazy distributed solving set algorithm. We can improve the communication cost and overall runtime by using these algorithms.

Abstract

Outlier mining is one of the important processes in the data mining technique. It is the process of extracting or mining the irrelevant values in the large dataset. Due to this we will have the noise/ error free values in the dataset. The proposed method for outlier detection using the distance based method. We estimate the outlier based on the distance between the objects in the dataset. Distance has been calculated such as standard deviation and average. After top n outlier values has been calculated. Based on the distance and weight values, we estimate threshold value. The aim of outlier detection is based on the threshold value. If the threshold value is greater than the overall distance value then we denote it as normal value otherwise it is outlier value. It is estimated by using the algorithms such as Distributed solving set algorithm and Lazy distributed solving set algorithm. We can improve the communication cost and overall runtime by using these algorithms.

Other Details

Paper ID: IJSRDV2I1179
Published in: Volume : 2, Issue : 1
Publication Date: 01/04/2014
Page(s): 406-407

Article Preview

Download Article