Unsupervised Outlier Detection in Categorical Dataset |
Author(s): |
| Varalakshmi P, MAR ATHANASIUS COLLEGE OF ENGINEERING; Linda Sara Mathew, MAR ATHANASIUS COLLEGE OF ENGINEERING |
Keywords: |
| Optimization model, Holoentropy, Categorical data |
Abstract |
|
Outlier detection in categorical data sets is an important problem, made harder because a similarity measure is difficult to define for categorical data. This work uses a formal definition of outliers and an optimization model of outlier detection built on a new concept, holoentropy, which takes both entropy and total correlation into consideration. Based on this model, a function is defined for the outlier factor of an object; it is determined solely by the object itself and can be updated efficiently. The unsupervised approach detects outliers in an unlabeled data set under the assumption that the majority of its objects are normal. Two 1-parameter outlier detection methods are used, ITB-SP (Information Theory Based Single Pass) and ITB-SS (Information Theory Based Step by Step), which require no user-defined parameters for deciding whether an object is an outlier. The algorithm uses weighted attributes and holoentropy, which considers both the data distribution and attribute correlation, to measure the similarity of outlier candidates in data sets. An upper bound on the number of outliers and an anomaly candidate set are used to reduce the search cost. |
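To make the holoentropy idea concrete, the following is a minimal Python sketch, not the paper's ITB-SP or ITB-SS procedure. It assumes holoentropy is the joint entropy plus the total correlation; since total correlation is the sum of the attribute entropies minus the joint entropy, that quantity reduces to the sum of the per-attribute entropies. Attribute weighting is omitted, and the function names (`attribute_entropy`, `holoentropy`, `removal_scores`) and the toy data are hypothetical. Each object is scored by brute force as the drop in holoentropy when it is removed, so a larger score marks a stronger outlier candidate.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def holoentropy(rows):
    """Unweighted holoentropy, assumed here to be entropy plus total
    correlation, which equals the sum of the attribute entropies."""
    if not rows:
        return 0.0
    columns = zip(*rows)  # transpose: one tuple of values per attribute
    return sum(attribute_entropy(list(col)) for col in columns)


def removal_scores(rows):
    """Brute-force score per object: decrease in holoentropy when that
    object is removed (illustrative only, quadratic in the data size)."""
    base = holoentropy(rows)
    return [base - holoentropy(rows[:i] + rows[i + 1:])
            for i in range(len(rows))]


# Hypothetical toy data: five identical objects and one that differs.
rows = [("a", "x")] * 5 + [("b", "y")]
scores = removal_scores(rows)
top = max(range(len(rows)), key=scores.__getitem__)  # index of best candidate
```

In this toy data the last object is the only one whose removal lowers the holoentropy, so it receives the highest score. Recomputing the holoentropy for every removal is what an efficiently updatable outlier factor, as described in the abstract, is meant to avoid.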
Other Details |
|
Paper ID: IJSRDV3I120701 Published in: Volume : 3, Issue : 12 Publication Date: 01/03/2016 Page(s): 930-933 |
