
Feature Subset Selection for High Dimensional Data Based On Clustering

Author(s):

Prof. Sarika Zaware, All India Shri Shivaji Memorial Society's Institute Of Information Technology; Asmita Orpe, All India Shri Shivaji Memorial Society's Institute Of Information Technology; Heena Shaikh, All India Shri Shivaji Memorial Society's Institute Of Information Technology; Pooja Rokade, All India Shri Shivaji Memorial Society's Institute Of Information Technology; Sheefa Shaikh, All India Shri Shivaji Memorial Society's Institute Of Information Technology

Keywords:

Bayesian Probability, Fuzzy Logic, Gaussian Distribution, Markov Blanket, MST Creation, Shannon Info gain

Abstract

Feature selection is the process of examining, evaluating and extracting the required data, which can be clustered into subsets that retain the integrity of the original data. A feature selection algorithm should be both adept and productive: adept, in that it requires minimum time, and productive, in that the quality of the generated subset is not compromised. Our system proposes an algorithm consisting of the following steps: Markov Blanket, Shannon Info gain, Minimum Spanning Tree creation, Tree Partition, Gaussian distribution and Bayesian Probability. Applying these steps, we obtain the desired subset from the clusters. Our system removes irrelevant data along with redundant data, which most systems fail to eliminate. Irrelevant features are extraneous features or data objects, whereas redundant ones are repetitious features. Both consume memory and do not contribute to generating accurate results.
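Two of the steps named in the abstract, Shannon information gain (for screening irrelevant features) and minimum spanning tree construction (for grouping the surviving features into clusters), can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; the function names and the Kruskal-based MST routine are assumptions introduced here for clarity.

```python
# Hypothetical sketch of two steps from the abstract: Shannon info gain
# for relevance filtering, and Kruskal's MST over feature-to-feature
# distances, whose partition would yield feature clusters.
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) of a discrete value sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(feature, target):
    """IG(target; feature) = H(target) - H(target | feature).

    A feature with gain near zero is irrelevant to the target and
    can be discarded before clustering.
    """
    n = len(target)
    cond = 0.0
    for v in set(feature):
        sub = [t for f, t in zip(feature, target) if f == v]
        cond += len(sub) / n * entropy(sub)
    return entropy(target) - cond


def mst_edges(n_nodes, weighted_edges):
    """Kruskal's algorithm over (weight, u, v) edges; returns MST edges.

    Removing the heaviest edges of the returned tree partitions the
    feature graph into clusters of mutually similar features.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two components: keep it
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

For example, a feature identical to the target has gain equal to `entropy(target)`, while an independent feature has gain 0; the MST of three features connected by weighted edges keeps the two lightest edges that span all nodes.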

Other Details

Paper ID: IJSRDV4I30121
Published in: Volume : 4, Issue : 3
Publication Date: 01/06/2016
Page(s): 298-301
