Hubness Based Clustering Algorithms For High-Dimensional Data

Author(s):

Pradeepa S, Kongu Engineering College; Dr. R. Thamilselvan, Kongu Engineering College

Keywords:

Clustering, high-dimensional data, curse of dimensionality, nearest neighbors, hubness

Abstract

Clustering is an unsupervised process of grouping elements so that elements assigned to the same cluster are more similar to each other than to data points in other clusters. Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Clustering such high-dimensional data is difficult because the data become increasingly sparse and distances between data points become increasingly hard to distinguish. Traditional approaches designed for clustering low-dimensional data can also be applied to high-dimensional data by working in a lower-dimensional feature subspace. However, the performance of standard machine learning algorithms degrades on high-dimensional data. The number of data points required to represent a distribution grows exponentially with the number of dimensions, which leads to poor density estimates for high-dimensional data. These difficulties are considered aspects of the curse of dimensionality. In the existing system, the number of clusters k, or a range for k, is not specified by the user, either directly or indirectly.

The proposed method offers a novel perspective on clustering high-dimensional data and also determines the number of clusters k by using Visual Assessment of cluster Tendency (VAT). Instead of trying to avoid the curse of dimensionality by working in a lower-dimensional feature subspace, the proposed method embraces high dimensionality by exploiting a phenomenon that arises in it: hubness. Hubness, measured as the number of times a data point appears among the k nearest neighbors of other data points in a data set, can be successfully exploited in clustering: hubness-based clustering algorithms use the hubness score as a measure of point centrality within a high-dimensional cluster. In addition, the proposed methodology uses VAT to determine the number of clusters in high-dimensional data automatically and with greater accuracy.
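A minimal sketch of the hubness score and a K-hubs-style clustering loop, written in Python with NumPy. The abstract does not give the exact algorithm, so the function names, the Euclidean distance, the farthest-first initialization and the default neighborhood size `k` are assumptions; the code follows the general idea of using hubs (points with the highest hubness score) as cluster prototypes rather than reproducing the authors' implementation.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix (O(n^2) memory, fine for a sketch)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def hubness_scores(D, k):
    """N_k(x): how many times each point appears among the k nearest
    neighbors of the other points, given a distance matrix D."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]   # k-NN list of every point
    return np.bincount(knn.ravel(), minlength=D.shape[0])


def k_hubs(X, n_clusters, k=10, max_iter=100):
    """K-hubs-style clustering: k-means-like iterations in which each cluster
    is represented by its member with the highest hubness score."""
    D = pairwise_distances(X)
    scores = hubness_scores(D, k)
    # Assumed initialization: the strongest hub first, then repeatedly the
    # point farthest from all centers chosen so far (farthest-first).
    centers = [int(np.argmax(scores))]
    while len(centers) < n_clusters:
        centers.append(int(np.argmax(D[:, centers].min(axis=1))))
    centers = np.array(centers)
    for _ in range(max_iter):
        labels = np.argmin(D[:, centers], axis=1)   # nearest hub wins
        new_centers = centers.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size:
                # the most central member by hubness becomes the new hub
                new_centers[c] = members[np.argmax(scores[members])]
        if np.array_equal(new_centers, centers):
            break
        centers = new_centers
    return np.argmin(D[:, centers], axis=1), centers
```

Because prototypes are actual data points selected by N_k rather than means, the loop never computes a centroid in the high-dimensional space; it only needs the distance matrix and the hubness scores.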
Experimental tests show that the proposed methodology outperforms existing approaches, with higher accuracy and reduced running time. Cluster quality is measured with the silhouette index, and the results obtained are promising.
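The two other ingredients named in the abstract, VAT and the silhouette index, can be sketched the same way. VAT reorders the dissimilarity matrix with a Prim-like traversal so that clusters appear as dark blocks along the diagonal. The abstract does not say how the number of blocks is read off automatically, so `estimate_n_clusters` and its `factor` threshold are hypothetical: the sketch counts unusually large linking distances along the VAT ordering, which amounts to a single-linkage cut. The silhouette function uses the standard definition s(i) = (b - a) / max(a, b).

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix (O(n^2) memory, fine for a sketch)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def vat_order(D):
    """VAT ordering: start at one end of the largest dissimilarity, then
    repeatedly append the unvisited point closest to the visited set.
    Returns the ordering and the linking distance at which each point joined."""
    n = D.shape[0]
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order, links = [first], []
    visited = np.zeros(n, dtype=bool)
    visited[first] = True
    nearest = D[first].copy()            # distance from each point to the visited set
    for _ in range(n - 1):
        cand = np.where(visited, np.inf, nearest)
        j = int(np.argmin(cand))
        order.append(j)
        links.append(cand[j])
        visited[j] = True
        nearest = np.minimum(nearest, D[j])
    return np.array(order), np.array(links)


def estimate_n_clusters(D, factor=3.0):
    """Hypothetical automation of reading the VAT image: each linking distance
    far above the median marks the start of a new dark block."""
    _, links = vat_order(D)
    return int(np.sum(links > factor * np.median(links))) + 1


def silhouette(D, labels):
    """Mean silhouette index; requires at least two clusters."""
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue                     # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)   # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != own)
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

The reordered matrix that VAT displays as an image is `D[np.ix_(order, order)]`; `estimate_n_clusters` only inspects the linking distances along that ordering, so it will struggle when clusters touch or differ strongly in density.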

Other Details

Paper ID: IJSRDV3I2940
Published in: Volume: 3, Issue: 2
Publication Date: 01/05/2015
Page(s): 1522-1528
