
A Survey on Similarity Aware SQL Based Group-by Operators for Multidimensional Relational Data

Author(s):

Anup Wadhekar, Pune Institute of Computer Technology, Pune; Dr. Emmanuel M., Pune Institute of Computer Technology, Pune

Keywords:

Multi-dimensional data, relational database, similarity operators, group by, tuples

Abstract

The SQL group-by operator plays a crucial role in summarizing large datasets in data analytics. Allowing similarity-aware grouping provides a more realistic view of real-world data, which can lead to better insights. Existing similarity-based grouping operators focus primarily on one-dimensional attributes, so correlated attributes, such as those in spatial data, are processed independently. As a result, groups in the multi-dimensional space are not detected properly. To address this problem, two new similarity group-by (SGB) operators for multi-dimensional data are introduced. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance of each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if it is within some distance of any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one of the groups, and (iii) create a new group for the tuple. The experimental study, based on TPC-H and social check-in data, shows that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem.
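The two grouping criteria and the three overlap semantics described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a naive quadratic-time illustration that assumes Euclidean distance, a threshold named `eps`, tuples processed in input order, and hypothetical function names (`sgb_any`, `sgb_all`) and option names (`"eliminate"`, `"join_any"`, `"new_group"`) chosen only for this example.

```python
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def sgb_any(points, eps):
    """Distance-to-any: a tuple joins a group if it is within eps of ANY
    member, i.e. groups are connected components of the eps-graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def sgb_all(points, eps, on_overlap="join_any"):
    """Distance-to-all (clique): a tuple joins a group only if it is within
    eps of EVERY member. Greedy, in input order (an assumption)."""
    groups = []
    for i, p in enumerate(points):
        # Groups whose members are all within eps of the new tuple.
        fits = [g for g in groups
                if all(dist(p, points[j]) <= eps for j in g)]
        if len(fits) == 1 or (len(fits) > 1 and on_overlap == "join_any"):
            fits[0].append(i)       # (ii) put the tuple in one group
        elif len(fits) > 1 and on_overlap == "eliminate":
            continue                # (i) drop the ambiguous tuple
        else:
            groups.append([i])      # no fit, or (iii) create a new group
    return groups


line = [(0, 0), (1, 0), (2, 0), (10, 10)]
print(sgb_any(line, 1))   # chains 0-1-2 into one group
print(sgb_all(line, 1))   # tuple 2 is too far from tuple 0

overlap = [(0, 0), (2, 0), (1, 0)]  # the last tuple fits both groups
for mode in ("eliminate", "join_any", "new_group"):
    print(mode, sgb_all(overlap, 1, mode))
```

The results are lists of tuple indices. On the `line` data, distance-to-any chains the first three tuples into one group, while distance-to-all splits them because the first and third tuples are more than `eps` apart. That is the difference between the two operators.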

Other Details

Paper ID: IJSRDV5I10194
Published in: Volume : 5, Issue : 1
Publication Date: 01/04/2017
Page(s): 300-303
