A Survey on Similarity Aware SQL Based Group-by Operators for Multidimensional Relational Data |
Author(s): |
| Anup Wadhekar, Pune Institute of Computer Technology, Pune; Dr. Emmanuel M., Pune Institute of Computer Technology, Pune |
Keywords: |
| Multi-dimensional data, relational database, similarity operators, group by, tuples |
Abstract |
|
The SQL group-by operator plays a crucial role in summarizing large datasets in data analytics. Allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. Existing similarity-based grouping operators primarily focus on one-dimensional attributes, so correlated attributes, such as those in spatial data, are processed independently and groups in the multi-dimensional space are not detected properly. To address this problem, two new similarity group-by (SGB) operators for multi-dimensional data are introduced. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if that tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. The experimental study, based on TPC-H and social check-in data, shows that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem. |
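The two grouping criteria described in the abstract can be illustrated with a minimal Python sketch. This is not the surveyed algorithms themselves: it is a naive quadratic illustration that assumes Euclidean distance, a single threshold `eps`, and tuples processed in input order. The function names `sgb_any` and `sgb_all` and the `on_overlap` option values are hypothetical names chosen for this example, and the "new_group" branch is a simplified reading of semantics (iii).

```python
from math import dist  # Euclidean distance, Python 3.8+


def sgb_any(points, eps):
    """Distance-to-any grouping: a tuple joins a group if it is within
    eps of ANY member, i.e. groups are connected components."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())


def sgb_all(points, eps, on_overlap="join_any"):
    """Clique (distance-to-all) grouping: a tuple may join a group only
    if it is within eps of EVERY member of that group.

    on_overlap (hypothetical option names) decides what happens when a
    tuple qualifies for more than one group:
      "eliminate" -> drop the tuple
      "join_any"  -> put it in the first qualifying group
      "new_group" -> start a new group for it (simplified)
    """
    groups = []
    for p in points:
        candidates = [g for g in groups
                      if all(dist(p, q) <= eps for q in g)]
        if not candidates:
            groups.append([p])          # no group fits: open a new one
        elif len(candidates) == 1:
            candidates[0].append(p)     # unambiguous membership
        elif on_overlap == "join_any":
            candidates[0].append(p)
        elif on_overlap == "new_group":
            groups.append([p])
        elif on_overlap != "eliminate":
            raise ValueError(f"unknown on_overlap: {on_overlap}")
    return groups


if __name__ == "__main__":
    chain = [(0, 0), (1, 0), (2, 0), (10, 10)]
    print(sgb_any(chain, 1))   # chained points end up in one group
    print(sgb_all(chain, 1))   # (2, 0) is too far from (0, 0) to join
```

In the sample run, the points (0, 0), (1, 0), (2, 0) form a chain: distance-to-any merges them into one group, while the clique criterion keeps (2, 0) apart because it is farther than `eps` from (0, 0). A tuple such as (1, 0) arriving after both (0, 0) and (2, 0) would qualify for two clique groups, which is the case the three overlap semantics address.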
Other Details |
|
Paper ID: IJSRDV5I10194 Published in: Volume: 5, Issue: 1 Publication Date: 01/04/2017 Page(s): 300-303 |
