
Data Mining of Log Files using Self-Organizing Map and Bisecting K-means Clustering Methods through Hadoop: A Review

Author(s):

Anita Choudhary, Mody University of Science and Technology; Mrs. Priyanka Dahiya, Mody University of Science and Technology

Keywords:

Big Data; Hadoop; Web Log File; Data Stream; Bisecting K-means Clustering; SOM and U-matrix; E-commerce

Abstract

The continuous growth of computational power has produced a massive flow of data over the past two decades. Big data is data that cannot be processed and analysed by traditional techniques. Big data techniques are used not only to store and handle large volumes of data but also to analyse the data and extract accurate information from it in a short amount of time. On today's internet, data grows so rapidly that storing and analysing it becomes impractical, and processing time and cost increase as well. Distributed computing offers various techniques and algorithms, but the problem remains unresolved. To address it, Hadoop is used to process files in parallel. E-commerce websites analyse log files to identify user behaviour and improve their business. Large e-commerce websites such as flipkart.com, amazon.in and e-bay.in are visited by millions of customers simultaneously. As a result, these customers generate a large number of log file entries. Analysing this volume of log entries requires parallel processing and a reliable data storage system. In this paper, we present Hadoop, bisecting k-means and SOM (Self-Organizing Map). Hadoop provides the Hadoop Distributed File System (HDFS) and the MapReduce programming model to process huge amounts of data efficiently and effectively. Bisecting k-means is used to analyse existing offline data streams. Finally, SOM is used to mine offline data streams with the help of visualization tools such as the U-matrix method.
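
As an illustration of the bisecting k-means step mentioned in the abstract, the following is a minimal single-machine sketch in Python. It is not the paper's implementation and does not use Hadoop. The session features (pages viewed, minutes on site), the sample values and all function names are hypothetical and chosen only for this example. The sketch repeatedly takes the cluster with the largest sum of squared errors and splits it in two with plain 2-means, keeping the best of several trial splits.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sse(points):
    """Sum of squared errors of a cluster around its own mean."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    """Split one cluster into two groups with plain 2-means."""
    centers = rng.sample(points, 2)
    groups = (list(points), [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Bisect the highest-SSE cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break  # nothing left that can be split
        target = max(candidates, key=sse)
        best = None
        for _ in range(trials):
            a, b = two_means(target, rng)
            if not a or not b:
                continue
            cost = sse(a) + sse(b)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        if best is None:
            break  # no valid split found (e.g. all points identical)
        clusters.remove(target)
        clusters.extend(best[1:])
    return clusters

# Hypothetical per-session features: (pages viewed, minutes on site).
sessions = [
    (1, 1), (2, 1), (1, 2),          # short visits
    (20, 30), (21, 31), (22, 29),    # long browsing sessions
    (50, 5), (51, 6), (49, 4),       # many pages in little time
]
clusters = bisecting_kmeans(sessions, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

In a Hadoop setting, the assignment of points to the two trial centres would typically be the map step and the recomputation of the centres the reduce step; that mapping is an assumption of this note and is not taken from the paper.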

Other Details

Paper ID: IJSRDV4I11172
Published in: Volume : 4, Issue : 1
Publication Date: 01/04/2016
Page(s): 1598-1602
