Data Mining of Log Files using Self-Organizing Map and Bisecting K-means Clustering Methods through Hadoop: A Review |
Author(s): |
Anita Choudhary, Mody University of Science and Technology; Mrs. Priyanka Dahiya, Mody University of Science and Technology |
Keywords: |
Big Data; Hadoop; Web Log file; Data Stream; Bisecting K-means clustering; SOM and U-matrix; E-commerce |
Abstract |
The continuous increase in computational power has produced a massive flow of data over the past two decades. Big data is data that cannot be processed and analysed by traditional techniques. Big data techniques are used not only to store and handle large volumes of data but also to analyse it and extract accurate information in a short time. On today's internet, data grows so rapidly that storage and analysis with conventional tools become impractical, and processing time and cost increase. Various distributed computing techniques and algorithms have been applied, but the problem remains unsolved. To address it, Hadoop is used to process files in parallel. E-commerce websites analyse log files to identify user behaviour and improve their business. Large e-commerce websites such as flipkart.com, amazon.in and e-bay.in are visited by millions of customers simultaneously. As a result, these customers generate a large number of log file entries. Analysing this volume of log entries requires parallel processing and a reliable data storage system. In this paper, we present Hadoop, bisecting k-means and SOM (Self-Organizing Map). Hadoop provides the Hadoop Distributed File System and the MapReduce programming model to process huge amounts of data efficiently and effectively. Bisecting k-means is used to analyse existing offline data streams. Finally, SOM is used to mine offline data streams with visualization tools such as the U-matrix method. |
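The bisecting k-means step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the session features (pages viewed, minutes on site), the sample values, and the farthest-point seeding are all assumptions chosen to keep the example small and reproducible. The idea is to start with one cluster and repeatedly split the cluster with the largest sum of squared errors into two using ordinary 2-means, until k clusters exist.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sse(points):
    """Sum of squared errors of a cluster around its own centroid."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, iters=20):
    """Split one cluster in two with plain 2-means.

    Seeding is deterministic (an assumption made for reproducibility):
    the point farthest from the centroid, then the point farthest from it.
    """
    centroid = mean(points)
    c0 = max(points, key=lambda p: sq_dist(p, centroid))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [c0, c1]
    halves = ([], [])
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. all points identical
        centers = [mean(halves[0]), mean(halves[1])]
    return halves

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        worst = max(candidates, key=sse)
        left, right = two_means(worst)
        if not left or not right:
            break  # cannot split further
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters

# Hypothetical per-session features: (pages viewed, minutes on site).
sessions = [(1, 1), (2, 1), (1, 2),
            (10, 10), (11, 10), (10, 11),
            (20, 1), (21, 2), (20, 2)]

clusters = bisecting_kmeans(sessions, 3)
for c in clusters:
    print(len(c), sorted(c))
```

In a Hadoop setting, the assignment and centroid-update steps would typically be expressed as map and reduce phases over log records stored in HDFS; the sketch above runs on a single machine only to show the splitting logic.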
Other Details |
Paper ID: IJSRDV4I11172; Published in: Volume 4, Issue 1; Publication Date: 01/04/2016; Page(s): 1598-1602 |