A Review Paper on Hadoop


Payal Gothi, Atharva College of Engineering, Mumbai; Riddhi Kamat, Atharva College of Engineering, Mumbai


Big Data, Hadoop, HDFS, MapReduce


The term big data describes collections of data sets so large and complex that they are difficult to store, process, and analyze using conventional database management tools and traditional database management systems. Earlier RDBMS systems attempted to handle such large volumes of unstructured and semi-structured data but could not cope with them, which motivated the development of Hadoop. In this paper we present Hadoop and its core components, HDFS and MapReduce. Hadoop is one of the few frameworks that support storing unstructured data, such as video, audio, and image files, alongside conventional structured data. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. MapReduce in Hadoop is a framework for writing applications that process large amounts of structured and unstructured data in parallel across clusters of thousands of machines in a reliable, fault-tolerant manner.
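
To make the MapReduce programming model mentioned above more concrete, the following is a minimal single-process sketch of a word count in Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and the in-memory grouping only imitates what the framework would do across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework on a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over in-memory lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In an actual Hadoop job, the map and reduce functions would run as separate tasks on different machines, with input read from HDFS; the sketch only shows the shape of the computation.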

Other Details

Paper ID: NCTAAP011
Published in: Conference 4 : NCTAA 2016
Publication Date: 29/01/2016
Page(s): 44-46
