Hadoop Framework and Uses and its Challenges

Document Type: Original Article

Abstract

Recent years have seen an ever-increasing trend in mass data production. According to IBM, around 90% of the data existing in the world was produced in the last two years alone. In 2007, the volume of data exceeded the available storage capacity for the first time. In addition, a wide range of applications, such as search engines, medical research, weather forecasting, and scientific programs, need distributed data processing to handle and analyze large volumes of data. Big Data, like other technologies, presents users with numerous opportunities and challenges. Exploiting these opportunities and benefits in business while properly managing the challenges has become one of the hot topics in the field of IT, so a cost-effective mechanism for processing massive data is essential. One of the best ways to solve the problem of massive information processing is to use Apache Hadoop. Gartner defines Hadoop as a data management system that brings together large volumes of structured and unstructured data and affects almost all organizational layers, which places it at the heart of data centers. Hadoop is one of the projects supported by the Apache Software Foundation; it is a free, Java-based programming framework that supports the processing of massive data sets in a distributed computing environment. In this article, we first compare structured and unstructured databases. We then examine the Apache Hadoop architecture and its wide range of applications in today's Big Data landscape, as well as the challenges facing this emerging technology, such as batch processing, real-time processing, and bottlenecks.

