Big Data and Hadoop



What is Big Data?

Big data is a term used to describe a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques. Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this is big data.

Why Big Data?

Data is growing at a huge rate, and all of that data is valuable for making critical decisions. Nowadays disks are cheap, so storing the data is affordable, but the amount of data is so huge that it won't fit on a single computer. So we need to distribute it across many machines. With the data distributed, we can perform operations in parallel and thus compute faster. This is the trick behind Hadoop.

Big Data Challenges
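The divide-and-combine idea described above can be sketched in plain Python. This is a toy illustration, not Hadoop itself: threads stand in for cluster machines, and `distributed_sum` is a hypothetical function name chosen for this example.

```python
# Toy sketch of "split the data, compute in parallel, combine the results".
# Threads stand in for cluster nodes; real Hadoop distributes across machines.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each "node" computes a result over only its local slice of the data.
    return sum(chunk)

def distributed_sum(data, num_nodes=4):
    # Split the data into roughly equal slices, one per node.
    chunk_size = (len(data) + num_nodes - 1) // num_nodes
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Run the partial computations in parallel, then combine the partial results.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

print(distributed_sum(list(range(1, 101))))  # 5050
```

Because each slice is independent, the partial sums can run at the same time; the only serial step is combining the small partial results at the end.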

  1. Velocity – A lot of data arriving at great speed.
  2. Volume – A large volume of data is collected, and it is growing exponentially.
  3. Variety – Data of many different varieties is collected. The data is not organized the way it is in a relational database; it may be in the form of audio, video, images, log files, and other files.

What is Hadoop?

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Hadoop is not a single piece of software; instead, it is a framework of tools and is distributed under the Apache License. Essentially, it accomplishes two tasks: massive data storage and faster processing.
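Hadoop's processing side is built around the MapReduce model: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step combines each group. The classic example is counting words. The sketch below imitates those three phases in plain Python — it is a simulation of the model, not actual Hadoop code, and the function names are made up for this illustration.

```python
# Plain-Python sketch of the MapReduce model (map -> shuffle -> reduce).
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine the grouped values for one key into a final result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data big", "data hadoop"]))
# {'big': 2, 'data': 2, 'hadoop': 1}
```

In real Hadoop, the map and reduce phases run on many machines at once, close to where the data blocks are stored, which is what makes processing at scale fast.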

Traditional Data Storage approach vs Hadoop Storage

Traditionally, data is stored on a single computer, and operations on the data are performed within that same machine. A single computer can process data only up to a threshold amount, and this is the limitation of the traditional storage approach. Hadoop takes a different approach: it breaks both the data and the computation into smaller pieces, and in this way handles both the storage of big data and its processing.
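The "breaking the data into smaller pieces" part works like this: Hadoop's file system (HDFS) splits a large file into fixed-size blocks (128 MB by default) and spreads them across the machines in the cluster. A minimal sketch of the splitting step, using a tiny block size so the effect is visible:

```python
# Sketch of HDFS-style block splitting: a file's bytes are cut into
# fixed-size blocks. Real HDFS uses 128 MB blocks by default; a tiny
# block size is used here so the splitting is easy to see.
def split_into_blocks(data: bytes, block_size: int):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"0123456789ABCDEF", block_size=6)
print(blocks)  # [b'012345', b'6789AB', b'CDEF']
```

Each block is then stored on several different machines (three copies by default), so the data survives the failure of any single node and computation can run wherever a copy of the block lives.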
