What is Big Data?
Big data is a term used to describe a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.
Why Big Data?
Data growth is huge, and all of that data is valuable for making critical decisions. Nowadays disk storage is cheap enough that we can afford to keep the data, but the amount of data is so large that it won't fit on a single computer. So we need to distribute it across many machines. With the data distributed, we can perform operations in parallel and thus compute faster. This is the trick behind Hadoop.
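The idea of splitting data into pieces and processing them in parallel can be sketched in plain Python. This is only a toy illustration of the principle on one machine, not Hadoop itself (the function names and worker count are made up for this example):

```python
from multiprocessing import Pool

def count_words(chunk):
    # Process one piece of the data independently of the others.
    return sum(len(line.split()) for line in chunk)

def parallel_word_count(lines, workers=4):
    # Split the data into roughly equal pieces, one per worker.
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with Pool(workers) as pool:
        # Each chunk is processed in parallel; the partial
        # results are then combined into a single answer.
        return sum(pool.map(count_words, chunks))

if __name__ == "__main__":
    data = ["big data is big", "hadoop distributes data"] * 1000
    print(parallel_word_count(data))  # 7000
```

Hadoop applies the same split-then-combine pattern, but across many machines instead of many processes, and with the data already stored where the computation runs.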
Big Data Challenges
- Velocity – A lot of data arrives at great speed.
- Volume – A large volume of data is collected, and it is growing exponentially.
- Variety – Data of many different varieties gets collected. The data is not organized the way it is in a relational database; it may be in the form of audio, video, images, plain files, log files, etc.
What is Hadoop?
Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Hadoop is not a single piece of software; instead, it is a framework of tools distributed under the Apache license. Essentially, it accomplishes two tasks: massive data storage and faster processing.
Traditional Data Storage Approach vs Hadoop Storage
Traditionally, data is stored on a single computer and all operations on the data are performed within that same machine. A single computer can process data only up to a certain threshold, which is the limitation of the traditional storage approach. Hadoop takes a different approach: it breaks both the data and the computation into smaller pieces, and in this way handles big data storage as well as its processing.
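The storage side of this idea can be sketched with a toy model of how a file is cut into fixed-size blocks and spread across nodes. Real HDFS uses a default block size of 128 MB and replicates each block (typically three copies) for fault tolerance; the tiny block size and the function names below are invented for illustration:

```python
def split_into_blocks(data: bytes, block_size: int = 128):
    # Cut the data into fixed-size blocks; the last block may be smaller.
    # (Real HDFS uses 128 MB blocks by default, not 128 bytes.)
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes):
    # Round-robin placement of blocks across nodes. Real HDFS also
    # replicates every block on multiple nodes for fault tolerance.
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement

if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 300)          # 3 blocks: 128, 128, 44 bytes
    layout = place_blocks(blocks, ["node1", "node2"])
    print({node: len(bs) for node, bs in layout.items()})
```

Because each node holds only some of the blocks, the computation can be sent to where the data lives, which is what makes the parallel processing described above possible.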