WHAT IS HADOOP?
Hadoop is a big buzzword in the IT world these days. Apache™ Hadoop® is an open-source software framework for storing and processing huge volumes of data (BIG DATA): the data is stored in a distributed file system (HADOOP DISTRIBUTED FILE SYSTEM) and processed with a distributed programming model (MAP REDUCE).
The framework supports the distributed processing of large datasets across clusters of inexpensive, off-the-shelf computers (COMMODITY HARDWARE) using simple programming models.
The following key features set HADOOP apart from other frameworks for large-scale computing:
SCALABILITY - The framework is designed to scale from a few servers to thousands of servers. Nodes can be added as needed without impacting the …
RELIABILITY - The framework offers a high degree of fault tolerance. Rather than relying on the hardware to provide high availability, it has its own mechanisms to detect and handle failures.
COST EFFICIENT - The framework runs on commodity hardware, which results in large savings in the dollars spent per terabyte of data processed.
FLEXIBILITY - Unlike a relational database, the Hadoop framework does not require data to fit a predefined structure, so it can process any type of data, whether structured or unstructured.
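The SCALABILITY and RELIABILITY points rest largely on how HDFS stores data: a file is split into large blocks (128 MB by default in Hadoop 2.x and later, set by dfs.blocksize), and each block is copied to several nodes (3 by default, set by dfs.replication). When a node fails, its lost replicas are re-created on the surviving nodes. The sketch below is a toy, single-process model of that idea under stated assumptions, not HDFS code: the ToyCluster class and its methods are hypothetical, blocks are a few bytes instead of megabytes, and replicas are placed by simple rotation rather than by HDFS's rack-aware placement policy.

```python
from typing import Dict, Iterable, Set


class ToyCluster:
    """Hypothetical single-process model of HDFS-style block replication (not the HDFS API)."""

    def __init__(self, nodes: Iterable[str], replication: int = 3) -> None:
        self.nodes: Set[str] = set(nodes)
        self.replication = replication
        self.blocks: Dict[int, bytes] = {}        # block id -> block contents
        self.placement: Dict[int, Set[str]] = {}  # block id -> nodes holding a replica

    def add_node(self, node: str) -> None:
        # Scaling out: a new node simply joins the pool of storage targets.
        self.nodes.add(node)

    def store(self, data: bytes, block_size: int) -> int:
        # Split the data into fixed-size blocks and put each block's replicas on
        # distinct nodes, rotating the starting node (simplified placement).
        ordered = sorted(self.nodes)
        if not ordered:
            raise ValueError("cluster has no nodes")
        copies = min(self.replication, len(ordered))
        for offset in range(0, len(data), block_size):
            block_id = len(self.blocks)
            self.blocks[block_id] = data[offset:offset + block_size]
            start = block_id % len(ordered)
            self.placement[block_id] = {ordered[(start + k) % len(ordered)] for k in range(copies)}
        return len(self.blocks)  # total number of blocks stored so far

    def fail_node(self, node: str) -> None:
        # Detect and handle a failure: forget the dead node, then re-replicate every
        # block that dropped below the target replica count onto surviving nodes.
        self.nodes.discard(node)
        for holders in self.placement.values():
            holders.discard(node)
            spare = sorted(self.nodes - holders)
            while len(holders) < self.replication and spare:
                holders.add(spare.pop(0))

    def read(self) -> bytes:
        # Reassemble the data; one surviving replica of each block is enough.
        if any(not holders for holders in self.placement.values()):
            raise RuntimeError("a block has no surviving replica")
        return b"".join(self.blocks[b] for b in sorted(self.blocks))


cluster = ToyCluster(["node1", "node2", "node3", "node4"])
print(cluster.store(b"hello hadoop world", block_size=5))  # 4 blocks
cluster.fail_node("node2")
print(cluster.read())
```

With four nodes and a replication factor of 3, failing one node still leaves every block with three replicas on the remaining nodes, so read() returns the original bytes. Adding nodes only enlarges the pool that future placements and re-replication can draw from.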
The HADOOP framework was created by Doug Cutting and Mike Cafarella in 2005. Doug named HADOOP after his son’s toy elephant.
The base framework of APACHE HADOOP consists of the following four components:
HADOOP COMMON - The common utilities and libraries required by the other HADOOP components.
HADOOP DISTRIBUTED FILE SYSTEM (HDFS) - A distributed file system for storing large datasets on clusters of commodity hardware.
YARN (YET ANOTHER RESOURCE NEGOTIATOR) - As the name suggests, it is the resource manager for HADOOP. It manages the resources in the cluster and schedules application processing across it.
MAP REDUCE - A simple programming model for processing large volumes of data in parallel.
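The MAP REDUCE model is usually explained with the classic word-count example: mappers emit a (word, 1) pair for every word, the framework groups the pairs by word (the shuffle), and reducers sum each group. The sketch below simulates those three phases in a single Python process. The function names are hypothetical; on a real cluster the map and reduce tasks would run in parallel on many nodes, scheduled by YARN, and would typically be written in Java against Hadoop's MapReduce API or run as scripts through Hadoop Streaming.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values emitted for one key.
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each inner list stands in for one input split handled by its own mapper.
    intermediate = [pair for split in splits for line in split for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


print(word_count([["big data big"], ["hadoop data"]]))  # {'big': 2, 'data': 2, 'hadoop': 1}
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can be spread across machines without sharing state; only the shuffle moves data between them.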
More about HADOOP
Hadoop provides a solution for storing and analysing the enormous amounts of data generated by today's digital world; in short, it is an answer to the problem that BIG DATA poses. If you want to know what big data is, read the article on BIG DATA here.
To get hands-on with the Hadoop environment, go through the article on setting up a single-node Hadoop cluster.
Stay tuned for more Hadoop-related tutorials...