What is Big Data | Hadoop and Its Architecture

In this tutorial, you will first learn what Big Data is, then what HDFS is, and finally cover Hadoop and its architecture.

What is BIG DATA?

Big data refers to datasets so large or fast-growing that traditional single-machine systems cannot store and process them efficiently.

Where is Big Data used?

One live example: when you start searching for a product on Amazon, it shows recommendations and similar products based on your search criteria.

BigData cluster-

Machines are commodity hardware (CPU + RAM) stacked together on racks. These racks are installed in physical locations called data centers.

Big data pipeline- The main stages are:

1. Data ingestion (Sqoop/Flume): data arrives from many different sources.
2. Data validation, cleanup, and processing (Spark): in this phase, we validate, clean, and process the data.
3. Data analysis (Hive): in this phase, we analyze the data as per business requirements.
4. Data visualization (Tableau): we create reports that communicate information clearly to users.
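The stages above can be sketched in plain Python as a toy illustration (the record fields and cleanup rules here are invented; a real pipeline would use Sqoop/Flume, Spark, and Hive as listed):

```python
# Toy end-to-end pipeline: ingest -> validate/clean -> analyze.
# In production these stages map to Sqoop/Flume, Spark, and Hive.

def ingest():
    # Stage 1: data arriving from multiple sources (here, hard-coded rows).
    return [
        {"product": "phone", "price": "299"},
        {"product": "", "price": "199"},        # invalid: missing product name
        {"product": "laptop", "price": "999"},
    ]

def validate_and_clean(rows):
    # Stage 2: drop invalid rows and convert the price to a number.
    return [
        {"product": r["product"], "price": int(r["price"])}
        for r in rows
        if r["product"] and r["price"].isdigit()
    ]

def analyze(rows):
    # Stage 3: a simple aggregate, like a Hive query would compute.
    return sum(r["price"] for r in rows) / len(rows)

clean = validate_and_clean(ingest())
print(f"Average price: {analyze(clean):.2f}")  # stage 4 would chart this in Tableau
```

This is only the shape of a pipeline; each stage would normally run on a distributed engine rather than in one process.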

What is HDFS?

1-Primary data storage

HDFS is the primary data storage system used by Hadoop applications.

2-Distributed File system

When do we use a distributed file system?

When data becomes too large to fit on a single machine, it becomes necessary to break it up and distribute it across multiple machines.

3-Block size (128MB)

HDFS stores every file as a sequence of blocks.

The default block size in HDFS is 128 MB.

4-Fault Tolerant

It also replicates (creates exact copies of) those blocks to provide fault tolerance in case of failures.

The default replication factor is 3.

For Example-

Suppose you have 1 GB of data and the HDFS block size is 128 MB; HDFS then creates 8 blocks.

1 GB = 1024 MB; 1024 / 128 = 8 blocks
Replication factor: 3 (3 exact copies of each block are kept)
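The arithmetic above can be checked in a few lines of Python (block size and replication factor are the HDFS defaults cited in the text):

```python
import math

file_size_mb = 1024          # 1 GB of data
block_size_mb = 128          # HDFS default block size
replication_factor = 3       # HDFS default replication factor

# Round up, since a partial last block still occupies its own block.
blocks = math.ceil(file_size_mb / block_size_mb)
copies = blocks * replication_factor

print(blocks)   # 8 blocks
print(copies)   # 24 block replicas stored across the cluster
```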

The default block size and replication factor are provided by Hadoop.

You can change the block size and replication factor as per your needs.
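For example, both defaults can be overridden per cluster in hdfs-site.xml (dfs.blocksize and dfs.replication are the standard Hadoop property names; the values below are illustrative):

```xml
<configuration>
  <!-- Block size in bytes: 256 MB instead of the default 128 MB -->
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <!-- Keep 2 copies of each block instead of the default 3 -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```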

Hadoop architecture-

1-Name node- stores metadata.

It knows which block of a file goes to which machine.

The name node keeps track of how files are divided into blocks and stores all this metadata.

2-Data node-

This node stores the actual data blocks.

A cluster has one name node that acts as the master node and several data nodes that act as slave nodes.
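This master/slave split can be modeled in a toy Python sketch (all class and variable names here are invented for illustration; real HDFS is far more involved):

```python
# Toy model: the name node keeps only metadata (which blocks make up a file
# and which data nodes hold each block); data nodes hold the actual bytes.

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}            # block_id -> raw bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.metadata = {}          # filename -> list of (block_id, node names)

    def put(self, filename, blocks):
        self.metadata[filename] = []
        for i, data in enumerate(blocks):
            block_id = f"{filename}#blk{i}"
            # Pick `replication` data nodes round-robin and replicate the block.
            targets = [self.datanodes[(i + j) % len(self.datanodes)]
                       for j in range(self.replication)]
            for node in targets:
                node.store(block_id, data)
            self.metadata[filename].append((block_id, [n.name for n in targets]))

nodes = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode(nodes)
nn.put("movie.mp4", [b"part0", b"part1"])
print(nn.metadata["movie.mp4"])
```

Note that the name node never touches the file contents; it only records which data nodes hold each replicated block.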


Application developer|Technical blogger|Youtuber(MASTERMIND CODER)|owner at www.yourtechnicalteacher.com