HDFS (Hadoop Distributed File System) is the foundation of Hadoop. It is a logical file system. As part of this lesson we will understand all the necessary details of HDFS.
Here is the video which gives an introduction to HDFS.
Virtual Machines – CDH5 – Setup HDFS
Here is the video which explains how HDFS is set up using Cloudera Manager. This is just to give an idea of how HDFS is set up using third-party vendors such as Cloudera. Developers need not worry about the nuances of setting up a Hadoop cluster.
HDFS – Files and blocks – dfs.blocksize (Block Size)
In this video we will see how data is stored in HDFS. Blocks are the building blocks of HDFS:
- Default block size (dfs.blocksize) is 128 MB
HDFS is a logical file system. A file, represented by its name, is physically divided into blocks and stored across multiple servers in the cluster. Each block has a unique id associated with it, and the block id is unique across all the blocks of all the files in HDFS.
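To make this concrete, here is a minimal sketch using the standard Hadoop FileSystem Java API (the file path is hypothetical) that prints a file's block size, replication factor and the datanodes holding each of its blocks:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file used only for illustration
        Path file = new Path("/user/training/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        System.out.println("Block size : " + status.getBlockSize());   // 134217728 bytes = 128 MB by default
        System.out.println("Replication: " + status.getReplication());

        // One BlockLocation per block; each lists the datanodes holding a replica
        for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + b.getOffset() + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```

A 500 MB file, for example, is stored as three 128 MB blocks plus one 116 MB block, and each of those blocks is placed on datanodes independently.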
Normal operation of Hadoop cluster in data storage and processing
This video continues the discussion of how HDFS operates normally while storing and processing data. It also gives an idea of how HDFS is managed internally.
HDFS – Replication Factor – Fault Tolerance
This video covers fault tolerance in HDFS.
- Replication factor is the building block in HDFS that drives fault tolerance
- Default replication factor is 3
- There will be 3 copies created for each block
- All 3 copies will be stored on different nodes in the cluster (see the sketch after this list).
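Replication is a per-file property and can be changed after a file is written. The sketch below uses the Hadoop FileSystem Java API with a hypothetical path and an example replication factor of 5:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChangeReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical file; ask HDFS to keep 5 copies of each of its blocks
        // instead of the default 3
        Path file = new Path("/user/training/critical.txt");
        boolean requested = fs.setReplication(file, (short) 5);
        System.out.println("Replication change requested: " + requested);

        fs.close();
    }
}
```

The namenode then creates the extra copies (or removes surplus ones) in the background. Note that with the default factor of 3, a 1 GB file occupies roughly 3 GB of raw storage in the cluster.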
HDFS – Metadata, Datanode, Namenode and Secondary Namenode
This video covers details about metadata and how it is managed by the namenode. It also covers the role of the secondary namenode.
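Every attribute a client sees for a file (name, size, owner, permissions, replication factor and the block-to-datanode mapping) is served from this namenode metadata. A small sketch, again using the Hadoop FileSystem Java API with a hypothetical directory, that prints some of it:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowMetadata {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical directory; every field printed here comes from namenode metadata
        for (FileStatus s : fs.listStatus(new Path("/user/training"))) {
            System.out.println(s.getPath() + " owner=" + s.getOwner()
                    + " perm=" + s.getPermission()
                    + " size=" + s.getLen()
                    + " replication=" + s.getReplication());
        }
        fs.close();
    }
}
```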
HDFS – Heartbeat, Block report and Checksum
This video covers how the namenode and datanodes work together, using heartbeats and block reports, to make sure all HDFS daemons are running and all blocks are accounted for, and how checksums are used to detect corrupted block data.
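Checksums are worth a closer look: HDFS stores a checksum for every fixed-size chunk of a block (dfs.bytes-per-checksum, 512 bytes by default) and verifies it whenever the data is read. The toy sketch below only illustrates the per-chunk idea with java.util.zip.CRC32; HDFS itself uses a CRC32C implementation:

```java
import java.util.zip.CRC32;

public class ChunkChecksums {
    public static void main(String[] args) {
        byte[] blockData = new byte[2000];          // stand-in for the contents of one block
        int bytesPerChecksum = 512;                 // mirrors the dfs.bytes-per-checksum default

        // One checksum per chunk; a mismatch on read means the replica is corrupt
        for (int offset = 0; offset < blockData.length; offset += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, blockData.length - offset);
            CRC32 crc = new CRC32();
            crc.update(blockData, offset, len);
            System.out.printf("chunk at %4d (%3d bytes) -> checksum %d%n", offset, len, crc.getValue());
        }
    }
}
```

When a corrupt replica is detected this way, the namenode re-replicates the block from one of the good copies.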
HDFS – Namenode Recovery
This video covers how namenode recovery is done using the fsimage and edit logs. It also covers the role of the secondary namenode.
- All namenode transactions (changes to metadata) are logged in the edit logs
- The fsimage is created at regular intervals to provide a point-in-time snapshot of the namenode metadata
- Creating the fsimage at regular intervals is called checkpointing
- The latest edit logs are merged with the previous fsimage to create the latest fsimage (illustrated by the sketch after this list)
- To recover the namenode, the latest fsimage is restored and then the remaining edit logs are applied. This is done in safe mode.
- Checkpointing is done by the secondary namenode
- This topic is not very important with respect to development.
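Even though this is mostly an administration topic, the merge itself is easy to picture: take the last fsimage (a snapshot of the namespace) and replay every edit-log transaction recorded after it. A toy sketch with a made-up namespace and edit entries:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {
    public static void main(String[] args) {
        // Toy fsimage: point-in-time snapshot of the namespace (path -> number of blocks)
        Map<String, Integer> fsimage = new HashMap<>();
        fsimage.put("/data/file1.txt", 4);
        fsimage.put("/data/file2.txt", 1);

        // Toy edit log: transactions recorded after the snapshot was taken
        List<String[]> editLog = new ArrayList<>();
        editLog.add(new String[]{"ADD", "/data/file3.txt", "2"});
        editLog.add(new String[]{"DELETE", "/data/file2.txt", ""});

        // Checkpointing / recovery: replay the edit log on top of the fsimage
        for (String[] edit : editLog) {
            if (edit[0].equals("ADD")) {
                fsimage.put(edit[1], Integer.parseInt(edit[2]));
            } else if (edit[0].equals("DELETE")) {
                fsimage.remove(edit[1]);
            }
        }
        System.out.println("Up-to-date namespace: " + fsimage);
    }
}
```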