
Your go-to cheatsheet for learning Apache Hadoop.


Hadoop-CheatSheet 🐘

A cheatsheet to get you started with Hadoop

But the question is why should we learn Hadoop? How will it make our life easier?

Read till the end to know more.

Happy learning 👩‍🎓

Index Of Contents

  1. Introduction
  2. Installation
  3. Configuration
    i) NameNode
    ii) DataNode
    iii) ClientNode
  4. GUI
  5. Frequently Asked Questions
  6. Testing
  7. Contributing
    i) Contribution Practices
    ii) Pull Request Process
    iii) Branch Policy
  8. Cool Links to Check out
  9. License
  10. Contact


Introduction

The simple answer to the above question is: to store data. That raises another question: when databases and drive storage already exist, why should we use Hadoop?


That leads to the next question: what is Big Data? An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. web, sales, customer contact center, social media, mobile data and so on).

To store this much data we use the concept of a distributed storage cluster, and Apache Hadoop is what we use to implement it.


Installation

(For one master node with multiple slave nodes and multiple client nodes.) Run these steps on the master, slave and client nodes.

These steps are for Red Hat (RPM-based) systems:
    - Install Java JDK as Hadoop depends on it
        wget https://www.oracle.com/webapps/redirect/signon?nexturl=https://download.oracle.com/otn/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.rpm
        rpm -i -v -h jdk-8u171-linux-x64.rpm
    - Install apache hadoop
        wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
        rpm -i -v -h hadoop-1.2.1-1.x86_64.rpm --force
    - Verify if it is correctly installed with
        java -version
        hadoop version
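The verification step above can also be scripted so that it reports every missing tool in one pass. This is a minimal sketch; `check_tool` is a hypothetical helper name, not something shipped with Hadoop or Java.

```shell
#!/bin/sh
# Report whether a required command is available on PATH.
# check_tool is a hypothetical helper, not part of Hadoop.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check both tools; the loop keeps going after a miss so that
# every missing tool is reported.
missing=0
for tool in java hadoop; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing required tool(s) missing"
```

If both tools are reported as found, `java -version` and `hadoop version` should print their version banners.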




Configuration

NameNode

(The NameNode is also called the master node.)

    mkdir /nn
    vim /etc/hadoop/core-site.xml

    vim /etc/hadoop/hdfs-site.xml
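As a rough guide to what the two edited files might contain on the NameNode, the sketch below writes a minimal pair into a scratch directory for review. It assumes the Hadoop 1.x property names `fs.default.name` and `dfs.name.dir`; the address `0.0.0.0`, the port `9001` and the variables `CONF_DIR`, `MASTER_IP` and `NN_PORT` are illustrative assumptions, not values taken from this cheatsheet.

```shell
#!/bin/sh
# Write minimal NameNode configs. CONF_DIR defaults to a scratch directory so
# nothing under /etc/hadoop is overwritten until you have reviewed the output.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
MASTER_IP="${MASTER_IP:-0.0.0.0}"   # assumption: listen on all interfaces
NN_PORT="${NN_PORT:-9001}"          # assumption: any free port will do

# core-site.xml: the address clients and DataNodes use to reach the NameNode.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$MASTER_IP:$NN_PORT</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: the folder where the NameNode keeps its metadata (/nn above).
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/core-site.xml and $CONF_DIR/hdfs-site.xml"
```

Once the generated files look right, copy them to /etc/hadoop/ or run the script with `CONF_DIR=/etc/hadoop`.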

Check whether the port number you assigned in core-site.xml is free; if it is not, change it there.

Then format the /nn folder of the NameNode:

    hadoop namenode -format

Before starting the service, list the listening ports:

    netstat -tnlp

We see that the process has not yet started and the assigned port is free.
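Reading the netstat listing by eye works, but the same check can be done with a small filter. The sketch below assumes the column layout of net-tools `netstat` on Linux (local address in column 4, state in column 6); `port_listening` is a hypothetical helper and `9001` is only an example port.

```shell
#!/bin/sh
# port_listening PORT: read `netstat -tnl` style output on stdin and succeed
# if a socket in LISTEN state has a local address ending in :PORT.
# Hypothetical helper; column positions follow net-tools netstat on Linux.
port_listening() {
    awk -v suffix=":$1" '
        $6 == "LISTEN" {
            n = length($4) - length(suffix)
            if (n >= 0 && substr($4, n + 1) == suffix) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example: check the (assumed) NameNode port, if netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    if netstat -tnl | port_listening 9001; then
        echo "port 9001 is in use"
    else
        echo "port 9001 is free"
    fi
fi
```

Matching on the `:PORT` suffix rather than a plain substring avoids false hits, for example port 9001 matching a search for 001.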


Then start the service and check the ports again:

    hadoop-daemon.sh start namenode
    netstat -tnlp

We see that the process has started and is listening on the assigned port.

To view the number of slave nodes connected:

    hadoop dfsadmin -report



DataNode

(The DataNode is also called the slave node.)

    vim /etc/hadoop/core-site.xml
    mkdir /dn1
    vim /etc/hadoop/hdfs-site.xml
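On a DataNode the two files differ from the master's in one respect: hdfs-site.xml names the data folder rather than the metadata folder. The sketch below assumes the Hadoop 1.x property names `fs.default.name` and `dfs.data.dir`; `write_site_file`, the placeholder address `192.0.2.10` and the port `9001` are hypothetical and should be replaced with your master's real values.

```shell
#!/bin/sh
# Generate DataNode configs in a scratch directory (override with CONF_DIR).
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
MASTER_IP="${MASTER_IP:-192.0.2.10}"  # placeholder: use your master's IP
NN_PORT="${NN_PORT:-9001}"            # assumption: port chosen on the master
DATA_DIR="${DATA_DIR:-/dn1}"          # the folder created with mkdir above

# write_site_file FILE NAME VALUE: emit a one-property Hadoop config file.
# Hypothetical helper, not part of Hadoop.
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# core-site.xml points at the master; hdfs-site.xml names the block storage.
write_site_file "$CONF_DIR/core-site.xml" fs.default.name "hdfs://$MASTER_IP:$NN_PORT"
write_site_file "$CONF_DIR/hdfs-site.xml" dfs.data.dir "$DATA_DIR"
echo "Wrote DataNode configs to $CONF_DIR"
```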


Before starting the service: if you are doing the setup locally using VMs, make sure the firewall is stopped on the master node. To check:

    systemctl status firewalld
   - If it is active, stop it, and also disable it if you don't want it to start again after a system reboot
        systemctl stop firewalld
        systemctl disable firewalld
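If you would rather keep firewalld running, one alternative is to open only the ports the master needs. The sketch below prints the `firewall-cmd` calls by default and runs them only when `APPLY=1` is set; the RPC port `9001` is an assumption (use whatever you put in core-site.xml), while 50070 is the NameNode web port used later in this cheatsheet. `run` and `open_master_ports` are hypothetical helper names.

```shell
#!/bin/sh
# Print (or, with APPLY=1, run) the firewall-cmd calls that open the master's
# ports instead of stopping firewalld altogether.
NN_PORT="${NN_PORT:-9001}"   # assumption: the RPC port set in core-site.xml
APPLY="${APPLY:-0}"          # 0 = dry run, 1 = actually change the firewall

# run CMD...: execute the command, or just show it during a dry run.
run() {
    if [ "$APPLY" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Open the NameNode RPC port and its web UI port, then reload the rules.
open_master_ports() {
    for port in "$NN_PORT" 50070; do
        run firewall-cmd --permanent --add-port="$port/tcp"
    done
    run firewall-cmd --reload
}

open_master_ports
```

Run it once as a dry run to review the commands, then again as root with `APPLY=1`.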


Start the DataNode service:

    hadoop-daemon.sh start datanode

We see that the process has started.

To view the number of slave nodes connected:

    hadoop dfsadmin -report
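The report is long, so a one-line count of connected DataNodes can be handy. This sketch assumes the Hadoop 1.x report format, in which each DataNode's entry begins with a line starting with `Name: `; `count_datanodes` is a hypothetical helper, and the pattern may need adjusting on other versions.

```shell
#!/bin/sh
# count_datanodes: read `hadoop dfsadmin -report` output on stdin and print
# the number of DataNode entries.
# Assumption: each entry starts with a line beginning "Name: ".
count_datanodes() {
    grep -c '^Name: '
}

# Only query the cluster when the hadoop command is actually installed.
if command -v hadoop >/dev/null 2>&1; then
    echo "connected DataNodes: $(hadoop dfsadmin -report | count_datanodes)"
fi
```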


ClientNode

    vim /etc/hadoop/core-site.xml

    - To see how many files we have in the cluster's storage
        hadoop fs -ls /
    - To add a file (type the text, then press Ctrl+D to finish)
        cat > /file1.txt
        Hi I am the first file
        hadoop fs -put /file1.txt /
    - To read the contents of the file
        hadoop fs -cat /file1.txt
    - To check the size of the file (the third column is the size in bytes)
        hadoop fs -count /file1.txt
    - To create a directory
        hadoop fs -mkdir /textfiles
    - To create an empty file on the fly
        hadoop fs -touchz /my.txt
    - To move a file (source ➡ destination)
        hadoop fs -mv /lw.txt /textfiles
    - To copy a file (source ➡ destination)
        hadoop fs -cp /file1.txt /textfiles
    - To remove a file
        hadoop fs -rm /file1.txt
    - To explore all the available options
        hadoop fs
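The upload and read commands above can be combined into a quick round-trip check: upload a local file, read it back, and compare. This is a sketch only; `hdfs_roundtrip` and the `HADOOP_BIN` variable are hypothetical names, and it relies solely on the `fs -put` and `fs -cat` commands listed above.

```shell
#!/bin/sh
# Command used to talk to the cluster; override it to point at another binary.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# hdfs_roundtrip LOCAL_FILE HDFS_PATH: upload the file, read it back from
# HDFS, and compare the bytes with the local copy.
hdfs_roundtrip() {
    src=$1   # local file to upload
    dst=$2   # destination path in HDFS
    "$HADOOP_BIN" fs -put "$src" "$dst" || return 1
    # cmp -s reads the HDFS copy from stdin (-) and stays silent on success.
    if "$HADOOP_BIN" fs -cat "$dst" | cmp -s - "$src"; then
        echo "OK: $dst matches $src"
    else
        echo "MISMATCH: $dst differs from $src" >&2
        return 1
    fi
}
```

For example, after creating /file1.txt locally, `hdfs_roundtrip /file1.txt /file1.txt` uploads it and confirms that what the cluster returns is identical. Note that `fs -put` fails if the destination already exists, so remove it first when re-running.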



GUI

We can also visualize the cluster through the web GUI:

    NameNode : MasterIP:50070
    DataNode : SlaveIP:50075

There we can browse the uploaded files.

We see that if the file is small it is stored in only 1 block. We can check the size of the name.txt file like this:

    - To see the permissions as well as the size of the file in bytes
        ls -l name.txt
    - To see the permissions as well as the size of the file in human-readable units
        ls -l -h name.txt


In Hadoop 1.x the default DFS block size is 64 MB (67,108,864 bytes), so a file larger than that is divided into multiple blocks before it is stored.
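To see how a file size maps to a block count, the sketch below does the ceiling division in shell arithmetic. It assumes a 64 MB block size (the Hadoop 1.x default for `dfs.block.size`) unless you pass the size your cluster is actually configured with; `block_count` is a hypothetical helper.

```shell
#!/bin/sh
# block_count FILE_BYTES [BLOCK_BYTES]: number of HDFS blocks a file needs,
# computed with integer ceiling division.
# Assumption: 64 MB blocks unless a second argument is given.
block_count() {
    size=$1
    block=${2:-67108864}
    echo $(( (size + block - 1) / block ))
}

block_count 1024         # a 1 KB file fits in a single block
block_count 200000000    # a file of about 200 MB needs three 64 MB blocks
```

The last block of a file is usually only partly filled; HDFS stores just the bytes actually written, so a small file does not consume a full block on disk.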



Frequently Asked Questions

Will come up soon, stay tuned :)


Testing

These commands have also been tested on the AWS cloud.


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Contribution Guidelines

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Contribution Practices

  • Write clear and meaningful commit messages.
  • If you report a bug, please provide steps to reproduce it.
  • If you change the backend routes, please submit updated routes documentation as well.
  • If there is a UI-related change, it would be great if you could attach a screenshot of the resulting changes so that it is easier for the maintainers to review.

Pull Request Process

  1. Ensure any install or build dependencies are removed before the end of the layer when doing a build.
  2. Update the README.md with details of changes to the interface; this includes new environment variables, exposed ports, useful file locations and container parameters.
  3. Only send your pull requests to the development branch; once we reach a stable point, it will be merged into the master branch.
  4. Associate each Pull Request with the required issue number.

Branch Policy

  • development: If you are making a contribution, make sure to send your Pull Request to this branch. All development goes into this branch.
  • master: After significant features and bug fixes have accumulated in the development branch, we merge it into the master branch.

Cool Links to Checkout


Distributed under the MIT License. See LICENSE for more information.
