
hadoop-in-docker

A fully distributed Hadoop cluster in Docker, created with docker-compose and built in less than 5 minutes. It currently supports the HDFS file system and MapReduce; HBase, ZooKeeper, and other Hadoop family members will be added to the cluster later. Example Install Video

If you have any suggestions or questions, feel free to contact me or open an issue. 😃

Environments:

  • docker
  • docker-compose

Packages:

  • hadoop-3.3.1

    You can download hadoop-3.3.1.tar.gz from the hadoop list, or choose another version. But unless you want to modify the source files or DIY, you'd better stick with version 3.3.1. I'll update the repo later to support more versions with minimal configuration.
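
    For example, one way to fetch the tarball before building (the Apache archive URL below is one common mirror; any official mirror works):

      # download hadoop-3.3.1 from the Apache archive
      wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
      mv hadoop-3.3.1.tar.gz ./base/   # the build expects the package under /base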

Other packages are on the way...

Build Base Image

  • Put your hadoop package under the /base path, then build the base image locally.

    cd ./base
    make all

After a few minutes, the base image (smm/hadoop.base:3.3.1) will be built successfully.
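
You can confirm the image is present before moving on:

    docker images smm/hadoop.base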

Create Shared Volume Mount Paths (this step can be skipped because the paths are created automatically)

  • Two paths are mounted to share the ssh_key and hadoop etc files across the docker cluster. These files must be identical in every container.

    # under /hadoop-in-docker/
    mkdir .secret
    mkdir share
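
    Once the cluster is up (next section), you can sanity-check that the shared files really are identical across containers. The slave container name and in-container path below are assumptions; adjust them to match your compose file:

      docker exec hadoop-master md5sum /root/.ssh/id_rsa.pub   # path is an assumption
      docker exec hadoop-slave1 md5sum /root/.ssh/id_rsa.pub   # container name is an assumption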

Build & Start the Hadoop docker cluster

  • docker-compose

    docker-compose -f s2-docker-compose.yml up
    # docker-compose -f s2-docker-compose.yml up -d

    If you want to run it in the background, add the -d flag.

    If everything is OK, you will see Hdfs done... at the end. Without -d, the process may pause for a while at the permission-setting step (about half a minute or longer). With -d, it returns in just a few seconds.
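
    When running detached, you can still watch for that message in the compose logs:

      docker-compose -f s2-docker-compose.yml logs -f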

    You can check the hadoop processes on the master node:

    docker exec -it hadoop-master su # enter the container as root
    su hdfs # change user to hdfs
    jps # look at hdfs java process
    # If everything is OK, jps output looks like this (pids may differ):
    577 DataNode
    1651 JobHistoryServer
    1091 ResourceManager
    2454 Jps
    795 SecondaryNameNode
    1230 NodeManager
    431 NameNode
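
    For a cluster-level health check, HDFS's built-in admin report lists the live datanodes. Run it as the hdfs user inside the master container:

      hdfs dfsadmin -report   # all datanodes should show up as live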

Destroy the cluster and images

  • Run the destroy script:
    chmod +x ./destroy.sh
    ./destroy.sh
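
    If you prefer to tear things down by hand, a roughly equivalent docker-compose command is below. This is a sketch: the actual script may also clean up the .secret and share directories.

      docker-compose -f s2-docker-compose.yml down --rmi all -v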

Other test commands

  • If you want to enter a single node, you can:
    docker exec -it hadoop-master su # enter the container as root
    su hdfs # change user to hdfs
    jps # look at hdfs java process
    # Use other hdfs commands to test
      hdfs dfs -mkdir /user
      hdfs dfs -mkdir /user/catcher/
      hdfs dfs -mkdir /input
      hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /input
      hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep /input /output 'dfs[a-z.]+'
      hdfs dfs -get /output ./output
      cat ./output/*
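
    The grep job above is just one of the bundled examples; a wordcount run works the same way. Note the example jobs refuse to overwrite an existing /output directory, hence the rm first:

      hdfs dfs -rm -r /output
      hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
      hdfs dfs -cat /output/part-r-00000 | head   # the default single reducer writes part-r-00000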

Extensions & Planned Updates

  • Change the number of slave nodes:

    In theory, you can run as many slaves & masters as you want (see the sketch after this list).

    • Create another docker-compose.yml file.
    • Change /hadoop-master/docker-entrypoint.sh and /hadoop-slave/docker-entrypoint.sh: find dfs.replication and set its value to the number of nodes you want.
    • Change the _set_workers() function in the two files above and write in all the hostnames.
  • Add other hadoop family members:

    Waiting to be done...
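
    As a sketch of the _set_workers() change mentioned above: conceptually, every node's hostname must end up in Hadoop's workers file. The hostnames below are illustrative:

      # conceptually what _set_workers() must produce (hostnames are illustrative)
      printf '%s\n' hadoop-master hadoop-slave1 hadoop-slave2 hadoop-slave3 > $HADOOP_HOME/etc/hadoop/workers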

Common Questions & Solutions

  1. Build base image errors caused by the centos8.repo source

    Since the CentOS organization stopped supporting CentOS 8, this repo uses an Aliyun mirror instead; you can substitute other mirrors. If you still hit strange errors, you can switch the base image from centos8 to another Linux image; just replace the yum commands with that distribution's equivalent (e.g., apt for Ubuntu).
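
    For reference, the widely used workaround looks roughly like this (exact repo file names and mirror URLs depend on your base image and chosen mirror; vault.centos.org is the generic fallback, and Aliyun hosts an equivalent mirror):

      # comment out the dead mirrorlist and point baseurl at an archive mirror
      sed -i 's|mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-*.repo
      sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*.repo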

  2. Hadoop problems caused by JAVA_HOME

    In the set_java_hadoop_path() function of docker-entrypoint.sh:

    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64

    Here the path depends on your system and the java version you installed. If you don't know the correct path, you can create a test container from smm/hadoop.base:3.3.1, enter it, and find the path.
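
    For example, a quick way to find the real JDK directory (assuming bash is available in the image):

      docker run --rm -it smm/hadoop.base:3.3.1 bash
      # inside the container:
      ls /usr/lib/jvm/             # list installed JDKs
      readlink -f /usr/bin/java    # resolve the real path behind the java symlink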