A fully distributed Hadoop cluster in Docker, built in less than 5 minutes with docker-compose. It currently supports the HDFS file system and MapReduce; HBase, ZooKeeper, and other Hadoop family members will be added to the cluster later. Example Install Video
If you have any suggestions or questions, feel free to contact me or open an issue. 😃
Environments:
- docker
- docker-compose
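You can confirm both are installed with:

```
docker --version
docker-compose --version
```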
Packages:
- hadoop-3.3.1
You can download hadoop-3.3.1.tar.gz from the Hadoop release list, or choose another version. But if you have no interest in changing the source files or customizing things yourself, you'd better install version 3.3.1. I'll fix up the repo later so that other versions work with little extra configuration. (A download example follows this list.)
- Other packages are on the way...
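For reference, here is one way to fetch the 3.3.1 tarball; the Apache archive URL is a standard mirror path rather than something this repo mandates, and `./base` is simply where the next step expects the package:

```
# download the hadoop-3.3.1 package and put it where the base image build expects it
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
mv hadoop-3.3.1.tar.gz ./base/
```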
- Put your Hadoop package under the /base path, then build the base image locally.
```
cd ./base
make all
```
After a few minutes, the base image (smm/hadoop.base:3.3.1) will be built successfully.
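If you prefer not to use make, the equivalent plain docker build is roughly the following sketch; it assumes the Dockerfile lives in ./base and only reuses the image tag mentioned above:

```
# build the base image directly with docker (the Makefile may perform extra steps)
docker build -t smm/hadoop.base:3.3.1 ./base
```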
- I mount two paths to share the ssh_key and Hadoop etc files among the Docker cluster; these files must be the same across the different containers.
```
# under /hadoop-in-docker/
mkdir .secret
mkdir share
```
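Once the cluster is up (next step), you can confirm that the shared paths are really mounted into a container; hadoop-master is the container name used later in this README:

```
# list the bind mounts of the master container
docker inspect --format '{{ json .Mounts }}' hadoop-master
```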
- Start the cluster with docker-compose
```
docker-compose -f s2-docker-compose.yml up
# or, to run it in the background:
# docker-compose -f s2-docker-compose.yml up -d
```
If you want to run it in the background, you can add the `-d` param. If everything is OK, you will see `Hdfs done...` at the end. Without `-d`, the process may wait for a fairly long time at the `permission set` step (about half a minute or longer); with `-d`, it only takes a few seconds. You can check the Hadoop processes in the master node by:
```
docker exec -it hadoop-master su   # enter the container as root
su hdfs                            # change user to hdfs
jps                                # look at the hdfs java processes

# If everything is OK, jps prints something like this (pids may be different):
577  DataNode
1651 JobHistoryServer
1091 ResourceManager
2454 Jps
795  SecondaryNameNode
1230 NodeManager
431  NameNode
```
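If you started the cluster detached, one way to watch for the end of startup is to follow the compose logs until `Hdfs done...` shows up (this assumes the entrypoint messages land in the container logs):

```
# follow the startup output of all services
docker-compose -f s2-docker-compose.yml logs -f
```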
- Run the destroy script to tear the cluster down:
```
chmod +x ./destroy.sh
./destroy.sh
```
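Afterwards you can check that no cluster containers are left; the name filter below assumes every container in this setup is named hadoop-*, as hadoop-master is:

```
# should list no containers once the cluster has been torn down
docker ps -a --filter "name=hadoop"
```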
- If you want to enter a single node, you can:
```
docker exec -it hadoop-master su   # enter the container as root
su hdfs                            # change user to hdfs
jps                                # look at the hdfs java processes

# Use other hdfs commands to test
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/catcher/
hdfs dfs -mkdir /input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep /input /output 'dfs[a-z.]+'
hdfs dfs -get /output ./output
cat ./output/*
```
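The same examples jar ships with several other demo jobs; for instance, a word count over the files already uploaded to /input (the /wc-output path is just an illustrative choice):

```
# count word occurrences in /input and print the first results
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /wc-output
hdfs dfs -cat /wc-output/part-r-00000 | head
```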
- Change the number of slave nodes:
In theory, you can DIY as many slaves and masters as you want.
- Create another docker-compose.yml file.
- Change /hadoop-master/docker-entrypoint.sh and /hadoop-slave/docker-entrypoint.sh: find `dfs.replication` and change its value to the number of nodes you want.
- Change the `_set_workers()` function in the 2 files above and write in all the hostnames (a sketch follows this list).
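Purely as an illustration of that last step, a workers-writing helper could look like the snippet below; the real `_set_workers()` in the entrypoint scripts may differ, and the slave hostnames are hypothetical:

```
# sketch: write every worker hostname, one per line, into Hadoop's workers file
_set_workers() {
    cat > "$HADOOP_HOME/etc/hadoop/workers" <<EOF
hadoop-slave1
hadoop-slave2
hadoop-slave3
EOF
}
```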
- Add other Hadoop family members:
Waiting to be done...
- Base image build error caused by the centos8.repo source
Since the CentOS project stopped supporting CentOS 8, I use the Aliyun mirror instead; you can find other solutions as well. If you still get other strange errors, you can change the base image from centos8 to another Linux image. Just change the `yum` commands to the other distro's package manager (like `apt` for Ubuntu).
- Hadoop problems caused by JAVA_HOME
In `docker-entrypoint.sh`, the `set_java_hadoop_path()` function sets:
```
export JAVA_HOME=\/usr\/lib\/jvm\/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64
```
Here, the path depends on your system and the Java version you downloaded. If you don't know the correct path, you can create a test container from `smm/hadoop.base:3.3.1`, enter the container, and find the path.
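For example, one way to look it up (this assumes bash is available in the base image):

```
# start a temporary container from the base image and inspect the installed JDKs
docker run -it --rm smm/hadoop.base:3.3.1 bash
# then, inside the container:
ls /usr/lib/jvm/   # the matching directory here is what JAVA_HOME should point to
```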