High Availability Hadoop in Docker with JournalNode Quorum and Zookeeper, with Kubernetes in mind. Based on the official guide Hadoop HA with JournalNode Quorum.
- Java 8
- Hadoop 2.7.3
For the sake of simplicity we will use 1 Zookeeper, 1 JournalNode, 2 NameNodes and 1 DataNode so that everything can run on a single Docker host. It goes without saying that you should adjust these numbers in production.
- Create a common docker network
docker network create hadoop
- Start Zookeeper
docker run --net=hadoop --name zk-1 --restart always -d zookeeper
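To verify that Zookeeper is reachable you can, for example, send it the ruok command from a throwaway container on the same network (on newer Zookeeper versions this command may need to be whitelisted first); it should answer with imok.
docker run --rm --net=hadoop busybox sh -c 'echo ruok | nc zk-1 2181'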
- Start JournalNode
docker run -d --name=jn-1 -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-jn:/mnt/hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d journalnode
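To check that the JournalNode came up you can query its web interface, assuming the image keeps the Hadoop 2.7 default JournalNode HTTP port 8480:
docker run --rm --net=hadoop busybox wget -qO- http://jn-1:8480/jmx | head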
- Format the active NameNode
docker run --hostname=nn1 --name=nn1 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn1:/mnt/hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d format
- Sync the initial state to the standby NameNode
docker run --hostname=nn2 --name=nn2 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn2:/mnt/hadoop -v /tmp/hadoop-nn1:/mnt/shared/nn1 bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d standby
Notice that the volume from nn1 - which now holds the initial cluster state - is simply mounted into a separate directory (/mnt/shared/nn1) from which the data is copied over to nn2's volume.
At this point both volumes hold the initial cluster state and can be used as mount points for the actual NameNode containers.
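A quick way to confirm this is to compare the metadata directories on the host; assuming the image keeps the NameNode data under /mnt/hadoop/dfs/name (see the mount points further down), both host directories should now contain the same fsimage files:
ls /tmp/hadoop-nn1/dfs/name/current
ls /tmp/hadoop-nn2/dfs/name/current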
- Start both NameNodes (separate terminals)
docker run --hostname=nn1 -p 50060:50070 --name=nn1 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn1:/mnt/hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d namenode
docker run --hostname=nn2 --name=nn2 -p 50070:50070 -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-nn2:/mnt/hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d namenode
Now both NameNodes should be running. Check it by visiting the WebUI on port 50060 (nn1) and port 50070 (nn2). nn2 should be standby while nn1 is active.
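You can also query the HA state from the command line; assuming the HA service IDs configured in the image are nn1 and nn2, these commands should report active and standby respectively:
docker exec -it nn1 /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
docker exec -it nn2 /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2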
- Start DataNodes
docker run -d -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop -v /tmp/hadoop-dn-1:/mnt/hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -d datanode
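To confirm that the DataNode has registered, you can run a dfsadmin report from one of the NameNode containers (assuming the Hadoop installation lives under /usr/local/hadoop, see the mount points further down):
docker exec -it nn1 /usr/local/hadoop/bin/hdfs dfsadmin -report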
- Kill the active NameNode to trigger failover
Just press CTRL-C in the terminal that is attached to the active NameNode, then watch in the WebUI how the standby NameNode becomes active.
The DataNodes stay connected. Wait a bit and restart the formerly active NameNode; it will now come back as the standby node.
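Since the NameNode container only stopped, restarting it is just a matter of starting it again and re-checking its HA state (same assumptions as above regarding paths and service IDs):
docker start nn1
docker exec -it nn1 /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1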
Kubernetes is generally not good at enforcing that one thing happens strictly before another (steps 4 and 5, i.e. formatting the active NameNode and syncing the standby). These two steps can be accomplished by using InitContainers to pass the volumes around, similar to how it is done in this example.
/etc/bootstrap.sh accepts the following flags:
-d
- Runs the service continuously instead of auto quitting.
-b
- Opens a bash terminal after starting to inspect its internal file system.
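For example, to start a DataNode and then get a shell inside the container for inspection (a hypothetical invocation based on the flag description above):
docker run -it -e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485" -e "ZK_IPS=zk-1:2181" --net=hadoop bastipaeltz/hadoop-ha-docker /etc/bootstrap.sh -b datanode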
By default the cluster will start with the name "cluster". You can override this name with the $CLUSTER_NAME environment variable.
For the containers to run you need to set the following environment variables on the docker run command.
NNODE1_IP
: Address of NameNode 1, without port.
NNODE2_IP
: Address of NameNode 2, without port.
JN_IPS
: Comma separated addresses of the JournalNodes, with port 8485. (At least 3, and always an uneven number.)
ZK_IPS
: Comma separated addresses of the Zookeeper nodes, with port (default is 2181). (At least 3, and always an uneven number.)
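In a production-like setup with proper quorums this would look something like the following (hostnames are just placeholders):
-e "NNODE1_IP=nn1" -e "NNODE2_IP=nn2" -e "JN_IPS=jn-1:8485,jn-2:8485,jn-3:8485" -e "ZK_IPS=zk-1:2181,zk-2:2181,zk-3:2181"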
You can attach docker volumes to the following paths:
/mnt/hadoop/dfs/name
- For NameNode storage
/mnt/hadoop/dfs/data
- For DataNode storage
/mnt/hadoop/journal/data
- For Journal storage
/usr/local/hadoop/logs/
- For the logs. You can also replace /usr/local/hadoop/etc/log4j.properties with an attached docker volume to that file to customize the logging settings.
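For example, to ship your own logging configuration, add a bind mount over that file to the docker run command:
-v /path/to/my-log4j.properties:/usr/local/hadoop/etc/log4j.properties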
It is important that NNODE1_IP and NNODE2_IP are the IP addresses/DNS names of the NameNode containers themselves, because Hadoop actually binds to those addresses.
In certain situations the NameNodes need to fence each other for a proper failover. Currently the fence script always returns true without doing anything. Replace /etc/fence.sh via a docker volume mount with your own fencing algorithm, probably something like a call to your docker scheduler to shut down the other NameNode.
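A custom fence script could be mounted the same way. A minimal sketch, where the scheduler call is just a placeholder you would replace with your own logic:
#!/bin/bash
# my-fence.sh - placeholder fencing logic: ask your scheduler to stop the other NameNode.
# The URL below is hypothetical; exit 0 only if fencing actually succeeded.
curl -fsS -X POST http://my-scheduler.example/stop-other-namenode || exit 1
exit 0
Mount it over the default script by adding -v /path/to/my-fence.sh:/etc/fence.sh to the NameNode docker run commands.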