Hadoop Cluster

A minimal Hadoop cluster consists of multiple Docker containers (1 master and 2 workers). This README covers how to:

  • Deploy a Hadoop cluster on a single machine with multiple Docker containers.
  • Deploy a fully distributed Hadoop cluster on multiple host machines by connecting standalone containers over a Docker swarm overlay network.
  • Use docker stack and docker-compose for swift, flexible deployment of the Hadoop cluster as services on Docker swarm (sketched below).
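
A minimal sketch of that stack-based deployment, assuming a docker-compose.yml in the repository that defines the master and worker services (the file name and stack name are illustrative):
# Deploy the services defined in docker-compose.yml onto the swarm as a stack named hadoop.
docker stack deploy -c docker-compose.yml hadoop
# List the services created for the stack.
docker stack services hadoop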

Deploy & Test Hadoop Cluster Locally

  1. Build the Docker image.
./docker-build-image.sh
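
This script is presumably a thin wrapper around docker build; a rough equivalent, assuming the Dockerfile sits in the repository root and reusing the image tag that appears later in this README:
# Build the Hadoop cluster image from the Dockerfile in the current directory (assumed context and tag).
docker build -t zzhou612/hadoop-cluster:latest .
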
  2. Create a Hadoop network.
docker network create --driver=bridge hadoop
  3. Start containers.
./docker-start-container.sh
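
As a sketch of what this script is assumed to do (container names, hostnames and image tag are illustrative), it starts one master and two worker containers on the hadoop bridge network and attaches to the master:
# Start one master and two workers on the hadoop bridge network (the real script may differ).
docker run -itd --name hadoop-master --hostname hadoop-master --network hadoop zzhou612/hadoop-cluster:latest
docker run -itd --name hadoop-worker-1 --hostname hadoop-worker-1 --network hadoop zzhou612/hadoop-cluster:latest
docker run -itd --name hadoop-worker-2 --hostname hadoop-worker-2 --network hadoop zzhou612/hadoop-cluster:latest
# Attach to the master container, where the remaining steps are run.
docker exec -it hadoop-master bash
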
  4. Start the NameNode, DataNode, ResourceManager, and NodeManager daemons in the hadoop-master container.
./start-hadoop.sh
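
Inside the container this usually boils down to Hadoop's standard start scripts; a minimal sketch, assuming Hadoop's sbin directory is on the PATH:
# On a brand-new cluster the NameNode is formatted once beforehand (hdfs namenode -format);
# the image or the start script may already take care of this.
# Start the HDFS daemons (NameNode on the master, DataNodes on the workers).
start-dfs.sh
# Start the YARN daemons (ResourceManager on the master, NodeManagers on the workers).
start-yarn.sh
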
  5. Run the WordCount example in the hadoop-master container.
./run-wordcount.sh
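
The WordCount run amounts to uploading some input to HDFS and submitting the example jar that ships with Hadoop; a sketch with illustrative input files and a jar path that depends on the Hadoop version in the image:
# Upload sample text (here, the Hadoop config files) to an HDFS input directory.
hdfs dfs -mkdir -p input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml input
# Submit the bundled WordCount example; the jar name varies with the Hadoop version.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
# Print the result.
hdfs dfs -cat output/part-r-00000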

Deploy Hadoop Cluster on Multiple Host Machines

  1. On manager, initialize the swarm.
docker swarm init --advertise-addr=<IP-ADDRESS-OF-MANAGER>
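
docker swarm init prints a ready-made docker swarm join command for the workers; if the worker token is needed again later, it can be reprinted on the manager:
# Reprint the join command (including the worker token) for this swarm.
docker swarm join-token worker
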
  2. On worker-n, join the swarm. If the host only has one network interface, the --advertise-addr flag is optional.
docker swarm join --token <TOKEN> \
  --advertise-addr <IP-ADDRESS-OF-WORKER-N> \
  <IP-ADDRESS-OF-MANAGER>:2377
  3. On manager, create an attachable overlay network called hadoop-net.
docker network create --driver=overlay --attachable hadoop-net
  4. On manager, start an interactive (-it) container hadoop-master that connects to hadoop-net.
docker run -it \
  --name hadoop-master \
  --hostname hadoop-master \
  --network hadoop-net \
  zzhou612/hadoop-cluster:latest
  5. On worker-n, start a detached (-d) and interactive (-it) container hadoop-worker-n that connects to hadoop-net.
docker run -dit \
  --name hadoop-worker-n \
  --hostname hadoop-worker-n \
  --network hadoop-net \
  zzhou612/hadoop-cluster:latest
  6. On manager, start Hadoop and run the WordCount example in container hadoop-master.
./start-hadoop.sh
./run-wordcount.sh
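
To confirm that the worker containers actually registered with the master, the standard HDFS and YARN reports can be run inside hadoop-master (a quick sanity check, not part of the repository scripts):
# List the DataNodes known to the NameNode.
hdfs dfsadmin -report
# List the NodeManagers known to the ResourceManager.
yarn node -list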