Hadoop Cluster

A Hadoop cluster consists of N Docker containers: one master and N-1 workers.

  • Deploy a Hadoop cluster on a single machine with multiple Docker containers.
  • Deploy a fully distributed Hadoop cluster on multiple host machines by connecting the standalone containers with a Docker Swarm overlay network.
  • Support common big data libraries.

Deploy & Test Hadoop Cluster Locally

  1. Build the Docker image.
mkdir -p .config && cp ~/.ssh/{id_rsa,id_rsa.pub} .config
./download.sh
./build/docker-build-image.sh
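The build script presumably wraps a standard docker build; a minimal equivalent might look like the following (the image tag hadoop:latest is an assumption, not taken from the script):

```shell
# Hypothetical equivalent of build/docker-build-image.sh;
# the tag "hadoop:latest" is an assumption.
docker build -t hadoop:latest .
```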
  2. Initialize Docker and create the Hadoop network.
docker system prune
master/init_swarm.sh # does not work on Manjaro
master/init_network.sh
  3. Start the containers.
./standalone.sh
  4. Run the WordCount example in the hadoop-master container.
./run-wordcount.sh
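A typical WordCount driver script copies input into HDFS, runs the bundled MapReduce example, and prints the result. The sketch below shows what run-wordcount.sh might do; the paths and jar name are assumptions based on a standard Hadoop distribution, not taken from the actual script:

```shell
# Hypothetical sketch of run-wordcount.sh, assuming a standard Hadoop layout.
docker exec hadoop-master bash -c '
  hdfs dfs -mkdir -p /input
  hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /input
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /input /output
  hdfs dfs -cat /output/part-r-00000
'
```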

Deploy Hadoop Cluster on Multiple Host Machines

  1. Build the Docker image.
./download.sh
./docker-build-image.sh
  2. On the manager, initialize the swarm.
docker system prune
master/init_swarm.sh
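Under the hood, swarm initialization comes down to two standard Docker CLI calls; a minimal sketch of what init_swarm.sh likely runs:

```shell
# Hypothetical equivalent of master/init_swarm.sh:
# make this host the swarm manager, then print the token workers need to join.
docker swarm init --advertise-addr <IP-ADDRESS-OF-MANAGER>
docker swarm join-token worker
```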
  3. On worker-n, join the swarm. If the host has only one network interface, the --advertise-addr flag is optional.
docker system prune
worker/join_swarm.sh <TOKEN> <IP-ADDRESS-OF-MANAGER>
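The join script presumably wraps the standard swarm join command; a sketch under that assumption:

```shell
# Hypothetical equivalent of worker/join_swarm.sh
# (2377 is Docker Swarm's default cluster-management port):
docker swarm join --token <TOKEN> <IP-ADDRESS-OF-MANAGER>:2377

# On hosts with multiple network interfaces, pin the address explicitly:
docker swarm join --advertise-addr <WORKER-IP> \
  --token <TOKEN> <IP-ADDRESS-OF-MANAGER>:2377
```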
  4. On the manager, create an attachable overlay network called hadoop-net.
master/init_network.sh
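Creating an attachable overlay network is a single Docker CLI call, so init_network.sh is likely equivalent to:

```shell
# Hypothetical equivalent of master/init_network.sh:
# --attachable lets standalone containers (not just services) join the network.
docker network create --driver overlay --attachable hadoop-net
```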
  5. On the manager, start an interactive (-it) container hadoop-master that connects to hadoop-net.
export WORKER_NUMBER=N
master/start.sh
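A minimal sketch of what master/start.sh might run; the image name hadoop:latest is an assumption:

```shell
# Hypothetical equivalent of master/start.sh; image name is an assumption.
docker run -it --name hadoop-master --hostname hadoop-master \
  --network hadoop-net hadoop:latest
```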
  6. On worker-n, start a detached and interactive (-dit) container hadoop-worker-n that connects to hadoop-net.
export WORKER_NUMBER=N
export WORKER_ID=X
worker/start.sh
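The worker start script is presumably the detached counterpart of the master's, naming each container by its worker ID; a sketch under the same image-name assumption:

```shell
# Hypothetical equivalent of worker/start.sh for worker $WORKER_ID;
# image name "hadoop:latest" is an assumption.
docker run -dit --name hadoop-worker-$WORKER_ID \
  --hostname hadoop-worker-$WORKER_ID \
  --network hadoop-net hadoop:latest
```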
  7. Start the NameNode, DataNode, ResourceManager, and NodeManager daemons in the hadoop-master container.
master/start_hadoop.sh
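Starting those four daemons typically means bringing up HDFS and YARN with the stock Hadoop scripts; a sketch of what start_hadoop.sh might do, assuming a standard Hadoop installation:

```shell
# Hypothetical sketch of master/start_hadoop.sh, assuming a standard layout.
$HADOOP_HOME/bin/hdfs namenode -format   # first run only: initialize HDFS
$HADOOP_HOME/sbin/start-dfs.sh           # NameNode + DataNode daemons
$HADOOP_HOME/sbin/start-yarn.sh          # ResourceManager + NodeManager daemons
```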
  8. Run the WordCount example in the hadoop-master container.
./run-wordcount.sh