Hadoop Cluster
A simple Hadoop cluster made up of multiple Docker containers (1 master and 2 workers).
- Deploy a Hadoop cluster on a single machine with multiple Docker containers.
- Deploy a fully distributed Hadoop cluster on multiple host machines by connecting the standalone containers through a Docker swarm overlay network.
- Use docker stack and docker-compose to deploy the Hadoop cluster quickly and flexibly as services on Docker swarm.
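As a rough sketch of the docker stack route, the hypothetical helpers below write a compose file describing one master and N workers on an attachable overlay network and hand it to docker stack deploy. They assume that the image zzhou612/hadoop-cluster:latest stays running when given a TTY and that the cluster expects workers named hadoop-worker-1 through hadoop-worker-N; the function names, the stack name hadoop and the file layout are illustrative, not taken from the project.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this project): describe 1 master + N workers
# in a compose file and deploy it as a stack on an existing Docker swarm.
# Assumptions: the image keeps running with a TTY attached, and workers are
# expected to be named hadoop-worker-1 .. hadoop-worker-N.

# write_stack_file FILE [NUM_WORKERS]
write_stack_file() {
  local file=$1 workers=${2:-2} i
  {
    echo 'version: "3.8"'
    echo 'services:'
    # Master service (swarm runs one replica per service by default).
    echo '  hadoop-master:'
    echo '    image: zzhou612/hadoop-cluster:latest'
    echo '    hostname: hadoop-master'
    echo '    tty: true'
    echo '    networks: [hadoop-net]'
    # One service per worker so each gets a stable hostname.
    for (( i = 1; i <= workers; i++ )); do
      echo "  hadoop-worker-$i:"
      echo '    image: zzhou612/hadoop-cluster:latest'
      echo "    hostname: hadoop-worker-$i"
      echo '    tty: true'
      echo '    networks: [hadoop-net]'
    done
    # Overlay network shared by all services; attachable so that
    # standalone containers can also join it.
    echo 'networks:'
    echo '  hadoop-net:'
    echo '    driver: overlay'
    echo '    attachable: true'
  } > "$file"
}

# deploy_stack FILE [STACK_NAME]: must be run on a swarm manager.
deploy_stack() {
  docker stack deploy -c "$1" "${2:-hadoop}"
}
```

For example, on a swarm manager: write_stack_file docker-compose.yml 2 && deploy_stack docker-compose.yml hadoop, then check the services with docker service ls. Note that docker stack prefixes networks defined in the file with the stack name (here hadoop_hadoop-net), and that swarm, not the user, decides which node each service is scheduled on.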
Deploy & Test Hadoop Cluster Locally
- Build the Docker image.
./docker-build-image.sh
- Create a Hadoop network.
docker network create --driver=bridge hadoop
- Start containers.
./docker-start-container.sh
- In the hadoop-master container, start the NameNode, DataNode, ResourceManager and NodeManager daemons.
./start-hadoop.sh
- Run the WordCount example in the hadoop-master container.
./run-wordcount.sh
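The contents of run-wordcount.sh are not shown here. As a sketch under stated assumptions, a WordCount run typically copies some text into HDFS, runs the wordcount program from Hadoop's bundled MapReduce examples jar, and prints the result. The hypothetical helper below does this by hand, assuming HADOOP_HOME is set inside hadoop-master and the jar sits at the usual share/hadoop/mapreduce location; the function name and HDFS paths are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a manual WordCount run inside hadoop-master.
# Assumes HADOOP_HOME is set and `hdfs`/`hadoop` are on PATH.
# This is NOT the project's run-wordcount.sh; the HDFS paths are made up.

# run_wordcount_manually [INPUT_DIR] [OUTPUT_DIR]
run_wordcount_manually() {
  local in=${1:-/user/root/wc-input} out=${2:-/user/root/wc-output}
  local tmp
  local -a jars
  tmp=$(mktemp -d)

  # Create two small local text files as sample input.
  echo "Hello Hadoop" > "$tmp/file1.txt"
  echo "Hello Docker" > "$tmp/file2.txt"

  # Copy the input into HDFS and delete any previous output directory,
  # because MapReduce refuses to write into an existing one.
  hdfs dfs -mkdir -p "$in"
  hdfs dfs -put -f "$tmp"/*.txt "$in"
  hdfs dfs -rm -r -f "$out"

  # Locate the examples jar (its file name includes the Hadoop version).
  jars=( "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar )
  hadoop jar "${jars[0]}" wordcount "$in" "$out" || { rm -rf "$tmp"; return 1; }

  # Print the reducer output; the glob is quoted so HDFS expands it, not the local shell.
  hdfs dfs -cat "$out/part-r-*"
  rm -rf "$tmp"
}
```

With the sample input above, the job should report Docker 1, Hadoop 1 and Hello 2, one tab-separated word/count pair per line.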
Deploy Hadoop Cluster on Multiple Host Machines
- On manager, initialize the swarm.
docker swarm init --advertise-addr=<IP-ADDRESS-OF-MANAGER>
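- On worker-n, join the swarm. <TOKEN> is the worker join token printed by docker swarm init on manager (it can be shown again with docker swarm join-token worker). If the host has only one network interface, the --advertise-addr flag is optional.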
docker swarm join --token <TOKEN> \
--advertise-addr <IP-ADDRESS-OF-WORKER-N> \
<IP-ADDRESS-OF-MANAGER>:2377
- On manager, create an attachable overlay network called hadoop-net.
docker network create --driver=overlay --attachable hadoop-net
- On manager, start an interactive (-it) container hadoop-master that connects to hadoop-net.
docker run -it \
--name hadoop-master \
--hostname hadoop-master \
--network hadoop-net \
zzhou612/hadoop-cluster:latest
- On worker-n (where n is the worker's number), start a detached (-d) and interactive (-it) container hadoop-worker-n that connects to hadoop-net.
docker run -dit \
--name hadoop-worker-n \
--hostname hadoop-worker-n \
--network hadoop-net \
zzhou612/hadoop-cluster:latest
- On manager, start Hadoop and run the WordCount example in the hadoop-master container.
./start-hadoop.sh
./run-wordcount.sh
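When the workers sit on other hosts, their DataNodes may need a few seconds to register with the NameNode over the overlay network after the daemons are started, and a job submitted too early can fail or run with fewer nodes than expected. The hypothetical helper below polls the NameNode until an expected number of DataNodes are live. It assumes it runs inside hadoop-master with hdfs on PATH and enough privileges for dfsadmin, and that hdfs dfsadmin -report prints a line like "Live datanodes (2):", as recent Hadoop 2.x/3.x releases do; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: wait until HDFS reports enough live DataNodes.
# Assumes `hdfs dfsadmin -report` prints a line such as "Live datanodes (2):".

# Print the number of live DataNodes, or 0 if the report cannot be parsed.
live_datanodes() {
  local line
  line=$(hdfs dfsadmin -report 2>/dev/null | grep -oE 'Live datanodes \([0-9]+\)' | head -n 1)
  # Keep only the digits, e.g. "Live datanodes (2)" -> "2".
  local n=${line//[^0-9]/}
  echo "${n:-0}"
}

# wait_for_datanodes EXPECTED [TIMEOUT_SECONDS]
# Polls every 5 seconds; returns 0 once EXPECTED DataNodes are live, 1 on timeout.
wait_for_datanodes() {
  local expected=$1 timeout=${2:-120} waited=0 n
  while :; do
    n=$(live_datanodes)
    if (( n >= expected )); then
      echo "HDFS reports $n live DataNode(s)."
      return 0
    fi
    if (( waited >= timeout )); then
      echo "Timed out: $n of $expected DataNode(s) live after ${waited}s." >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
}
```

For the two-worker layout above, wait_for_datanodes 2 120 && ./run-wordcount.sh would hold off the job until both workers have joined.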