A Hadoop cluster consists of multiple (N) Docker containers (1 master and N-1 workers).
- Deploy a Hadoop cluster on a single machine with multiple Docker containers.
- Deploy a fully distributed Hadoop cluster on multiple host machines by connecting the standalone containers with Docker swarm overlay network.
- Support common libraries for big data.
- Build the Docker image.
mkdir -p .config && cp ~/.ssh/{id_rsa,id_rsa.pub} .config
./download.sh
./build/docker-build-image.sh
- Initialize the Docker swarm and create a Hadoop network.
docker system prune
master/init_swarm.sh # does not work on manjaro
master/init_network.sh
- Start containers.
./standalone.sh
- Run the WordCount example in the hadoop-master container.
./run-wordcount.sh
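The swarm and network steps above can be sketched with plain Docker commands. This is a minimal sketch, not the contents of master/init_swarm.sh or master/init_network.sh, which are not reproduced in this README. It assumes those scripts wrap `docker swarm init` and `docker network create`, and that the network is named hadoop-net as in the multi-host steps. The `run` helper and the `DRY_RUN` and `NETWORK_NAME` variables are hypothetical, added so the commands can be previewed without touching Docker.

```shell
#!/bin/sh
# Sketch only: assumed equivalents of the swarm and network init scripts.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
NETWORK_NAME="${NETWORK_NAME:-hadoop-net}"   # assumed network name

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

init_swarm() {
    # Turn this Docker engine into a swarm manager.
    run docker swarm init
}

init_network() {
    # Create an overlay network that standalone containers can attach to.
    run docker network create --driver overlay --attachable "$NETWORK_NAME"
}

init_swarm
init_network
```

Running the sketch as-is only prints the two commands; setting `DRY_RUN=0` would execute them against the local Docker engine.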
- Build the Docker image.
./download.sh
./docker-build-image.sh
- On manager, initialize the swarm.
docker system prune
master/init_swarm.sh
- On worker-n, join the swarm. If the host has only one network interface, the --advertise-addr flag is optional.
docker system prune
worker/join_swarm.sh <TOKEN> <IP-ADDRESS-OF-MANAGER>
- On manager, create an attachable overlay network called hadoop-net.
master/init_network.sh
- On manager, start an interactive (-it) container hadoop-master that connects to hadoop-net.
export WORKER_NUMBER=N
master/start.sh
- On worker-n, start a detached (-d) and interactive (-it) container hadoop-worker-n that connects to hadoop-net.
export WORKER_NUMBER=N
export WORKER_ID=X
worker/start.sh
- Start the NameNode, DataNode, ResourceManager and NodeManager daemons in the hadoop-master container.
master/start_hadoop.sh
- Run the WordCount example in the hadoop-master container.
./run-wordcount.sh
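The multi-host steps above can be sketched with the underlying Docker commands. This is a rough sketch under stated assumptions, not the contents of worker/join_swarm.sh, master/start.sh or worker/start.sh. The image name `hadoop-cluster:latest`, the `run` helper, and the `DRY_RUN`, `IMAGE` and `NETWORK_NAME` variables are hypothetical; the container names, the hadoop-net network and the -d/-it flags come from the steps above. Port 2377 is Docker's default swarm management port.

```shell
#!/bin/sh
# Sketch only: assumed Docker commands behind the join and start scripts.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="${IMAGE:-hadoop-cluster:latest}"      # hypothetical image name
NETWORK_NAME="${NETWORK_NAME:-hadoop-net}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

join_swarm() {
    # $1 = join token printed by the manager, $2 = manager IP address.
    run docker swarm join --token "$1" "$2:2377"
}

start_master() {
    # Interactive container attached to the overlay network.
    run docker run -it --name hadoop-master --hostname hadoop-master \
        --network "$NETWORK_NAME" "$IMAGE"
}

start_worker() {
    # $1 = worker ID; detached, but with a TTY allocated.
    run docker run -d -it --name "hadoop-worker-$1" --hostname "hadoop-worker-$1" \
        --network "$NETWORK_NAME" "$IMAGE"
}

# Preview the command a worker host would run (worker ID defaults to 1).
start_worker "${WORKER_ID:-1}"
```

On a real worker host, `join_swarm <TOKEN> <IP-ADDRESS-OF-MANAGER>` would come first, followed by `start_worker` with that host's ID; with the default `DRY_RUN=1` each call only prints the command it would run.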