
Run Hadoop Cluster within Docker Containers




Docker must be installed and configured to use the Device Mapper storage driver first


The Device Mapper driver is used to limit each container's storage size. If you just want to try this repo, you can comment out the following option in start-container.sh:

--storage-opt size=10G
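
To confirm the prerequisite before starting, the storage driver reported by Docker can be compared against `devicemapper`. This is a minimal sketch; the helper name `is_devicemapper` is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given driver name is "devicemapper",
# the name Docker reports for the Device Mapper storage driver.
is_devicemapper() {
    [ "$1" = "devicemapper" ]
}

# Usage (requires a running Docker daemon):
#   driver=$(sudo docker info --format '{{.Driver}}')
#   is_devicemapper "$driver" || echo "storage driver is $driver, not devicemapper" >&2
```

If the check fails, the `--storage-opt size=10G` option is the part most likely to be rejected when the containers start.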

5 Nodes Hadoop Cluster

1. pull docker image
sudo docker pull wentaoma/hadoop:2.7.2
2. clone github repository
git clone https://github.com/WalterMa/hadoop-cluster-docker.git
3. create hadoop network
sudo docker network create --driver=bridge hadoop
4. start container
cd hadoop-cluster-docker
sudo ./start-container.sh


start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
start hadoop-slave3 container...
start hadoop-slave4 container...
  • this starts 5 containers: 1 master and 4 slaves
  • you will be placed in the /root directory of the hadoop-master container
5. start hadoop
  • this starts the Hadoop services, including HttpFS
6. run wordcount


input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker    1
Hadoop    1
Hello    2
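
As a sanity check on the expected output, the same counts can be reproduced locally without Hadoop. This is only a sketch using standard Unix tools, not the job that the repo runs; the function name `wordcount` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical local equivalent of the wordcount job:
# reads text on stdin and prints "word<TAB>count", sorted by word.
wordcount() {
    tr -s '[:space:]' '\n' |   # put one word on each line
        sed '/^$/d' |          # drop empty lines
        LC_ALL=C sort |        # group identical words together
        uniq -c |              # prefix each distinct word with its count
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage, with the contents of the two input files shown above:
#   printf 'Hello Hadoop\nHello Docker\n' | wordcount
```

For those two input lines the pipeline yields the same three words and counts as the Hadoop job output above.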

Arbitrary size Hadoop cluster

1. pull docker images and clone github repository

follow steps 1 to 3 of the 5-node cluster section above

2. rebuild docker image
sudo ./resize-cluster.sh 3
  • specify a parameter greater than 1: 2, 3, ...
  • this script just rebuilds the hadoop image with a different slaves file, which specifies the names of all slave nodes
3. start container
sudo ./start-container.sh 3
  • use the same parameter as in step 2
4. run hadoop cluster

follow steps 5 and 6 of the 5-node cluster section above
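
The slave names for a given cluster size can be sketched as below. This is an illustration, not the actual content of resize-cluster.sh. It assumes the parameter N is the total node count (1 master plus N-1 slaves) and that slaves are named hadoop-slave1, hadoop-slave2, and so on, as in the 5-node start-up log. The function name `slave_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the slave host names for an N-node cluster.
# Assumption: N counts the master too, so there are N-1 slaves.
slave_names() {
    n=$1
    case "$n" in
        ''|*[!0-9]*) echo "node count must be an integer" >&2; return 1 ;;
    esac
    if [ "$n" -le 1 ]; then
        echo "node count must be greater than 1" >&2
        return 1
    fi
    i=1
    while [ "$i" -lt "$n" ]; do
        echo "hadoop-slave$i"
        i=$((i + 1))
    done
}

# Usage:
#   slave_names 3                              # names a 3-node slaves file would list
#   sudo docker ps --format '{{.Names}}'       # names of the containers actually running
```

Comparing the two listings is a quick way to check that the parameter given to start-container.sh matched the one given to resize-cluster.sh.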

How to connect to a remote Hadoop cluster and submit a job from IntelliJ?

Please see this repo: https://github.com/WalterMa/hadoop-intellj-example