My take on running Hadoop's HDFS, ZooKeeper, Kafka, and Accumulo on Docker (see below for alternatives). The setup lets you spin up a multi-node cluster locally on the fly, and porting it to a real cluster is fairly straightforward (we did it :)). This may not be the most elegant approach and it is not 100% refined, but it serves as a working example. Also note that this configuration uses defaults wherever possible, so hardening the setup for production is still required. Overall, it should be a good starting point for getting the above-mentioned stack up and running via Docker.
```sh
# set up Docker swarm (if not already done)
docker swarm init --advertise-addr=<your ip>

# prepare
git clone https://github.com/mgbckr/docker-cloud-stack.git
cd docker-cloud-stack
bash build-docker.sh
bash prepare_local.sh reset

# start stack
docker stack deploy -c docker-compose.yml cloud-stack

# connect to master
docker ps
docker exec -it <node-master-1 NAME> bash

# on node-master-1 (on the first run, you will need to type in
# a new password for Accumulo's root user)
bash manage.sh start_all
```
You can now attach containers to the `cloudnet` network and connect to the different services.
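For example, a throwaway container can be attached like this (note that `docker stack deploy` may prefix the network name with the stack name, e.g. `cloud-stack_cloudnet`; `ubuntu:18.04` is just an arbitrary image for illustration):

```sh
# run a disposable shell on the stack's overlay network; the services are
# then reachable by host name (e.g., node-master-1)
docker run --rm -it --network cloud-stack_cloudnet ubuntu:18.04 bash
```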
```sh
# connect to master
docker ps
docker exec -it <node-master-1 NAME> bash

# on node-master-1
bash manage.sh stop_all
exit

# on docker host
docker stack rm cloud-stack
```
Note that you can restart the cluster and, as long as nothing went wrong, all your data will be kept.
The only thing you really have to do to run this on an actual Docker swarm is set the correct placement constraints for each node and adjust the volume mappings.
Adding a worker node `node-worker-4` encompasses:

- append `node-worker-4` to the file `all_nodes`
- Hadoop:
    - append `node-worker-4` to `node-master-1/app/hadoop/etc/hadoop/workers`
- Zookeeper:
    - append `server.4=node-worker-4:2889:3889` to `node-worker/app/zookeeper/conf/zoo.cfg`
    - add `node-worker-4/data/zookeeper/myid` with content `4`
- Kafka:
    - copy `node-worker-1/app/kafka/config/server.properties` to `node-worker-4/app/kafka/config/server.properties` and change `broker.id=1` to `broker.id=4`
    - in ALL `node-worker-X/app/kafka/config/server.properties` add `node-worker-4:2181` to the comma-separated list `zookeeper.connect=`
- Accumulo:
    - in `node/app/accumulo/conf/accumulo-site.xml` add `node-worker-4:2181` to `instance.zookeeper.host`
    - in `node-master/app/accumulo/conf/slaves` append `node-worker-4`
Configuring the cluster happens through the master node by distributing the files in the config folder across all nodes based on their host names. This approach makes it easy to integrate new services with custom configs, and also allows overwriting binary files or adding custom libraries without redeploying the Docker stack.
The configuration approach is based on a hierarchical overlay system using host names (uhhh, sounds so fancy :)). In other words, the configs are assembled from a set of folders whose names are prefixes of the host names. That is, all folders whose name is a prefix of the host name (with components separated by a dash `-`) are merged together, with files from longer folder names overwriting those from shorter ones. Additionally, the files from the `_defaults` folder are used for initialization.

Let us consider `node-worker-1`, for which we build the final config in a temporary folder called `_tmp`.
Initialization:

- copy files from `_defaults/node` to `_tmp`
- copy files from `_defaults/node-worker` to `_tmp`, overwriting existing files
- copy files from `_defaults/node-worker-1` to `_tmp`, overwriting existing files

Config:

- copy files from `node` to `_tmp`, overwriting existing files
- copy files from `node-worker` to `_tmp`, overwriting existing files
- copy files from `node-worker-1` to `_tmp`, overwriting existing files
Thus, files from `_defaults` are used as an initialization, with files from longer folder names overwriting files from shorter ones. Afterwards, the actual config files (those not in `_defaults`) overwrite these files using the same procedure.
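A minimal stand-alone sketch of this merge for `node-worker-1` (not the repo's actual script; the mock layout only exists to make the sketch self-contained):

```sh
set -e
host=node-worker-1
root=$(mktemp -d)  # stands in for the config folder
tmp="$root/_tmp"
mkdir -p "$tmp"

# mock layout: one file provided by _defaults and overridden twice
mkdir -p "$root/_defaults/node" "$root/node" "$root/node-worker-1"
echo "from _defaults/node" > "$root/_defaults/node/a.conf"
echo "from node"           > "$root/node/a.conf"
echo "from node-worker-1"  > "$root/node-worker-1/a.conf"

# build the prefix chain: node, node-worker, node-worker-1
prefixes=""
p=""
for part in $(echo "$host" | tr '-' ' '); do
    p="${p:+$p-}$part"
    prefixes="$prefixes $p"
done

# initialization from _defaults first, then the actual configs;
# longer (more specific) folder names overwrite shorter ones
for base in "$root/_defaults" "$root"; do
    for p in $prefixes; do
        if [ -d "$base/$p" ]; then cp -r "$base/$p/." "$tmp/"; fi
    done
done
```

After the merge, `_tmp/a.conf` contains the most specific version, i.e. the one from `node-worker-1`.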
I built this on `Docker version 18.06.1-ce, build e68fc7a`.
The current versions of the applications are:

```sh
hadoop_version=3.1.1
zookeeper_version=3.4.13
kafka_version=2.0.0
accumulo_version=1.9.2
```
You can change the application versions in `assets/build/Dockerfile.build.sh`. Note, however, that this may require changing the configs for the specific versions and may also require updating the download script `assets/build/Dockerfile.build.sh`.
Some TODOs which I would really like someone else to do :D

- adjust swappiness to 0 for Accumulo (it keeps complaining on startup)
- clean up, document, and unify the scripts
- expose some ports to the world (e.g., 50070 to check the status of HDFS)
- optimize the CLI of `manage.sh`
- replace timeouts with actual checks in the start-up sequence (`manage.sh`)
- optimize `run.sh` for a graceful shutdown of the cluster when `docker stack rm` is called (even though this might not be possible because currently the first thing Docker does when that command is issued is detach the network ... grr)
- also see further notes in `docker-compose.yml`
- maybe allow running each service in its own container, e.g., for better resource management?
- clean and optimize configs?
- add the possibility of templates in order to ease the burden of adding worker nodes, e.g., using `envsubst`. Right now, for example, the `myid` files for Zookeeper nodes or, even worse, the `server.properties` files for Kafka brokers are not nice to manage. We did not do this yet for stability reasons, since the corresponding scripts would be more complicated and thus harder to test.
- add the concept of roles. Currently we have a strict hierarchy based on host names. Maybe, just maybe, it makes more sense to introduce roles in order to encapsulate configs by role instead of by host name, for more flexibility.
- if we continue this we will get something like Ambari ;)
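Regarding the template idea, a sketch of what rendering a hypothetical `server.properties` template could look like (the template file and variable names are made up; `envsubst` would behave the same way, plain `sed` is used here only to avoid the gettext dependency):

```sh
# hypothetical template with per-node placeholders
tpl=$(mktemp)
cat > "$tpl" <<'EOF'
broker.id=${BROKER_ID}
zookeeper.connect=${ZOOKEEPER_CONNECT}
EOF

# per-node values, e.g. for a fourth broker
BROKER_ID=4
ZOOKEEPER_CONNECT=node-worker-1:2181,node-worker-4:2181

# with gettext installed this would simply be: envsubst < "$tpl"
rendered=$(sed -e "s/[$]{BROKER_ID}/$BROKER_ID/" \
               -e "s/[$]{ZOOKEEPER_CONNECT}/$ZOOKEEPER_CONNECT/" "$tpl")
echo "$rendered"
```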
Here are some random (probably too short) notes for later reference. They may explain some of the choices I made for this project. Note that I am new to Docker, so please feel free to discuss and provide comments and suggestions via the issue tracker!
I opted to configure and start the cluster from the master node manually rather than using any Docker based configuration or automatic start-up. Some justifications:
- Using Docker's config functionality would have blown up the `docker-compose.yml` file like crazy: when changing a config file, either all config files would change, causing a reboot of all nodes, which is not desirable in a production scenario, or I would have had to add a massive amount of config entries with different environment variables for each file and each node.
- Using volumes would also blow up the `docker-compose.yml`, and I would have needed an NFS or something of the like to mount into each container.
- Similar things are true for coming up with an (arguably pretty nice) system similar to big-data-europe/docker-hadoop which allows configuring everything via environment variables. I like it, but considering all the different services with their own configuration variants, the current approach seemed to be the simplest while being perfectly extensible and versatile (see, e.g., the capability to also overwrite binary files: `assets/manage/config/node-worker/services/kafka/bin/kafka-server-stop.sh`).
- Finally, I really wanted the cluster to start and stop automatically with `docker stack deploy` and `docker stack rm`. However, shutting down automatically in a safe way is not possible since the first thing Docker seems to do when `docker stack rm` is issued is detach the network. Thus an organized shutdown across several nodes is not possible. Consequently, in order not to imply that the cluster will automatically shut down, I opted for a manual startup as well.
At first I wanted to use Docker's replication to manage worker nodes dynamically. It turns out this does not work because the host names would not be fixed, which the services I run require.
This drove me crazy! It seems that when using virtual IPs (VIPs), the contacted host always thinks it is being contacted by the gateway responsible for handling those VIPs (mostly the IP ending in .2) instead of by the actual host. That's why my nodes use `dnsrr`. I figure this HAS to be a bug.
Among others, I got inspired by these projects: