Dockerized Spark cluster based on the jupyter/all-spark-notebook image
This set of scripts is a bare-bones example of how to spin up a working Spark standalone cluster in Docker.
Assuming you have at least three Ubuntu machines on the same network, setting up a production cluster is as easy as:
- run `get-docker.sh` on each machine
- run `start-master.sh` on your master node
- run `start-worker.sh spark://[master-ip]:7077` on each worker node
- run `start-client.sh` on the client node. Note: a distributed cluster will not talk to a virtualized client
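The steps above can be sketched as a shell fragment. The IP address is a placeholder, not a real host; substitute your master node's address, and run each commented command on the machine named in its comment:

```shell
# Deployment sketch; MASTER_IP is an example placeholder,
# substitute the real IP of your master node.
MASTER_IP=10.0.0.1

# on every machine, install docker first:
#   bash get-docker.sh

# on the master node:
#   bash start-master.sh

# on each worker node, pass the master URL:
#   bash start-worker.sh "spark://${MASTER_IP}:7077"

# on the client node:
#   bash start-client.sh
```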
To try everything locally, you can stand up a virtual cluster with docker-machine:

```shell
# create a VM for the master and point the docker CLI at it
docker-machine create -d virtualbox master
eval $(docker-machine env master)
master_ip=$(docker-machine ip master)
bash start-master.sh $master_ip

# create a worker VM and register it with the master
docker-machine create -d virtualbox worker
eval $(docker-machine env worker)
bash start-worker.sh "spark://"$master_ip":7077" $(docker-machine ip worker)

# create the client VM (see the note above about virtual clients)
docker-machine create -d virtualbox client
eval $(docker-machine env client)
bash start-client.sh $(docker-machine ip client)
```
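The note above about virtual clients usually shows up as the client being unable to reach the master on port 7077. A quick connectivity probe you can run from a worker or client machine (a bash-only sketch using the `/dev/tcp` redirection, not part of these scripts; the IP is a placeholder):

```shell
master_ip=10.0.0.1   # example placeholder; use your real master IP
if timeout 2 bash -c "echo > /dev/tcp/${master_ip}/7077" 2>/dev/null; then
    echo "master reachable on 7077"
else
    echo "master NOT reachable on 7077"
fi
```

The standalone master also serves a web UI, by default on port 8080, which you can probe the same way.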