# Spark (Standalone)

## Start a standalone cluster

### Master

```
docker run --name spark-master -d --net=host --restart=always wongnai/spark-standalone master
```

Check that the master is running by opening the web UI (port 8080) on the host machine. You will need the Spark master URL to start the workers.
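
If you prefer the command line, here is a minimal sketch of the same check, assuming you run it on the Docker host and have `curl` available; the "Starting Spark master at" line is what the Spark master prints to its logs on startup:

```
# Probe the master web UI (the container uses --net=host, so it listens on the host)
curl -sf http://localhost:8080 > /dev/null && echo "master web UI is up"

# The master URL (spark://host:7077) is also printed in the container logs
docker logs spark-master 2>&1 | grep "Starting Spark master at"
```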

### Worker 1

Get the master URL (e.g. spark://192.168.1.50:7077), then start a worker on the default ports (7078 for the worker, 8081 for its web UI):

```
docker run --name spark-worker1 -d --net=host --restart=always wongnai/spark-standalone worker spark://${MASTER_HOST_OR_IP}:7077
```
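
To verify that the worker joined the cluster, open the master web UI again, or grep the worker logs for the registration message Spark workers print once they connect:

```
docker logs spark-worker1 2>&1 | grep "Successfully registered with master"
```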

### Worker 2

Start another worker on different ports by setting environment variables:

```
docker run --name spark-worker2 -d --net=host --restart=always -e SPARK_WORKER_PORT=7079 -e SPARK_WORKER_WEBUI_PORT=8082 wongnai/spark-standalone worker spark://${MASTER_HOST_OR_IP}:7077
```

### Running the SparkPi example

```
docker exec -it spark-master /opt/spark/bin/run-example SparkPi 10
```

You should see a line like "Pi is roughly 3.142448" amid a lot of log output.
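
To pull that line out of the noise, one option (a sketch, assuming the example writes the result to stdout as usual) is to run the pipeline inside the container:

```
docker exec -it spark-master sh -c \
  '/opt/spark/bin/run-example SparkPi 10 2>&1 | grep "Pi is roughly"'
```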

### Submit the SparkPi example from any node

```
docker exec -it spark-worker1 /opt/spark/bin/spark-submit --master spark://${MASTER_HOST_OR_IP}:7077 /opt/spark/examples/src/main/python/pi.py 10
```
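
The Scala version of SparkPi can be submitted the same way. A sketch follows; the examples jar path is an assumption (the exact file name depends on the Spark build inside the image), which is why the glob is left for the container's shell to expand via `sh -c`:

```
# The spark-examples_*.jar glob is a guess at the jar's location in this image
docker exec -it spark-worker1 sh -c \
  "/opt/spark/bin/spark-submit --master spark://${MASTER_HOST_OR_IP}:7077 \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_*.jar 10"
```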

## Environment Variables

Spark reads environment variables in its start scripts, so you can set them to change IPs and ports. See http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts for the available variables.

This Docker image sets default values for the following variables:

```
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
```