docker-spark


Docker container for a Spark standalone cluster

This repository contains a set of scripts and configuration files to run an Apache Spark standalone cluster from Docker containers.

To run the master, execute:

./start-master.sh
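
Under the hood, a master-start script of this kind typically boils down to a single docker run. The sketch below is illustrative only: the image name epahomov/docker-spark and the /spark install path are assumptions, so check the script itself for the exact command.

# Hedged sketch of what start-master.sh roughly does (image name and
# Spark path are assumptions, not taken from the script itself).
docker run -d --name spark_master -h spark_master \
    epahomov/docker-spark \
    /spark/bin/spark-class org.apache.spark.deploy.master.Master --host spark_master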

To run a worker, execute:

./start-worker.sh
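
A worker container mainly needs the master's hostname to resolve. Assuming legacy container links and the same hypothetical image and path as above, the script is roughly equivalent to:

# Hedged sketch of start-worker.sh: --link makes the hostname
# "spark_master" resolvable inside the worker; 7077 is Spark's
# default master port.
docker run -d --link spark_master:spark_master \
    epahomov/docker-spark \
    /spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master:7077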

You can run multiple workers. Every worker will be able to find the master by its container name, "spark_master".
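
For example, to bring up three workers in one go:

# Each invocation starts another worker container that registers
# with the same master.
for i in 1 2 3; do
    ./start-worker.sh
done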

To run a Spark shell against this cluster, execute:

./spark-shell.sh
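
Assuming the same hypothetical image, the shell script is roughly this interactive docker run (again a sketch, not the literal script contents):

# Hedged sketch of spark-shell.sh: an interactive, throwaway container
# whose shell connects to the linked master.
docker run -it --rm --link spark_master:spark_master \
    epahomov/docker-spark \
    /spark/bin/spark-shell --master spark://spark_master:7077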

You can run multiple shells. Every shell will be able to find the master by its container name, "spark_master".

If you would like to run another container against this cluster, please read the explanation of how to prepare a driver container; a rough sketch follows.
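
In short, a driver container has to (a) resolve the master's hostname and (b) be reachable back from the executors. A minimal sketch, where my_driver_image and my_app.jar are hypothetical placeholders:

# Hedged sketch of a custom driver container. spark.driver.host tells
# executors which hostname to connect back to; your Docker networking
# must make that hostname resolvable from the worker containers.
docker run -it --rm --link spark_master:spark_master -h my_driver \
    my_driver_image \
    /spark/bin/spark-submit --master spark://spark_master:7077 \
        --conf spark.driver.host=my_driver my_app.jar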

If you need to increase memory or the core count, or to pass any other parameters to Spark, use:

./spark-shell.sh --executor-memory 300M --total-executor-cores 3
./start-worker.sh --memory 700M

If you run these images without the scripts mentioned above, please:

  • Remember to name the master container spark_master so that container linking works correctly (see the sketch after this list).
  • Read the documentation to understand what's going on.
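
For instance, a hand-rolled start (image name again an assumption) would look like:

# Naming the container spark_master lets workers and shells link to it
# under the hostname the scripts expect.
docker run -d --name spark_master -h spark_master \
    epahomov/docker-spark \
    /spark/bin/spark-class org.apache.spark.deploy.master.Master --host spark_master
# Workers then reach it through the link alias:
docker run -d --link spark_master:spark_master \
    epahomov/docker-spark \
    /spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark_master:7077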

I also recommend using Zeppelin instead of the Spark shell for working with data. It has a pleasant GUI and IPython-like functionality. Please use a Docker container for that as well.
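
One hedged way to run Zeppelin against this cluster is with the apache/zeppelin image (the image name, tag, and port choice are assumptions; adjust to your setup):

# Zeppelin's web UI listens on 8080 by default; the link makes
# spark://spark_master:7077 reachable from the notebook's interpreter.
docker run -d -p 8080:8080 --link spark_master:spark_master apache/zeppelin:0.10.1

Then point the Spark interpreter's "master" property at spark://spark_master:7077 in Zeppelin's interpreter settings.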