Image based on the [gettyimages Spark image](https://github.com/gettyimages/docker-spark/tree/2.1.0-hadoop-2.7), with Airflow installed on top. This can be useful for testing Airflow Spark workflows in a local environment.
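For example, an Airflow DAG driving Spark inside this image could be as small as the sketch below. This is a minimal, illustrative sketch assuming an Airflow 1.x-era install; the DAG id, schedule, and file layout are assumptions to adapt, not part of the image.

```python
# dags/spark_pi_smoke_test.py -- hypothetical DAG for smoke-testing Spark
# from Airflow. Shells out via BashOperator, so no extra Spark provider
# package is required.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="spark_pi_smoke_test",   # illustrative name
    start_date=datetime(2017, 1, 1),
    schedule_interval=None,         # trigger manually while testing
)

run_spark_pi = BashOperator(
    task_id="run_spark_pi",
    # SPARK_HOME is set by the base image; adjust if your build differs.
    bash_command="$SPARK_HOME/bin/run-example SparkPi 10",
    dag=dag,
)
```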
# spark
An `ubuntu:16.04` based Spark container. Use it in a standalone cluster with the accompanying `docker-compose.yml`, or as a base for more complex recipes.
## docker example
To run `SparkPi`, run the image with Docker:

```
docker run --rm -it -p 4040:4040 gettyimages/spark bin/run-example SparkPi 10
```
To start `spark-shell` with your AWS credentials:

```
docker run --rm -it -e "AWS_ACCESS_KEY_ID=YOURKEY" -e "AWS_SECRET_ACCESS_KEY=YOURSECRET" -p 4040:4040 gettyimages/spark bin/spark-shell
```
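The same environment variables work with `bin/pyspark`. As a hedged sketch, assuming the image's Hadoop build has the `s3a` connector on its classpath, reading from S3 inside that shell might look like this (the bucket and key are placeholders):

```python
# Inside bin/pyspark: `sc` is the SparkContext the shell creates for you.
# s3a://your-bucket/your-key is a placeholder path, not a real object.
lines = sc.textFile("s3a://your-bucket/your-key")
print(lines.count())  # forces the read, verifying the credentials work
```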
To run a simple PySpark script:

```
printf 'import pyspark\nprint(pyspark.SparkContext().parallelize(range(0, 10)).count())\n' > count.py
docker run --rm -it -p 4040:4040 -v $(pwd)/count.py:/count.py gettyimages/spark bin/spark-submit /count.py
```
## docker-compose example
To create a simple standalone cluster with `docker-compose`:

```
docker-compose up
```
The Spark UI will be running at `http://${YOUR_DOCKER_HOST}:8080` with one worker listed. To run `pyspark`, exec into a container:

```
docker exec -it dockerspark_master_1 /bin/bash
bin/pyspark
```
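Once the shell is up, a quick way to confirm that work actually reaches the worker is to run a small job and watch its stages appear in the Spark UI; the job below is an arbitrary sketch, not part of the image:

```python
# Run inside the pyspark shell started above; `sc` is already defined.
# Counts the even numbers in range(1000); expect 500.
print(sc.parallelize(range(1000)).filter(lambda n: n % 2 == 0).count())
```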
To run `SparkPi`, exec into a container:

```
docker exec -it dockerspark_master_1 /bin/bash
bin/run-example SparkPi 10
```
## license
MIT