A Docker image for Apache Spark

License: MIT

No longer maintained

docker-spark

This repository contains the Dockerfile for an Apache Spark image. The image is built by Docker's automated build and published to the public Docker Hub registry.

Installation

Pull the image from Docker Hub:

docker pull cjonesy/docker-spark:latest

Build

docker build --rm -t cjonesy/docker-spark:latest .

Usage

For a Spark shell inside the container:

docker run -it cjonesy/docker-spark:latest spark-shell

For a PySpark shell inside the container:

docker run -it cjonesy/docker-spark:latest pyspark

For a Bash shell inside the container:

docker run -it cjonesy/docker-spark:latest bash
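
Beyond the interactive shells, a script can probably be run non-interactively as well. The sketch below assumes that `spark-submit` is on the image's PATH alongside `spark-shell` and `pyspark`, which this README does not confirm. The file name `job.py`, the `/work` mount point, and the `run_job` function are hypothetical names chosen for illustration.

```shell
# Create a tiny PySpark job in the current directory (job.py is a hypothetical name).
cat > job.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="sum-example")
# Sum the integers 0..99 and print the result.
print(sc.parallelize(range(100)).sum())
sc.stop()
EOF

# Wrap the command in a function so nothing starts until it is called.
# -v mounts the current directory at /work (an arbitrary path) and -w makes it
# the working directory; spark-submit is assumed to exist in the image.
run_job() {
  docker run --rm -v "$PWD:/work" -w /work \
    cjonesy/docker-spark:latest spark-submit "$1"
}
```

Calling `run_job job.py` would then submit the script from the mounted directory. `--rm` removes the container once the job exits.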

Configuration

spark-defaults.conf

The following values in spark-defaults.conf can be overridden by setting environment variables on the container.

Property                        Environment Variable            Default Value
spark.driver.cores              SPARK_DRIVER_CORES              1
spark.driver.maxResultSize      SPARK_DRIVER_MAXRESULTSIZE      1g
spark.driver.memory             SPARK_DRIVER_MEMORY             5g
spark.executor.memory           SPARK_EXECUTOR_MEMORY           5g
spark.local.dir                 SPARK_LOCAL_DIR                 /tmp
spark.logConf                   SPARK_LOGCONF                   true
spark.master                    SPARK_MASTER                    local
spark.driver.supervise          SPARK_DRIVER_SUPERVISE          false
spark.driver.extraClassPath     SPARK_DRIVER_EXTRACLASSPATH     /jars/*
spark.executor.extraClassPath   SPARK_EXECUTOR_EXTRACLASSPATH   /jars/*
spark.python.worker.memory      SPARK_PYTHON_WORKER_MEMORY      512m
spark.ui.enabled                SPARK_UI_ENABLED                true
spark.eventLog.enabled          SPARK_EVENTLOG_ENABLED          false
spark.eventLog.dir              SPARK_EVENTLOG_DIR              file:///tmp/spark-events

Example:

docker run -e SPARK_UI_ENABLED=false -it cjonesy/docker-spark spark-shell
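
When several values need to change at once, repeating `-e` gets unwieldy. One option is Docker's `--env-file` flag, sketched below. The variable names come from the list above; the values, the file name `spark.env`, the local `jars` directory, and the `spark_shell` function are illustrative assumptions. The bind mount relies on `/jars/*` being the default extra class path; if the image ships its own files in `/jars`, the mount would hide them.

```shell
# Write the overrides to an env file, one KEY=VALUE pair per line.
# "spark.env" is a hypothetical file name.
cat > spark.env <<'EOF'
SPARK_DRIVER_MEMORY=2g
SPARK_EXECUTOR_MEMORY=2g
SPARK_UI_ENABLED=false
EOF

# Wrap the run command in a function so nothing starts until it is called.
# --env-file passes every line of spark.env to the container; -v mounts a
# local ./jars directory (assumed to exist) at /jars.
spark_shell() {
  docker run --rm -it \
    --env-file spark.env \
    -v "$PWD/jars:/jars" \
    cjonesy/docker-spark:latest spark-shell
}
```

Calling `spark_shell` starts the shell with all three overrides applied. Drop the `-v` line if no extra JARs are needed.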

How to contribute

Imposter syndrome disclaimer: I want your help. No really, I do.

There might be a little voice inside that tells you you're not ready; that you need to do one more tutorial, or learn another framework, or write a few more blog posts before you can help me with this project.

I assure you, that's not the case.

This project has clear contribution guidelines and expectations, which you can read in the CONTRIBUTING file.

The contribution guidelines outline the process that you'll need to follow to get a patch merged. By making expectations and process explicit, I hope it will make it easier for you to contribute.

And you don't just have to write code. You can help out by writing documentation, tests, or even by giving feedback about this work. (And yes, that includes giving feedback about the contribution guidelines.)

Thank you for contributing!