korniichuk/pyspark

The pyspark stack: ready-to-run Apache PySpark in Docker

License: The Unlicense

Apache PySpark

A Docker image based on ubuntu:xenial with Apache PySpark on board, built for the dataops utility.

Source code is available in the korniichuk/pyspark repo.

Start a container with an interactive Bash shell:

$ docker run -it korniichuk/pyspark bash
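
Inside the container you can run a quick sanity check; for example, the bundled spark-submit (at the path this image uses) reports the Spark version:

$ /usr/local/src/spark-2.0.1-bin-hadoop2.7/bin/spark-submit --version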

Start a container with an interactive PySpark shell:

$ docker run -it korniichuk/pyspark \
        /usr/local/src/spark-2.0.1-bin-hadoop2.7/bin/pyspark

Try the following command, which uses the SparkContext that the PySpark shell predefines as sc and should return 1000:

>>> sc.parallelize(range(1000)).count()
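
If you want to exercise the RDD API a little further, a minimal sketch using the standard filter transformation should return 500 (the even numbers in the range):

>>> sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()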

Start a container with an interactive Python shell; unlike the PySpark shell, no SparkContext is predefined here, so create one yourself:

$ docker run -it korniichuk/pyspark python
>>> from pyspark import SparkConf, SparkContext
>>> conf = SparkConf().setMaster("local[*]")  # run locally, one worker thread per core
>>> sc = SparkContext(conf=conf)

Then run the following command, which should also return 1000:

>>> sc.parallelize(range(1000)).count()
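
When you are finished, it is good practice to shut the context down; sc.stop() is the standard SparkContext teardown call:

>>> sc.stop()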