Apache PySpark
An ubuntu:xenial-based Docker image with Apache PySpark, built for the dataops utility.
The korniichuk/pyspark repo.
Start a container with an interactive Bash shell:
$ docker run -it korniichuk/pyspark bash
Start a container with an interactive PySpark shell:
$ docker run -it korniichuk/pyspark \
      /usr/local/src/spark-2.0.1-bin-hadoop2.7/bin/pyspark
Try the following command, which should return 1000:
>>> sc.parallelize(range(1000)).count()
Start a container with an interactive Python shell, then create a SparkContext manually:
$ docker run -it korniichuk/pyspark python
>>> from pyspark import SparkConf, SparkContext
>>> conf = SparkConf().setMaster("local[*]")
>>> sc = SparkContext(conf=conf)
Then run the following command, which should also return 1000:
>>> sc.parallelize(range(1000)).count()