cloudera-spark

Docker image for a Cloudera Quickstart cluster with upgraded (and functional) Spark/Java/Zeppelin

The Cloudera QuickStart image is a single-node deployment of Cloudera's 100% open-source distribution, including Apache Hadoop and Cloudera Manager. It is an ideal environment for learning about Spark on Hadoop, trying out new ideas, and testing and demoing your Spark applications.

This image is based entirely on Cloudera's QuickStart image. For more information, see Cloudera's Docker Hub page.

Unlike the original Cloudera QuickStart (CDH 5.7.0), this image comes with the following components upgraded and preconfigured:

Java 1.8.0-openjdk
Python 3.6
Spark 2.4.4
Zeppelin 0.8.1

First, pull the image:

docker pull betoca/cloudera-spark

Once it has downloaded, you can run it as follows:

docker run --hostname=quickstart.cloudera --privileged=true -ti -p 8080:8080 betoca/cloudera-spark

By default, /usr/bin/setup-start.sh is provided as a convenience: it starts some basic CDH services and then runs a Bash shell. This is particularly useful if you want to use HDFS/Hive from your Spark application. Alternatively, you can run /bin/bash directly if you prefer to start services manually.
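As a minimal sketch of the second option, the shell can be passed as the container command after the image name. This assumes the startup script is set as the image's default command, so a trailing argument overrides it. The helper name `run_with_bash` is hypothetical and only prints the command (a dry run) so that it can be reviewed first.

```shell
# Hypothetical helper: prints a docker run command that starts /bin/bash
# instead of the default startup script, so no CDH services are started.
# $1 is the image name. Dry run: remove "echo" to execute the command.
run_with_bash() {
  echo docker run --hostname=quickstart.cloudera --privileged=true -ti -p 8080:8080 "$1" /bin/bash
}

run_with_bash betoca/cloudera-spark
```

Once inside the container, /usr/bin/setup-start.sh can still be run by hand if the basic services turn out to be needed.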

Note that, depending on which services you start, you will likely need to expose additional ports for any Hadoop services accessed from outside the container. See: Cloudera ports
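Publishing several ports means repeating the -p flag once per port. The sketch below builds those flags from a list. The helper name `port_flags` is hypothetical, and the extra port numbers are assumed CDH 5 defaults (8888 for Hue, 7180 for Cloudera Manager, 8088 for the YARN ResourceManager UI) rather than values taken from this image, so check them against the services you actually start.

```shell
# Hypothetical helper: build one "-p PORT:PORT" flag per argument,
# mapping each container port to the same port number on the host.
port_flags() {
  flags=""
  for port in "$@"; do
    flags="$flags -p $port:$port"
  done
  # Strip the leading space before printing.
  echo "${flags# }"
}

# Assumed CDH 5 default ports (not confirmed for this image):
# 8888 Hue, 7180 Cloudera Manager, 8088 YARN ResourceManager UI.
# Dry run: this prints the command; remove "echo" to execute it.
echo docker run --hostname=quickstart.cloudera --privileged=true -ti $(port_flags 8080 8888 7180 8088) betoca/cloudera-spark
```

Mapping each port to the same number on the host keeps service URLs identical inside and outside the container. Change the host side of a mapping if the port is already in use locally.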

betoca/docker.cloudera-spark
