The project was featured in an article on the Sharek.dev tech blog.
This project gives you an Apache Spark cluster in standalone mode with a JupyterLab interface, built on top of Docker.
Learn Apache Spark through its Scala and Python (PySpark) APIs by running the provided Jupyter notebooks, which include examples of how to read, process and write data.
Build the images with `build.sh` and start the cluster with `docker-compose up`; both setup routes are detailed below.
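To give a feel for what the notebooks cover, here is a minimal PySpark sketch of the read, process and write flow; the file paths and column names are hypothetical and not part of the project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session from a JupyterLab notebook.
spark = SparkSession.builder.appName("quickstart-example").getOrCreate()

# Read: load a CSV file with a header row (hypothetical path).
df = spark.read.csv("data/input.csv", header=True, inferSchema=True)

# Process: keep non-null rows and average a numeric column per category
# (hypothetical column names).
summary = (
    df.filter(F.col("value").isNotNull())
      .groupBy("category")
      .agg(F.avg("value").alias("avg_value"))
)

# Write: persist the result as Parquet.
summary.write.mode("overwrite").parquet("data/output/summary")

spark.stop()
```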
| Application | URL | Description |
| --- | --- | --- |
| JupyterLab | localhost:8888 | Cluster interface with built-in Jupyter notebooks |
| Spark Driver | localhost:4040 | Spark Driver web UI |
| Spark Master | localhost:8080 | Spark Master node |
| Spark Worker I | localhost:8081 | Spark Worker node with 1 core and 512m of memory (default) |
| Spark Worker II | localhost:8082 | Spark Worker node with 1 core and 512m of memory (default) |
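If you want a notebook to attach explicitly to the standalone master rather than rely on the image defaults, a session can be built as in the sketch below; the `spark-master` hostname and port 7077 are assumptions based on Spark's standalone defaults and common compose setups, so check your docker-compose file.

```python
from pyspark.sql import SparkSession

# Attach to the standalone master; "spark-master:7077" is an assumed
# service hostname plus Spark's default cluster port, not a guaranteed value.
spark = (
    SparkSession.builder
    .appName("cluster-check")
    .master("spark://spark-master:7077")
    .getOrCreate()
)

# Tiny sanity-check job that runs across the workers.
print(spark.range(1000).selectExpr("sum(id) AS total").first())

spark.stop()
```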
Download from Docker Hub (easier)
1. Download the docker compose file:
   `curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml`
2. Edit the docker compose file with your favorite tech stack version (check the supported versions below);
3. Start the cluster:
   `docker-compose up`
4. Run Apache Spark code using the provided Jupyter notebooks with Scala and PySpark examples;
5. Stop the cluster by typing `ctrl+c` on the terminal;
6. Run step 3 to restart the cluster.
Build from your local machine
Note: local build is currently only supported on Linux distributions.
1. Download the source code or clone the repository;
2. Edit the build.yml file with your favorite tech stack version;
3. Match those versions in the docker compose file;
4. Build the images:
   `chmod +x build.sh ; ./build.sh`
5. Start the cluster:
   `docker-compose up`
6. Run Apache Spark code using the provided Jupyter notebooks with Scala, PySpark and SparkR examples;
7. Stop the cluster by typing `ctrl+c` on the terminal.
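After a rebuild, a quick way to confirm the running cluster picked up the versions you set in build.yml is to read them from a live session, for example in one of the provided notebooks; this is just an illustrative check:

```python
from pyspark.sql import SparkSession

# Read the versions off the running session to confirm they match build.yml.
spark = SparkSession.builder.appName("version-check").getOrCreate()
print("Spark version:", spark.version)
print("Driver Python version:", spark.sparkContext.pythonVer)
spark.stop()
```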
| Component | Version |
| --- | --- |
| Docker Engine | 1.13.0+ |
| Docker Compose | 1.10.0+ |
| Component | Version | Docker Tag |
| --- | --- | --- |
| Apache Spark | 3.3.0 | `<spark-version>` |
| JupyterLab | 3.3.0 | `<jupyterlab-version>-spark-<spark-version>` |