Distributed Spark cluster with JupyterLab on Docker. The stack consists of:
- Spark master
- Spark worker (×2)
- JupyterLab
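The stack above could be described in a `docker-compose.yml` along these lines (a minimal sketch only; the actual service names, image names, and port mappings in this repo may differ):

```yaml
version: "3"
services:
  spark-master:
    image: spark-master          # hypothetical image built by the Makefile
    ports:
      - "8080:8080"              # Spark master web UI
  spark-worker-1:
    image: spark-worker
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8081:8081"              # worker 1 web UI
  spark-worker-2:
    image: spark-worker
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8082:8081"              # worker 2 web UI
  jupyterlab:
    image: jupyterlab
    ports:
      - "8888:8888"              # JupyterLab
      - "4040:4040"              # Spark application UI
```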
Component | Version |
---|---|
JupyterLab | 3.0.9 |
Spark | 2.4.7 |
Hadoop | 2.7 |
JRE | 8 |
You can change the deployed versions by editing the Makefile. Spark packages are downloaded from https://archive.apache.org/dist/spark/
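As an illustration, version selection in the Makefile might look like the following (the variable names here are hypothetical; check the actual Makefile for the real ones):

```makefile
# Hypothetical version variables -- the real Makefile may use different names
SPARK_VERSION      = 2.4.7
HADOOP_VERSION     = 2.7
JUPYTERLAB_VERSION = 3.0.9

# Spark packages are fetched from the Apache archive
SPARK_URL = https://archive.apache.org/dist/spark/spark-$(SPARK_VERSION)/spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION).tgz
```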
Requirements:
- Docker
- Docker Compose (compose file format version 3 or newer)
```shell
# Builds the necessary stack images
make

# Bootstraps the stack
make run

# Shuts down the stack and removes the containers
make stop

# Make sure the stack is shut down first, then
# removes all of the stack images
make cleanup -i
```
You can follow the console output of all components with `docker-compose logs -f -t`.
Component | Web UI URL |
---|---|
JupyterLab | http://localhost:8888 |
JupyterLab Spark app* | http://localhost:4040 |
Spark master | http://localhost:8080 |
Spark worker 1 | http://localhost:8081 |
Spark worker 2 | http://localhost:8082 |
\*Only available after a Spark context has been created
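Inside a JupyterLab notebook, a Spark context for this stack can be created roughly as follows (a sketch assuming the master is reachable under the hostname `spark-master` on the default port 7077; the actual hostname depends on the compose service name). Once `getOrCreate()` returns, the application UI on port 4040 becomes available:

```python
from pyspark.sql import SparkSession

# Connect to the standalone cluster; "spark-master" is an assumed hostname
spark = (
    SparkSession.builder
    .appName("sample")
    .master("spark://spark-master:7077")
    .getOrCreate()
)

# Quick smoke test: distribute a small range across the workers and sum it
print(spark.sparkContext.parallelize(range(10)).sum())  # 45

spark.stop()
```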
For a usage example, see `sample.ipynb`.
For planned improvements, see `TODO.md`.
For known issues, see `known_issues.md`.
This is a very early version of the contents of this repo.