/spark-jupyterlab-docker

Distributed spark cluster with jupyter labs on docker

Primary LanguageJupyter NotebookMIT LicenseMIT

spark-jupyterlab-docker

Distributed spark cluster with jupyter labs on docker.

Components

  • Spark master
  • Spark worker 2x
  • Jupyter Lab

Versions

Component Version
JupyterLab 3.0.9
Spark 2.4.7
Hadoop 2.7
JRE 8

One can change the versions deployed by editing Makefile Spark packages are downloaded from https://archive.apache.org/dist/spark/

Requirements

  • Docker
  • Docker compose (version 3 or newer)

Build

# Builds the necessary stack images
make

Run

# Bootstraps the stack
make run

Stop and cleanup

# Shuts down the stack and removes the containers
make stop

# Make sure the stack is shut down
# Removes all of the stack images
make cleanup -i

Container logs

You can see all component console outputs with docker-compose logs -f -t

Component web user interfaces

Component Web UI URL
JupyterLab http://localhost:8888
JupyterLab spark app* http://localhost:4040
Spark master http://localhost:8080
Spark worker 1 http://localhost:8081
Spark worker 2 http://localhost:8082

*Only applies after spark context has been created

Sample notebook

See sample.ipynb

Todo's

See TODO.md

Known issues

See known_issues.md

Dislaimer

This is a very early version of the contents in the repo

License

MIT