Minimalistic data engineer's stack for local Spark development. Inspired by this project.
Goals:
- Minimalistic. One small image containing different workloads.
- Easy to use. Just run `docker-compose up`.
- Supports external IDEs (JetBrains, VSCode, etc.)
- Easy to upgrade.
Includes:
- Across all images:
  - git.
- Apache Spark
  - master
  - worker #1
  - history server
  - thrift server. Connect using `jdbc:hive2://localhost:10000` (instructions).
  - Persistent metastore (Hive).
- Jupyter Lab with extensions:
  - jupyterlab-git
  - jupyter-resource-usage
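
Before pointing an IDE or SQL client at the Thrift server, it can help to confirm that the port is actually open. The sketch below is only an illustration, not part of this project: `parse_hive_jdbc_url` and `is_listening` are hypothetical helper names, and the only value taken from this README is the URL `jdbc:hive2://localhost:10000`. It uses the Python standard library and assumes the stack was started with `docker-compose up` and that the port is published on the host.

```python
import socket
from urllib.parse import urlparse


def parse_hive_jdbc_url(jdbc_url):
    """Split a jdbc:hive2://host:port[/db] URL into (host, port, database)."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "hive2://"):
        raise ValueError(f"not a hive2 JDBC URL: {jdbc_url!r}")
    # Drop the "jdbc:" prefix so urlparse sees an ordinary hive2:// URL.
    parsed = urlparse(jdbc_url[len(prefix):])
    # Fall back to Hive's "default" database when the URL names none.
    database = parsed.path.lstrip("/") or "default"
    return parsed.hostname, parsed.port or 10000, database


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # URL as given in this README; adjust it if the port mapping differs.
    host, port, database = parse_hive_jdbc_url("jdbc:hive2://localhost:10000")
    state = "reachable" if is_listening(host, port) else "not reachable"
    print(f"Thrift server at {host}:{port} (database {database!r}) is {state}")
```

A successful TCP connection only shows that something is listening on the port. It does not verify that the server accepts JDBC sessions, so a real query from a client is still the definitive check.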
Todo:
- Hosted VS Code