aloneguid/data-eng-stack

Data Engineering Local Stack

Dockerfile, Apache-2.0 license

Data Engineering Stack

A minimalistic data engineering stack for local Spark development. Inspired by this project.

Goals:

Minimalistic. One small image containing different workloads.
Easy to use. Just run docker-compose up.
Supports external IDEs (JetBrains, VS Code, etc.)
Easy to upgrade.

Includes:

Across all images:
- git.
Apache Spark
- master
- worker #1
- history server
- Thrift server. Connect using jdbc:hive2://localhost:10000 (instructions).
- Persistent metastore (Hive).
Jupyter Lab with extensions:
- jupyterlab-git
- jupyter-resource-usage
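
As a quick sanity check for the Thrift server entry above, the following minimal sketch builds the JDBC URL and tests whether the port accepts TCP connections. The function names `thrift_jdbc_url` and `is_port_open` are hypothetical helpers introduced for illustration and are not part of this repository; the host and port come from the connection string listed above. It checks only that something is listening on the port, not that the Thrift server itself is healthy.

```python
import socket

# Defaults taken from the connection string in the list above.
THRIFT_HOST = "localhost"
THRIFT_PORT = 10000


def thrift_jdbc_url(host: str = THRIFT_HOST, port: int = THRIFT_PORT) -> str:
    """Build the jdbc:hive2 URL used to reach the Spark Thrift server."""
    return f"jdbc:hive2://{host}:{port}"


def is_port_open(host: str = THRIFT_HOST, port: int = THRIFT_PORT,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    url = thrift_jdbc_url()
    state = "reachable" if is_port_open() else "not reachable (is the stack running?)"
    print(f"{url} is {state}")
```

Run it after docker-compose up has finished starting the containers. If the port is reported as not reachable, the Thrift server may still be starting.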

Todo:

Hosted VS Code