This repository provides a minimal Docker setup for running Jupyter Notebook with PySpark 3.5.0 on Python 3.11, using an Alpine-based image.
- 🐍 Python 3.11 (Alpine-based lightweight image)
- 🔥 PySpark 3.5.0 (Pre-installed and configured)
- 📓 Jupyter Notebook (Accessible on port 8000)
- 📦 Minimal Dependencies (Lightweight and fast)
git clone https://github.com/yourusername/pyspark-jupyter-docker.git
cd pyspark-jupyter-docker
docker-compose up --build -d
Open your browser and go to:
http://localhost:8000
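Because the container is started in detached mode, Jupyter may need a few seconds before it accepts connections. As a rough sketch, the following helper polls the port from the host until it opens. The function name `wait_for_port` and its default timeout are illustrative assumptions and are not part of this repository; only the port number 8000 comes from the setup described here.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening on the
    port, not that the listener is actually Jupyter.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` on the host after `docker-compose up --build -d` returns `True` as soon as the port is reachable; a `False` result suggests checking the container's status or logs.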
To stop the container, run:
docker-compose down
.
├── Dockerfile # Defines the container environment
├── docker-compose.yml # Manages the service setup
├── workspace/ # Mounted directory for Jupyter notebooks
└── README.md # Project documentation
| Variable | Description |
|---|---|
| `JAVA_HOME` | Java Home for PySpark |
| `PYSPARK_PYTHON` | Python executable for PySpark |
| `PYSPARK_DRIVER_PYTHON` | Jupyter as PySpark driver |
| `PYSPARK_DRIVER_PYTHON_OPTS` | Runs Jupyter Notebook |
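
To confirm that these variables are visible inside the container, a notebook cell along the following lines may help. This is a minimal sketch using only the standard library; the helper names `report_pyspark_env` and `missing_pyspark_env` are hypothetical, and the actual values depend on how the image sets them.

```python
import os

# Variable names taken from the table above; their values are set by the image.
PYSPARK_ENV_VARS = (
    "JAVA_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
)


def report_pyspark_env(environ=None):
    """Map each expected variable to its value, or None if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in PYSPARK_ENV_VARS}


def missing_pyspark_env(environ=None):
    """List the expected variables that are unset or empty."""
    report = report_pyspark_env(environ)
    return [name for name, value in report.items() if not value]
```

Running `report_pyspark_env()` in a notebook shows the current settings, and an empty list from `missing_pyspark_env()` suggests that all four variables are in place.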
- To install additional Python packages, update the `Dockerfile`.
- To use a different port, modify `docker-compose.yml`.