/python3.11_pyspark_notebook

A lightweight Docker image build for running PySpark and Jupyter Notebook on port 8000.


PySpark Jupyter Notebook with Docker

This repository provides a minimalistic Docker setup for running a Jupyter Notebook with PySpark 3.5.0 on Python 3.11, using an Alpine-based image.

Features

  • 🐍 Python 3.11 (Alpine-based lightweight image)
  • 🔥 PySpark 3.5.0 (Pre-installed and configured)
  • 📓 Jupyter Notebook (Accessible on port 8000)
  • 📦 Minimal Dependencies (Lightweight and fast)

Getting Started

1. Clone the Repository

git clone https://github.com/yourusername/pyspark-jupyter-docker.git
cd pyspark-jupyter-docker

2. Build and Run the Container

docker-compose up --build -d

3. Access Jupyter Notebook

Open your browser and go to:

http://localhost:8000
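Because the container is started in detached mode, the notebook server may need a few seconds before the URL responds. The following is a minimal sketch, using only the Python standard library, of how one might poll the address from the host. The function names `notebook_is_up` and `wait_for_notebook` are hypothetical helpers, not part of this repository, and the sketch assumes that any HTTP response (including an error status, for example when Jupyter asks for a token) means the server is running.

```python
import time
import urllib.error
import urllib.request


def notebook_is_up(url="http://localhost:8000", timeout=2.0):
    """Return True if anything answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status
        # (assumption: this still means Jupyter is running).
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False


def wait_for_notebook(url="http://localhost:8000", attempts=10, delay=1.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if notebook_is_up(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_notebook()` after `docker-compose up --build -d` returns `True` once the server answers, or `False` if it never does within the given number of attempts.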

4. Stop and Remove the Container

To stop and remove the container, run:

docker-compose down

Folder Structure

.
├── Dockerfile           # Defines the container environment
├── docker-compose.yml   # Manages the service setup
├── workspace/           # Mounted directory for Jupyter notebooks
└── README.md            # Project documentation

Environment Variables

Variable                     Description
JAVA_HOME                    Java Home for PySpark
PYSPARK_PYTHON               Python executable for PySpark
PYSPARK_DRIVER_PYTHON        Jupyter as PySpark driver
PYSPARK_DRIVER_PYTHON_OPTS   Runs Jupyter Notebook
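To check which of these variables are actually set inside the running container, something like the sketch below could be pasted into a notebook cell. It only reads the environment and makes no assumption about the values the Dockerfile assigns. The helper names `spark_env_report` and `missing_spark_vars` are hypothetical and exist only for this illustration.

```python
import os

# The four variables listed in the table above.
SPARK_ENV_VARS = (
    "JAVA_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
)


def spark_env_report(env=None):
    """Map each variable name to its value, or None if it is not set."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in SPARK_ENV_VARS}


def missing_spark_vars(env=None):
    """List the variables that are unset or empty."""
    return [name for name, value in spark_env_report(env).items() if not value]


# Print the current state of the environment.
for name, value in spark_env_report().items():
    print(f"{name} = {value if value is not None else '<not set>'}")
```

An empty list from `missing_spark_vars()` suggests the image exported all four variables as described.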

Customization

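  • To install additional Python packages, update the Dockerfile and rebuild the image with docker-compose up --build -d.
  • To use a different port, modify the port setting in docker-compose.yml and restart the container.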