
Spark vs. MongoDB Atlas


PySpark + MongoDB + SingleStore

Use Docker Compose to start the setup

docker compose up

This will start a setup of


Open JupyterLab here or connect to the Jupyter server at and use the following token:


Use the Aggregation Pipelines notebook as a starting point.

About the Dockerfile

The Dockerfile (as used in docker-compose.yml) provides three Docker build targets: master, worker and jupyter. All three targets share the same base image, consisting of:

Using the same base image for JupyterLab and Spark was the only way to get this setup working. Specifically, combining dedicated master and worker images with a predefined PySpark image would consistently fail, either because JARs could not be found or because of serialization issues when running PySpark programs.
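
As a rough illustration of the shared-base idea, a multi-stage Dockerfile with the three targets named above might look like the sketch below. This is not the repository's actual Dockerfile: the base image tag, the installed packages, the Spark paths and the `master` hostname are all assumptions made for the example.

```dockerfile
# Illustrative sketch only: the base image tag, package versions, paths and
# the "master" hostname are assumptions, not taken from the repository.
FROM apache/spark:3.5.1-python3 AS base
USER root
# Install Python packages once, in the shared stage, so that the driver
# (Jupyter) and the executors (workers) use the same Python and PySpark.
RUN pip3 install --no-cache-dir pyspark==3.5.1

# Spark standalone master
FROM base AS master
CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.master.Master"]

# Spark standalone worker, registering with the (assumed) master service
FROM base AS worker
CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.worker.Worker", "spark://master:7077"]

# JupyterLab on top of the same Spark and Python installation
FROM base AS jupyter
RUN pip3 install --no-cache-dir jupyterlab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

With a layout like this, each service in docker-compose.yml can pick its stage through the `build.target` key, and any connector JARs added to the `base` stage are available to all three containers.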