MrPowers/mack

Docker image for contributors


A ready-to-use Docker image for project contributors, as an alternative to a local Python environment via Poetry. The image would probably need to include things like the following (a rough sketch of such a Dockerfile follows the list):

  • base image (e.g. ubuntu)
  • (py)spark
  • delta
  • python libs
  • environment vars
  • etc.
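
For illustration only, a minimal contributor Dockerfile might look like the sketch below. Base image, package versions, the Poetry-based test command, and the `/opt/mack` path are assumptions, not the project's actual setup:

```dockerfile
# Rough sketch of a contributor image (versions and paths are illustrative assumptions)
FROM ubuntu:22.04

# Java is required by Spark; Python and pip for the project itself
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jre-headless python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# PySpark plus the Delta Lake Python bindings; versions would need to match mack's requirements
RUN pip3 install --no-cache-dir pyspark==3.3.2 delta-spark==2.2.0 poetry

# Environment variables Spark/Delta commonly expect (JAVA_HOME path assumes amd64 Ubuntu)
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 \
    PYSPARK_PYTHON=python3

WORKDIR /opt/mack
COPY . .

# Install the project's dependencies via Poetry and run the test suite by default
RUN poetry install
CMD ["poetry", "run", "pytest"]
```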

I tried to find examples from other OSS repos but didn't find any in the short time I spent researching. So maybe this is not useful to most contributors, or there are other reasons not to have it; nothing comes to mind at the moment.

@Triamus - thanks for adding this.

Anyone in the community can feel free to grab this.

@MrPowers I have built this kind of docker images in the past. Mind if I take a stab at this?

@souvik-databricks you probably know this already, but here are a few things I had already researched.

I would think that ideally any image builds on top of those efforts, but I don't know the timeline. In the Jira ticket they mention Spark 3.4.

@souvik-databricks I'd be happy to test things out if needed.

@souvik-databricks - yea, sure, go for it!

I think there are some Delta Lake docker images around. Let me take a look.

Actually, it looks like @Triamus has already provided the link; here it is: GitHub/delta-io/delta-docs: quickstart_docker

@MrPowers I have a local branch ready to go with Docker and docker-compose support for mack if you want it. It runs the unit tests inside the container and also has instructions for dropping into the container for development. The container has Spark (spark-3.3.2), Delta (delta-core_2.12:2.2.0), and everything else needed to develop and test.
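
For reference, a compose file along those lines might look roughly like the sketch below. The service name, mounted path, and test command are illustrative assumptions, not the actual contents of that branch:

```yaml
# docker-compose.yml (illustrative sketch only)
version: "3.8"
services:
  mack:
    build: .                      # build the contributor image from the repo's Dockerfile
    volumes:
      - .:/opt/mack               # mount the source so edits on the host are visible in the container
    working_dir: /opt/mack
    command: poetry run pytest    # run the unit tests inside the container
```

Under these assumptions, `docker-compose run mack` would execute the test suite, and `docker-compose run mack bash` would drop you into the container for development.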

@danielbeach - yea, that sounds great. Any chance you could send a PR? I'll be happy to test, document in the README, and market. Thank you!

@MrPowers I tried to push a PR, but need access.

@danielbeach - sent you an invite to collab on the repo ;)

In the opening of the issue, I mentioned that I didn't find nice OSS examples of creating a reproducible local dev setup for a project's contributors. By coincidence, I saw a talk from PyData Global 2022, recently uploaded to YouTube, on exactly that topic by one of the core Airflow devs. It turns out that Airflow has invested a lot in what they call a Breeze environment to cover everything from local dev and test to deployment. It is certainly over-engineering for mack at this point, but it has some nice insights and potential ideas to draw inspiration from. I leave the talk and the Airflow Breeze docs here for future reference.

From the docs:

Airflow Breeze is an easy-to-use development and test environment using Docker Compose. The environment is available for local use and is also used in Airflow's CI tests. We call it Airflow Breeze as It's a Breeze to contribute to Airflow. The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow are described in CONTRIBUTING.rst.