pangeo-forge/roadmap

What org to use to host Bakery Docker images

Closed this issue · 26 comments

Mainly a question for @rabernat

To ensure that our Bakeries can all provide the same dependencies and so that we can ensure registering a flow involves the same dependencies as the ones used to run the flow, we're looking to host the Bakery Worker images somewhere.

Ideally DockerHub would be great - would we be able to use the pangeo org already there, or shall we make a pangeo-forge org?

From conversations with @sharkinsspatial, the images should only really be pulled by our CI workflows when registering flows. For actual Bakery usage, we'll be recommending that Bakery maintainers check out the repo we'll be storing the definitions in, then build and push the images they're hoping to support to their cloud provider's registry of choice (ECR, ACR, etc.).


I envision the flow to be something like:

  1. Recipe writer creates a PR in staged-recipes
    • The meta.yaml specifies the image that matches their pangeo-forge-recipes version and python version
      • This might look like pangeo-forge/bakery-worker:prefect_14-1-0_pangeo-forge-recipes_0-3-0_python_8-6 (very much ad-libbing that right now)
  2. PR is approved (or moved to the next stage)
  3. CI uses the image specified in meta.yaml to use as the base image (from DockerHub) for registering the flow with the specified bakery
    • Registering also sets the value of worker_image inside the flow. This might require a mapping to the Bakery's equivalent image, e.g.:
    my_bakeries_images = {
      "pangeo-forge/bakery-worker:prefect_14-1-0_pangeo-forge-recipes_0-3-0_python_8-6" : "aws_account_id.dkr.ecr.us-west-2.amazonaws.com/worker:prefect_14-1-0_pangeo-forge-recipes_0-3-0_python_8-6"
    }
  4. Feedstock stuff happens & Bakery is set up with the final flow

If we get a case where the Recipe maintainer wishes to bump the version of pangeo-forge-recipes or python, we probably want that to be a PR onto the Feedstock/staged-recipes so that we can re-register the flow and make sure the Bakery has the required version on its end.

Also, should probably point out that the prefect version shouldn't really matter to the recipe maintainer, so maybe we don't include that in the tag? Not sure. @sharkinsspatial what are your thoughts?

Cheers!

Changed my mind about including the python version - this is determined by the pangeo-notebook image, which we're going to be using as the base image.

The tag will therefore likely be something like pangeonotebook-<version>_prefect-<version>_pangeoforgerecipes-<version>
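
As a rough sketch of that naming scheme, the tag could be assembled from the three versions like this. The helper name `build_worker_tag` and the version strings are made up for illustration and are not part of any existing bakery tooling:

```python
def build_worker_tag(pangeo_notebook: str, prefect: str, pangeo_forge_recipes: str) -> str:
    """Build a tag shaped like
    pangeonotebook-<version>_prefect-<version>_pangeoforgerecipes-<version>.

    Hypothetical helper: the name and signature are assumptions.
    """
    parts = {
        "pangeonotebook": pangeo_notebook,
        "prefect": prefect,
        "pangeoforgerecipes": pangeo_forge_recipes,
    }
    for name, version in parts.items():
        # "_" separates the three components, so it must not appear in a version.
        if not version or "_" in version:
            raise ValueError(f"invalid version for {name}: {version!r}")
    return "_".join(f"{name}-{version}" for name, version in parts.items())


# Illustrative versions only; real values would come from the image build.
tag = build_worker_tag("2021.05.04", "0.14.1", "0.3.0")
print(tag)
```

Keeping the underscore as the only separator between components means the tag can be split back into its three parts unambiguously.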

Thanks @ciaranevans for putting such careful thought into this.

would we be able to use the pangeo org already there, or shall we make a pangeo-forge org?

Let's use the existing account.

  • The meta.yaml specifies the image that matches their pangeo-forge-recipes version

According to ADR 1, the meta.yaml already has to specify a pangeo_forge_recipes_version. Can we automatically translate this to the correct image name? I ask because I would prefer to avoid requiring recipe authors to know anything about Docker.

Also, should probably point out that the prefect version shouldn't really matter to the recipe maintainer, so maybe we don't include that in the tag?

I would prefer to leave this out of the name if possible if indeed it is not necessary.

According to ADR 1, the meta.yaml already has to specify a pangeo_forge_recipes_version. Can we automatically translate this to the correct image name? I ask because I would prefer to avoid requiring recipe authors to know anything about Docker.

Nice, I'd missed this to be fair, that might do for the recipe side.

I would prefer to leave this out of the name if possible if indeed it is not necessary.

For the recipe contributor, it isn't necessary, though it will be necessary to distinguish them for flow registration & bakery management, but like you noted above, really the deciding factor here is likely the pangeo-forge-recipes version. So it can be down to the bakery to handle which pangeo-forge-recipes and prefect combinations it has.

We might need something on the registration side that rejects a recipe for a bakery if the bakery doesn't have an image with the pangeo-forge-recipes version specified...


On the account front - are you okay with the name bakery-worker for the repository on DockerHub? Or would we prefer pangeo-forge-bakery-worker? Likewise, is there a way I can be added to that org to start work on sorting out the worker repo? Cheers @rabernat !

So it can be down to the bakery to handle which pangeo-forge-recipes and prefect combinations it has.

And by this I mean, when registering, the bakery logic in there will hold a map of the pangeo-forge-recipes versions it's supporting, then that will map to the whole notebook-recipes-prefect style image tag
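
A minimal sketch of what that lookup might look like, assuming the bakery keeps a plain dictionary keyed by pangeo-forge-recipes version. `SUPPORTED_IMAGES`, `resolve_worker_tag`, `UnsupportedRecipesVersion`, and all version strings are hypothetical names and values for illustration:

```python
class UnsupportedRecipesVersion(Exception):
    """Raised when a bakery has no image for the requested recipes version."""


# Hypothetical map held by one bakery: pangeo-forge-recipes version -> full image tag.
SUPPORTED_IMAGES = {
    "0.3.0": "pangeonotebook-2021.05.04_prefect-0.14.1_pangeoforgerecipes-0.3.0",
    "0.3.1": "pangeonotebook-2021.05.04_prefect-0.14.1_pangeoforgerecipes-0.3.1",
}


def resolve_worker_tag(recipes_version: str, supported: dict = SUPPORTED_IMAGES) -> str:
    """Return the image tag for a recipes version, or reject the recipe."""
    try:
        return supported[recipes_version]
    except KeyError:
        known = ", ".join(sorted(supported)) or "none"
        raise UnsupportedRecipesVersion(
            f"bakery has no image for pangeo-forge-recipes {recipes_version} "
            f"(supported: {known})"
        ) from None


print(resolve_worker_tag("0.3.0"))
```

With this shape, the recipe author only ever supplies the recipes version, and registration fails early if the target bakery cannot run it.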

@ciaranevans it looks like we are maxed out on our free dockerhub account so can't add any new users. We could either

  • Upgrade to a paid plan (not opposed to it but want to consider all the options)
  • Create a new org (e.g. pangeo_forge) under a free account

Do folks have thoughts on whether we need the "pro" dockerhub features for this project?

And by this I mean, when registering, the bakery logic in there will hold a map of the pangeo-forge-recipes versions it's supporting, then that will map to the whole notebook-recipes-prefect style image tag

I think this sounds fine in general, but it would be great to record this decision in verbose detail via an ADR.

@rabernat I'm fine with a new free account, but I'll let others chime in on that front.

For sure, I think I'll compare notes with @sharkinsspatial and we'll draft an ADR so that this is fleshed out

@ciaranevans This approach sounds good to me. I agree that prefect versioning should not be a concern of recipe developers. If they are using the model of developing recipes in pangeo using the pangeo-notebook image, then we should be able to support the combination of pangeo-notebook and pangeo-forge-recipes versions they have used during development. Given its version volatility, we can select the version of prefect we will use for flow registration.

While selecting the worker image used by the bakery is straightforward during registration, the more problematic issue is how to dynamically set the container version used by the GitHub Action for registration. I have tried several approaches for this, but there are limitations within GitHub Actions around dynamically setting FROM values in the Dockerfile, and the action's uses block is evaluated on workflow initiation and cannot access the workflow contexts. So I have opted to take a more restrictive versioning approach.

  1. We will maintain images with a pinned combination of pangeo-forge-recipes, pangeo-notebook and prefect in the bakery-docker-images repo. Tags from this repo will be pushed to Dockerhub. The recipe-prefect-action can be updated to use the most recent image tag when desired. The staged-recipes repo and the feedstock template repo can be periodically updated to use the desired version of recipe-prefect-action and its included lib versions.

  2. Bakery operators can periodically update the worker images in their Image Repositories to the desired version and can advertise the most recent provided version in the appropriate fields in bakeries.yaml. Older images will be maintained in these repositories so that older previously registered flows can always be re-run.

  3. A developer can develop a recipe using Ryan's approach of working in Binder using a version of pangeo-notebook to control dependencies and target a specific version of pangeo-forge-recipes.

  4. The meta.yaml for their recipe will contain the pangeo_forge_version and pangeo_notebook_version, and the Github PR workflow will use pangeo-forge-prefect to verify that these versions match those in the action's executing container and that those versions are also the ones currently advertised by the target bakery. If they do not, the workflow fails and developers can coordinate with bakery operators to ensure that worker images with the appropriate versions are available (we should consider automating the new image registration process for bakery operators as well in the future).
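
The check in step 4 could be sketched roughly as follows. This is not the pangeo-forge-prefect implementation; `find_version_mismatches` is a hypothetical function, and it assumes the three sets of versions have already been loaded into dictionaries keyed by `pangeo_forge_version` and `pangeo_notebook_version`:

```python
VERSION_KEYS = ("pangeo_forge_version", "pangeo_notebook_version")


def find_version_mismatches(meta: dict, container: dict, bakery: dict) -> list:
    """Compare the versions in meta.yaml against the action's container and
    the versions advertised by the target bakery. Returns a list of problems;
    an empty list means everything matches.
    """
    problems = []
    for key in VERSION_KEYS:
        wanted = meta.get(key)
        if wanted is None:
            problems.append(f"meta.yaml is missing {key}")
            continue
        if container.get(key) != wanted:
            problems.append(
                f"{key}: meta.yaml has {wanted}, action container has {container.get(key)}"
            )
        if bakery.get(key) != wanted:
            problems.append(
                f"{key}: meta.yaml has {wanted}, bakery advertises {bakery.get(key)}"
            )
    return problems


# Illustrative values only.
meta = {"pangeo_forge_version": "0.3.0", "pangeo_notebook_version": "2021.05.04"}
container = dict(meta)
bakery = {"pangeo_forge_version": "0.3.0", "pangeo_notebook_version": "2021.04.01"}

for problem in find_version_mismatches(meta, container, bakery):
    print(problem)
```

A workflow step would fail the PR whenever the returned list is non-empty, and the messages give the developer something concrete to take to the bakery operator.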

Using this approach, we ensure that recipe developers and bakery operators are always using up-to-date releases, while we tightly control version pinning to ensure functioning environments and prevent nightmare issues like this one.

I have all this implemented. I'll try to update the workflow PR in staged-recipes with these changes tomorrow.

@rabernat On the question of a Free vs Pro Dockerhub account, this revolves mainly around whether we would like to put the burden of container maintenance on bakery operators or not. Because of the limitations of Free Dockerhub accounts, @ciaranevans and I had originally considered having bakery operators maintain their own image repositories to avoid rate limiting issues when creating large Dask clusters dynamically that might generate many simultaneous container pulls. If we can use a Pro account we could in theory manage the image repository centrally, which would simplify things. I'd be curious to hear experiences from the core pangeo team if they have experienced rate limiting or latency with Dockerhub for larger clusters. If not, I would suggest getting a Pro account as this will simplify things a lot for bakery operators and make our flow registration process simpler. Once we have finalized a decision here I'll capture the outcome in a detailed ADR 👍

Hey @rabernat, have you had a chance to take a look at ☝️? We can always wait and discuss next Monday!

I'd be curious to hear experiences from the core pangeo team if they have experienced rate limiting or latency with Dockerhub for larger clusters.

The answer is no, because all our jupyterhubs and binders build derived containers and store them in the cloud-provider's private container registry in-region. So we only pull from docker hub once--when we upgrade a particular hub image--and then all the rest of the pulls (e.g. spinning up a big dask cluster) are from the cloud-provider's registry and are not rate limited. Coincidentally, we just started discussing bypassing this and going straight for dockerhub (jupyterhub/binderhub#1298), in which case we would confront the rate limitation issue.

Bottom line, $25 / month is a very small expense in the context of this project, and I'll happily sign us up for a paid account if it will help the project.

As a specific step forward, we would need to decide who would have an account on the paid Dockerhub org. Currently we have 7 members.

The basic pro subscription only allows 5, so we would need to remove at least 3 of these users (one extra in order to allow someone from Devseed to get an account.)

My impression is that very few of these users are still actively using their membership in this org. Could folks please confirm whether or not you still need admin access to the Pangeo dockerhub account?

@rabernat - would it be easier to carve out a pangeo-forge org?

I would rather not pay for two separate pro accounts. Assuming that the broader pangeo effort will eventually hit these limits as well, it makes sense to me to combine them.

Hi @rabernat - I don't need this admin access any more, so feel free to delete my membership!

Same here, feel free to remove me.

I don't need either.

I haven't used the pangeo account directly in months. Happy to give up my spot. It is possible (haven't checked) that some of our CI/CD systems are publishing via my credentials though so it may be worth removing me only if necessary.

Same for me too :) I also personally just use quay.io now for images instead of dockerhub.

I'd prefer to stay on as an admin for now, but it's not critical.

our CI/CD systems are publishing via my credentials though so it may be worth removing me only if necessary.

We should make sure all the pangeo CI workflows are using access tokens.

If building an image via github actions it's straightforward to push to multiple hosts (example for dockerhub and quay.io: https://github.com/uwhackweek/docker-template/blob/main/.github/workflows/CI.yml).

I have upgraded our Pangeo Docker subscription to pro with 5 seats. @ciaranevans + @sharkinsspatial: please let me know the Docker IDs you'd like to have added.

@rabernat I'm down as ciarandevseed

Ok you should be all set: https://hub.docker.com/orgs/pangeo

Wait a minute before building any new containers. I'm about to release pangeo-forge-recipes.

Looking at it, is there 1 more person due to go? There isn't room for @sharkinsspatial currently

I'm happy to add Sean (and remove Yuvi) if he tells me his Docker ID. But FWIW, I'm not certain that everyone involved needs to have a membership in this org.

As noted above by Scott, a good practice is to create access tokens for all automated pushing of images. You (Ciaran) now have owner privs and can create all the access tokens you need, which can be used either by CI or directly on the command line. The org can also be linked to GitHub via a service user