pangeo-data/pangeo-stacks

add git commit hash to docker tag?

Opened this issue · 12 comments

Currently we are tagging our images based on calver: CALVER="$( date '+%Y.%m.%d' )". This could lead to problems if we need to push multiple changes per day.

What if the tags were instead like our helm charts: 19.03.09-a0475df etc?

What I proposed (and actually implemented) on our helm chart is to include the commit hash for every PR build, and to omit it when building from a git tag, i.e. for "stable" releases.

That sounds perfect! Can we do the same thing here?

I'm happy to see the convention updated as you all think best. I do want to make sure we maintain the latest tag as well but other than that, I'm keen to follow your lead.

How would we implement the commit hash idea? Can someone link to the code that does this in the helm chart? I couldn't find it.

I think the helm chart is using the TRAVIS_COMMIT_RANGE env variable. We could do the same.

I love using the commit hash for this, since then you can always easily tell 'so what are we really running?'. I think adding the date is also a good human-readable touch, so we should do CALVER-<commit-hash>.

You should use the first 6-7 characters of git rev-parse HEAD - this is the commit hash of the current commit you are building on. If you wanna base it instead on the last time a particular directory (so image) was changed, you can do git log --pretty=format:'%h' -n 1 <directory-name> instead. However, in this case now that #31 has landed, you need to do inheritance checking - if base-image is modified, that needs to trigger a commit hash change for everything else downstream from there...

There's a golang YAML parser quirk that causes annoying, buggy behavior if every character of a truncated commit hash is a digit. So you should have code that makes sure your truncated hash is not all digits: just include more chars until it isn't.
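To make the two points above concrete, here is a minimal sketch, not the actual hubploy code. The function names `shorten_hash` and `last_commit_for_path` are made up for illustration, and the default length of 7 is an assumption taken from the "6-7 characters" suggestion.

```python
import subprocess


def shorten_hash(full_hash, min_len=7):
    """Return the shortest prefix (at least min_len chars) that is not all digits."""
    length = min_len
    # Grow the prefix while it consists only of digits, so a YAML parser
    # is less likely to read the tag as a number.
    while length < len(full_hash) and full_hash[:length].isdigit():
        length += 1
    return full_hash[:length]


def last_commit_for_path(path):
    """Full hash of the most recent commit that touched `path` (needs a git checkout)."""
    out = subprocess.check_output(
        ["git", "log", "-n", "1", "--pretty=format:%H", "--", path]
    )
    return out.decode().strip()
```

Inside a checkout, something like `shorten_hash(last_commit_for_path("base-image"))` would give a per-image hash; the directory name here is only an example.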

You can find code that implements all this in https://github.com/yuvipanda/hubploy/blob/master/hubploy/gitutils.py

On the helm-chart, this is mainly done through chartpress, see https://github.com/pangeo-data/helm-chart/pull/85/files.

We first run chartpress to populate the version with CALVER. Then, on deploy, we rerun it: if the commit is not on a git tag it adds the hash, otherwise it forces CALVER only.
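The same rule could be sketched outside chartpress roughly as follows. This is an illustration only, not what chartpress does internally; `head_is_tagged` and `image_tag` are hypothetical names, and the check relies on `git describe --exact-match --tags HEAD` exiting non-zero when HEAD is not on a tag.

```python
import subprocess


def head_is_tagged():
    """True if HEAD sits exactly on a git tag (needs a git checkout)."""
    result = subprocess.run(
        ["git", "describe", "--exact-match", "--tags", "HEAD"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def image_tag(calver, short_hash, tagged):
    """CALVER only for tagged ("stable") builds, CALVER-<hash> otherwise."""
    return calver if tagged else f"{calver}-{short_hash}"
```

A build script could then call `image_tag(calver, short_hash, head_is_tagged())` and still push `latest` alongside it.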

@yuvipanda could we do something simple only using git rev-parse HEAD ? Is there duplicated machinery between chartpress and hubploy?

@guillaumeeb git rev-parse HEAD means it'll change whenever the repo changes (for a README change, for example) rather than whenever the image itself changes. The code is duplicated between hubploy and chartpress, I think I literally copy pasted it :D It's only a one-liner tho so...

And your code implements the inheritance checking too?

@guillaumeeb ah, no it does not :) However, if you have other code that automatically updates FROM tags across an inheritance (since everything is explicitly versioned, you'd have to do this), then the code I have should work automatically.
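For reference, the inheritance check itself can be small. The sketch below assumes we already have a mapping from each image to the image named in its FROM line (`base_of`, a hypothetical structure, with `None` for images built on an external base) and a set of images whose directories changed. It also assumes the mapping has no cycles.

```python
def images_to_rebuild(base_of, changed):
    """Return every image that changed or inherits from a changed image.

    base_of: dict mapping image name -> base image name, or None.
    changed: set of image names whose own directory was modified.
    """
    def is_stale(image):
        # Walk up the FROM chain until we hit a changed image or run out.
        while image is not None:
            if image in changed:
                return True
            image = base_of.get(image)
        return False

    return {image for image in base_of if is_stale(image)}
```

With this, a change to the base image marks everything downstream for a rebuild and a new tag, while a change to a leaf image only marks that image.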

So a few questions on how to implement this:

  • Should we copy-paste your code into the local build.py? Or add hubploy to the requirements?
  • Currently, I'm under the impression that we build and push the image even if we only modify the README in the repo. Am I right, or did I miss something? Shouldn't we avoid that too?
  • We need the commit hash both during build (Python script) and during deploy (bash script). Should we save the commit hash somewhere in a file at build time? Should we move the deploy machinery into a common Python script?
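For the third question, one option is to have the build step write the tag to a small file that the deploy step reads back. A minimal sketch, assuming a hypothetical file name `image-tag.txt` and helper names `save_tag` / `load_tag`:

```python
from pathlib import Path


def save_tag(tag, path="image-tag.txt"):
    """Write the computed tag to a file at build time."""
    Path(path).write_text(tag + "\n")


def load_tag(path="image-tag.txt"):
    """Read the tag back, dropping the trailing newline."""
    return Path(path).read_text().strip()
```

The bash deploy script could then read the same file with `TAG="$(cat image-tag.txt)"` instead of recomputing the hash.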

Any thoughts, @jhamman or @yuvipanda ?