golang/go

cmd/go: stamp git/vcs current HEAD hash/commit hash/dirty bit in binaries

bradfitz opened this issue Β· 51 comments

(Related but different than #35667)

cmd/go currently embeds all the module dep information in binaries and it's readable with e.g. https://godoc.org/rsc.io/goversion/version but it does not include any information about the top-level module's version.

I propose that cmd/go look at {git,svn,etc} state and include in the binary:

  • HEAD commit time
  • HEAD hash
  • dirty bit (if there are uncommitted changes)

Currently many projects do this by hand with a build-program.sh and stamping it manually with --ldflags=-X foo=bar, but that means programs built the normal Go way lack that information, and people end up with non-portable (shell, often) build scripts.

I've hit this enough times with my own projects that it's actively frustrating me. It's worse when programs are clients that want to report their version number to a server (which might want to do analytics, build horizon enforcement, protocol version negotiation, etc) and then can't. There are alternative ways to do all that, but they're tedious.

Mostly I'm concerned that people have bespoke, often non-portable build scripts.

Note, July 15, 2021: This proposal was accepted but has not yet been implemented. Per discussion in #37693, when it does get implemented, we need to include a flag to turn this behavior off. - rsc

This may be a duplicate of #29814.

rsc commented

We should figure out exactly what we want to record.
The other modules are all versions or pseudo-versions.
Should this one be too?
Can we find that quickly enough to be reasonable to run during every 'go build'?
(We can skip it during 'go test' like we skip dwarf.)
The go command already has code to turn a commit hash into a version; we should probably just use that same code and add a +modified if the working directory is modified.

It would be helpful to time how much overhead this would be in 'go build'.

@bcmills, do you have any numbers about how much time this would add?

rsc commented

BTW I agree it's a duplicate of #29814 but I'll keep using this one because it is marked as a proposal and already appeared in the minutes.

It would be helpful to time how much overhead this would be in 'go build'.

git status --porcelain=v2 in the go repository is around 50ms for me, and git log -n 1 is around 25ms.

go install cmd/go for me is 1.6s dirty, and 140ms clean. So assuming that we can run the VCS commands in parallel with builds for non-main packages, the latency hit should be negligible.

CC @jayconrod @matloob

Hmm, I realized that I didn't account for checking tags in the above calculations. Still, I expect those costs will be order-of-magnitude similar to any other git command.

I'd like to point out a (maybe small) problem with this approach: changing the version of source code, but not the code itself will cause the binary to change.

Let me explain the use-case I have that will be broken by this change:

  • A number of binaries are built from a monorepo, each binary is packaged in Docker, and a Helm chart is generated for every binary. This chart includes Docker image ID, not the tag.
  • These Helm charts are installed to Kubernetes

If the Helm chart contents and the binaries don't change, then no upgrades are performed by Kubernetes.

If the version of the checkout is stamped into every Go binary, then this scheme crumbles and:

  • either a version needs to be purged from every binary before packaging it into a Docker image,
  • or some crazy scheme has to be invented: remove the version from the binary, take a checksum, check that there is no Docker image with this version as a label published in the registry, publish the image and label it, use this label in Helm charts.

@dottedmag, what if we made it conditional on importing a new package, say runtime/version, containing the accessor funcs to get at the info? Then if you don't use it, no change in behavior.

Would that work for your use case?

I'm in a practically identical case to @dottedmag.

Inevitably, somewhere in the monorepo we will (perhaps unintentionally) bring in a dependency that depends on a magic package that breaks deterministic builds.

I think for most common cases, having this proposal enabled by default would be preferable. For my use case, I would be satisfied if there was a documented way to opt out of it. We already are using -ldflags=-buildid= to force a consistent build ID as part of deterministic binaries, so another esoteric flag to opt out of recording VCS state would be completely acceptable.

@dottedmag, what if we made it conditional on importing a new package, say runtime/version, containing the accessor funcs to get at the info? Then if you don't use it, no change in behavior.

I agree with @mark-rushakoff: relying on imports will be brittle unless this import is considered only for main packages.

It's not an author of some recursively included library, but a builder of a final binary who in a position to decide whether to put versioning information into the binary or not.

@dottedmag, note that many functionally-equivalent builds will already produce slightly different binaries due to the version-stamping for runtime/debug.ReadBuildInfo. The version stamps will change with each change to the corresponding module versions, even if the contents of the specific imported packages are the same.

This proposal would case more of the same sort of version churn, but it is fundamentally the same churn.

That suggests that we may want to provide an option to disable version stamping in general. IMO, that should be a separate proposal.

True. In practice, it is not a problem as changing the versions of dependencies nearly always changes the code of dependencies β€” nobody is updating versions of dependencies endlessly for no reason, usually, they only get updated to get a new feature or a bugfix.

Filed #37693.

rsc commented

The discussion above about reproducible builds sounds like it would be satisfied by having the version embedded by default but also having an opt-out command-line flag; no special package needed.

Do I have that right, @dottedmag and @mark-rushakoff?

Yes, I think a flag to opt out of embedding version details would suffice for reproducible builds.

It would be nice if there was a single flag like -reproducible to omit version details and to set a fixed build ID, but it is probably fine if those remain separate concerns.

I don't care about reproducible builds when I'm at the command line building something for my own use; I care about reproducible builds when I am writing build scripts that run as part of a CI/CD pipeline, so it is not a big deal if I need to look up the whole collection of settings to make those builds reproducible.

@rsc Correct.

rsc commented

OK, it sounds like everyone agrees about doing this by default, with a flag to turn it off.
It is unclear exactly how fast it will be to ask git/etc what we need to know, but that operation can overlap with the entire build, including the link. Right now if I build helloworld I get:

/Users/rsc/go/pkg/tool/darwin_amd64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -buildmode=exe -buildid=SqhXBrEZODPkt1gG-6fj/H70nrokRrHQB9NgKyehx/mN9hM1-9avNnPeQfRwgY/SqhXBrEZODPkt1gG-6fj -extld=clang $WORK/b001/_pkg_.a
/Users/rsc/go/pkg/tool/darwin_amd64/buildid -w $WORK/b001/exe/a.out # internal

That buildid step could install the git version info too. I'm confident git will be faster than the link.

Based on the discussion, then, this seems like a likely accept, although we may not be able to implement it until the next release (Go 1.16).

Will this include just the commit hash or also a (any?) version tag

Version tags introduce many sharp edges, at least for git… since you can have a git repo cloned without having fetched all tags, or can have different local tags, or can add a tag to a SHA at any point in time, using the nearest tag (e.g. git describe or equivalent) would mean that a build of the same SHA could result in different embedded versions for different people (or for the same person at different times). Further, if the closest tagged commit has multiple tags associated with it, the version could be ambiguous (git describe has particularly unpredictable behavior in this case).

@liggitt, note that that problem only occurs in one direction. (It is fine in general to have N names for one commit. The important property is that we resolve only one commit for a given name.)

The go command already has an algorithm for inferring the canonical name for a given repo state, implemented as part of #27171. (Specifically, we infer either the highest tagged semantic version appropriate to the module path, or a pseudo-version derived from the highest-tagged ancestor.)

I realize that that algorithm makes things awkward for the k8s.io repos in particular because of their unusual tagging history (in which v1.5.1 is an ancestor of v0.18.0), but from what we've observed that sort of nonlinear history is an outlier.

The important property is that we resolve only one commit for a given name.

Since tags can be local, preferring a tag over a sha would mean two users could have identical local tags associated with different SHAs. Or are you suggesting the mapping of shas to tags would not be determined by consulting the local VCS, but the remote canonical VCS?

That seems like a reasonable motivation to include both the commit hash and semantic version, rather than just one or the other.

rsc commented

No change in consensus, so accepted. We may still need to work out exactly what to include, but everyone seems to agree that this is worth doing (barring some discovery about it being more expensive than we think).

Tentatively marking this for 1.16. We'll try to get this in, but we have a lot planned, and I can't promise it will get done. If someone is interested in working on this, I'll sketch out what needs to be done. This is complicated though, so probably not a good first issue.

  • Most of our VCS code for fetching modules is in cmd/go/internal/modfetch/codehost. It's fairly specific to modules and not intended to work with local repositories. Not sure whether it would be better to generalize that package and add support there or in a new package. I'd start with the former and see how complicated it gets.
    • In any case, there should be something the rest of cmd/go can call that returns VCS information.
    • That should support git, hg, svn, bzr, and fossil.
  • This information should be saved in cmd/go/internal/load.PackageInternal.BuildInfo. We use that field to store module information that's stamped into main packages. Since it would no longer be module-specific, it would need to be generalized.
    • Something in cmd/go/internal/load should populate that field for main packages, combining information with modload.PackageBuildInfo. This is currently done in Package.load. Note that different main packages in the same build may belong to different repositories.
    • BuildInfo is already part of the package cache key, so when it's changed, it would already cause cached main packages to become stale. We should make sure to test that though.
    • The code that does the stamping in cmd/go/internal/work.Builder.build is module-specific and would need adjustment.
  • runtime/debug.BuildInfo will need new fields, and runtime/debug.ReadBuildInfo will need to populate them.
  • cmd/go/internal/version will also need to be able to read and report the new information. Not clear to me if this should be printed by default or behind a flag (similar to how module information is behind -m). Probably the latter.
  • In #39301, we should move the code that extracts build information to x/mod. That would need to support this as well.
rsc commented

Added to top comment:

Note, July 15, 2021: This proposal was accepted but has not yet been implemented. Per discussion in #37693, when it does get implemented, we need to include a flag to turn this behavior off. - rsc

Change https://golang.org/cl/353930 mentions this issue: cmd/go: stamp VCS revision and uncommitted status into binaries

Change https://golang.org/cl/353885 mentions this issue: encoding/binary: convert internal test to external

Change https://golang.org/cl/353886 mentions this issue: cmd/go: move module build info formatting into runtime/debug

Change https://golang.org/cl/353887 mentions this issue: runtime/debug: add ReadBuildInfoFrom and ReadBuildInfoFromFile

Change https://golang.org/cl/353929 mentions this issue: cmd/go: migrate 'go version' to use debug.ReadBuildInfoFromFile

Change https://golang.org/cl/353888 mentions this issue: runtime/debug: add GoVersion to BuildInfo

Change https://golang.org/cl/355493 mentions this issue: cmd/go: stamp tags and flags in build info

Change https://golang.org/cl/356012 mentions this issue: debug/buildinfo: fix test for build settings

Change https://golang.org/cl/356013 mentions this issue: doc/go1.18: add release notes for build and VCS info

Change https://golang.org/cl/356014 mentions this issue: cmd/go: don't stamp build or vcs info for GOROOT binaries

mvdan commented

Thank you so much, @jayconrod! An excellent gift to us all before your sabbatical :)

Change https://golang.org/cl/356209 mentions this issue: cmd/go: use portable flags in TestScript/version_build_settings

Change https://golang.org/cl/356251 mentions this issue: cmd/go: stamp VCS commit time into binaries

mpx commented

The proposal also includes the commit time. Please re-open this issue to add it.

I've found including commit time is extremely useful since the revision/hash doesn't give any indication of code age or sequencing. Including the commit time allows people/code to:

  • Roughly order binaries
  • Quickly recognise what code is likely included without needing to look up a hash in a repo.

To make it sure it does not fall through the cracks: commit time should optional as well, to support reproducible builds for those who needs them without having to jump trough the hoops.

mpx commented

@dottedmag The extra VCS details (revision, commit time, dirty) can be disabled with -buildvcs=false.

Commit time is the time of the stated commit, not the time of the build, so it should be consistent between two builds at the same commit.

Change https://golang.org/cl/357955 mentions this issue: cmd/go: stamp Fossil VCS status into binaries

Change https://golang.org/cl/357956 mentions this issue: cmd/go: stamp Bazaar VCS status into binaries

mpx commented

The current implementation prepends the VCS name to "revision", "uncommitted", and "committime" (soon).

This makes it more complicated to consume since users need to check numerous fields for the same data, or check for suffixes. I can't see any advantage?

I think it would be better to use "revision", "uncommitted", "committime" universally, and add a "vcs" field for "git", "fossil", "mercurial", etc..

@mpx, a guess, but maybe the issue is that the built project could be in multiple repositories at the same time?

mpx commented

@egonelbre Good point, that would leave the option open to tag for all found VCS in the current directory. Currently, the code only tags the first VCS found.

If there is no desire to support this use case, then it's probably still better to simplify the tag names?

@mpx, good suggestion about the field names. Filed as #49168 (for Go 1.18).

Change https://golang.org/cl/373875 mentions this issue: doc/go1.18: add section for runtime/debug changes

Change https://go.dev/cl/391803 mentions this issue: debug/buildinfo: use testenv.GoToolPath instead of resolving "go" from $PATH

Change https://go.dev/cl/391809 mentions this issue: cmd/go: stamp build settings for binaries in cmd

andig commented

That seems like a reasonable motivation to include both the commit hash and semantic version, rather than just one or the other.

@bcmills I would read this as "both commit hash and tag" will be implemented, or rather are since the PR is merged. Looking at the buildinfo from 1.18, only the commit hash is present.

Is there further discussion or efforts for adding the (most recent) tag?