golang/go

cmd/go: git export-subst causes hash mismatches

jasonkeene opened this issue · 19 comments

What version of Go are you using (go version)?

1.11rc1

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

see below

What did you do?

These are the steps I did on three different machines. You can see the hash is different on the Macbook Pro vs the iMac and Linux machine. All of these operations were done in a new $GOPATH and newly created module. I originally ran into this issue when doing go mod tidy on the iMac machine after committing the go.sum from the Macbook Pro.

Linux Workstation:

$ go1.11rc1 version
go version go1.11rc1 linux/amd64
$ uname -a
Linux theia 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ go1.11rc1 env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pivotal/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pivotal/workspace/repro-mod-issue/go"
GOPROXY=""
GORACE=""
GOROOT="/home/pivotal/sdk/go1.11rc1"
GOTMPDIR=""
GOTOOLDIR="/home/pivotal/sdk/go1.11rc1/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/pivotal/workspace/repro-mod-issue/mod/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build539505302=/tmp/go-build -gno-record-gcc-switches"

$ go1.11rc1 get k8s.io/client-go@v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: downloading k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v8.0.0+incompatible
go: downloading k8s.io/client-go v8.0.0+incompatible
$ cat go.sum
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f h1:0k3XNLIMLwDNdQdkviifMIGTGVAJSJYePselvFsqV8s=
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=
k8s.io/client-go v8.0.0+incompatible h1:2pUaSg2x6iEHr8cia6zmWhoCXG1EDG9TCx9s//Aq7HY=

iMac:

$ go version
go version go1.11rc1 darwin/amd64
$ uname -a
Darwin otis 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/pivotal/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pivotal/workspace/repro-mod-issue/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/pivotal/sdk/go1.11rc1"
GOTMPDIR=""
GOTOOLDIR="/Users/pivotal/sdk/go1.11rc1/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/pivotal/workspace/repro-mod-issue/mod/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/nl/f9hjjjhn0qq7lp917hnx05m00000gn/T/go-build367695573=/tmp/go-build -gno-record-gcc-switches -fno-common"

$ go get k8s.io/client-go@v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: downloading k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v8.0.0+incompatible
go: downloading k8s.io/client-go v8.0.0+incompatible
$ cat go.sum
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f h1:0k3XNLIMLwDNdQdkviifMIGTGVAJSJYePselvFsqV8s=
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=
k8s.io/client-go v8.0.0+incompatible h1:2pUaSg2x6iEHr8cia6zmWhoCXG1EDG9TCx9s//Aq7HY=

Macbook Pro:

$ go version
go version go1.11rc1 darwin/amd64
$ uname -a
Darwin wat.local 17.6.0 Darwin Kernel Version 17.6.0: Tue May  8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jasonkeene/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jasonkeene/projects/repro-mod-issue/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/jasonkeene/sdk/go1.11rc1"
GOTMPDIR=""
GOTOOLDIR="/Users/jasonkeene/sdk/go1.11rc1/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/jasonkeene/projects/repro-mod-issue/mod/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/f2/8qjhqmss5ssb6ccx15bxvvl80000gn/T/go-build890262066=/tmp/go-build -gno-record-gcc-switches -fno-common"

$ go get k8s.io/client-go@v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: downloading k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f
go: finding k8s.io/client-go v8.0.0+incompatible
go: downloading k8s.io/client-go v8.0.0+incompatible
$ cat go.sum
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f h1:j4/k4PUx72J2958XS0i/rAn6JAaoi1v48mLEaY8QGzM=
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=
k8s.io/client-go v8.0.0+incompatible h1:7Zl+OVXn0bobcsi4NEZGdoQDTE9ij1zPMfM21+yqQsM=

What did you expect to see?

The hashes should match. This is the same git SHA for k8s.io/client-go and same pseudo-version.

What did you see instead?

The hashes were different, resulting in a failed go mod tidy and go mod verify.
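
To see at a glance which entries disagree between two machines, the go.sum contents can be compared key by key. This is only a minimal sketch: `parseGoSum` and `mismatches` are hypothetical helper names, and the two constants are the go.sum outputs from the Linux workstation and the Macbook Pro shown above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseGoSum maps "module version" (the first two fields of a go.sum line)
// to the recorded hash (the third field).
func parseGoSum(content string) map[string]string {
	sums := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		sums[fields[0]+" "+fields[1]] = fields[2]
	}
	return sums
}

// mismatches returns, in sorted order, the keys that appear in both inputs
// but carry different hashes.
func mismatches(a, b string) []string {
	sa, sb := parseGoSum(a), parseGoSum(b)
	var out []string
	for key, ha := range sa {
		if hb, ok := sb[key]; ok && ha != hb {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

const linuxSum = `k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f h1:0k3XNLIMLwDNdQdkviifMIGTGVAJSJYePselvFsqV8s=
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=
k8s.io/client-go v8.0.0+incompatible h1:2pUaSg2x6iEHr8cia6zmWhoCXG1EDG9TCx9s//Aq7HY=
`

const macbookSum = `k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f h1:j4/k4PUx72J2958XS0i/rAn6JAaoi1v48mLEaY8QGzM=
k8s.io/client-go v0.0.0-20180709172653-0ec73abb067f/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=
k8s.io/client-go v8.0.0+incompatible h1:7Zl+OVXn0bobcsi4NEZGdoQDTE9ij1zPMfM21+yqQsM=
`

func main() {
	for _, key := range mismatches(linuxSum, macbookSum) {
		fmt.Println("hash mismatch:", key)
	}
}
```

With the data above, the two module zip entries are reported while the /go.mod entry is not, which suggests the difference lies in the archived source tree rather than in go.mod.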

One difference between the Macbook Pro and iMac/Linux is that I installed go1.11rc1 on the Macbook Pro the day before. On the other two machines I installed go1.11rc1 today. I installed using go get golang.org/dl/go1.11rc1 on all three machines.

@gopherbot, please add label modules

Can you also list the git version on each machine, as well as any other git configs you have on the two?

Sure thing:

Linux Workstation:

$ git version
git version 2.17.1
$ git config --list
push.default=simple
alias.blog=log origin/master... --left-right
alias.br=branch
alias.ci=duet-commit
alias.co=checkout
alias.dc=diff --cached
alias.di=diff
alias.ds=diff --staged
alias.fetch=fetch --all --prune
alias.fix-remote=! f() { export remote=$(git remote get-url origin --push); if [ -z ${remote##https://github.com/*} ]; then git remote set-url origin --push "git@github.com:${remote#https://github.com/}"; fi; unset remote; }; f
alias.fixup=commit --fixup
alias.flog=log --pretty=fuller --decorate
alias.llog=log --date=local
alias.lol=log --graph --decorate --oneline
alias.lola=log --graph --decorate --oneline --all
alias.p=pull --rebase --autostash
alias.rum=rebase master@{u}
alias.squash=commit --squash
alias.st=status
alias.sta=stash
alias.sur=submodule update --init --recursive
alias.unstage=reset HEAD
user.name=Jason Keene
user.email=[redacted]
duet.env.git-author-initials=jk
duet.env.git-author-name=Jason Keene
duet.env.git-author-email=[redacted]
duet.env.mtime=1534880165
duet.env.git-committer-initials=
duet.env.git-committer-name=
duet.env.git-committer-email=
core.hookspath=/home/pivotal/workspace/git-hooks-core

iMac:

$ git version
git version 2.18.0
$ git config --list
credential.helper=osxkeychain
hooks.global=/usr/local/share/githooks
push.default=simple
alias.blog=log origin/master... --left-right
alias.br=branch
alias.ci=duet-commit
alias.co=checkout
alias.dc=diff --cached
alias.di=diff
alias.ds=diff --staged
alias.fetch=fetch --all --prune
alias.fix-remote=! f() { export remote=$(git remote get-url origin --push); if [ -z ${remote##https://github.com/*} ]; then git remote set-url origin --push "git@github.com:${remote#https://github.com/}"; fi; unset remote; }; f
alias.fixup=commit --fixup
alias.flog=log --pretty=fuller --decorate
alias.llog=log --date=local
alias.lol=log --graph --decorate --oneline
alias.lola=log --graph --decorate --oneline --all
alias.p=pull --rebase --autostash
alias.rum=rebase master@{u}
alias.squash=commit --squash
alias.st=status
alias.sta=stash
alias.sur=submodule update --init --recursive
alias.unstage=reset HEAD
user.name=Jason Keene
user.email=[redacted]
duet.env.git-author-initials=jk
duet.env.git-author-name=Jason Keene
duet.env.git-author-email=[redacted]
duet.env.mtime=1534880165
duet.env.git-committer-initials=
duet.env.git-committer-name=
duet.env.git-committer-email=

Macbook Pro:

$ git version
git version 2.10.0
$ git config --list
core.excludesfile=~/.gitignore
core.legacyheaders=false
core.quotepath=false
core.pager=less
mergetool.keepbackup=true
push.default=simple
color.ui=auto
color.interactive=auto
repack.usedeltabaseoffset=true
alias.s=status
alias.a=!git add . && git status
alias.au=!git add -u . && git status
alias.aa=!git add . && git add -u . && git status
alias.c=commit
alias.cm=commit -m
alias.ca=commit --amend
alias.ac=!git add . && git commit
alias.acm=!git add . && git commit -m
alias.l=log --graph --all --pretty=format:'%C(yellow)%h%C(cyan)%d%Creset %s %C(white)- %an, %ar%Creset'
alias.ll=log --stat --abbrev-commit
alias.lg=log --color --graph --pretty=format:'%C(bold white)%h%Creset -%C(bold green)%d%Creset %s %C(bold green)(%cr)%Creset %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative
alias.llg=log --color --graph --pretty=format:'%C(bold white)%H %d%Creset%n%s%n%+b%C(bold blue)%an <%ae>%Creset %C(bold green)%cr (%ci)' --abbrev-commit
alias.d=diff
alias.master=checkout master
alias.spull=svn rebase
alias.spush=svn dcommit
alias.alias=!git config --list | grep 'alias\.' | sed 's/alias\.\([^=]*\)=\(.*\)/\1\     => \2/' | sort
include.path=~/.gitcinclude
include.path=.githubconfig
include.path=.gitcredential
diff.exif.textconv=exif
credential.helper=osxkeychain
user.name=Jason Keene
user.email=[redacted]
user.signingkey=[redacted]
core.pager=less -FRSX
core.excludesfile=~/.gitexclude
color.ui=auto
alias.lg=log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative
alias.lga=log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative --all
alias.lol=log --graph --pretty=format:"%C(yellow)%h%Creset%C(cyan)%C(bold)%d%Creset %C(cyan)(%cr)%Creset %C(green)%ce%Creset %s"
alias.lulz=log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative --all
alias.co=checkout
alias.ci=commit
alias.aa=add --all
alias.st=status
alias.di=diff
alias.dc=diff --cached
alias.pu=pull --ff-only
alias.mm=merge master
alias.fa=fetch --all --prune
alias.pom=push origin master
diff.tool=diffmerge
merge.tool=diffmerge
mergetool.keepbackup=false
mergetool.prompt=false
difftool.diffmerge.cmd=diffmerge "$LOCAL" "$REMOTE"
mergetool.diffmerge.cmd=diffmerge --merge --result="$MERGED" "$LOCAL" "$(if test -f "$BASE"; then echo "$BASE"; else echo "$LOCAL"; fi)" "$REMOTE"
mergetool.diffmerge.trustexitcode=true
difftool.kdiff3.path=kdiff3
difftool.kdiff3.trustexitcode=false
mergetool.kdiff3.path=kdiff3
mergetool.kdiff3.trustexitcode=false
difftool.Kaleidoscope.cmd=ksdiff --partial-changeset --relative-path "$MERGED" -- "$LOCAL" "$REMOTE"
difftool.prompt=false
mergetool.Kaleidoscope.cmd=ksdiff --merge --output "$MERGED" --base "$BASE" -- "$LOCAL" --snapshot "$REMOTE" --snapshot
mergetool.Kaleidoscope.trustexitcode=true
push.default=simple
rasky commented

Can you compare the zip files in the module cache to see if they are actually different?

Looks like the SHAs are different:

Linux Workstation:

$ shasum v0.0.0-20180709172653-0ec73abb067f.zip
80ed94360887f7496f56c2cdcbde7ae84369eeb8  v0.0.0-20180709172653-0ec73abb067f.zip

iMac

$ shasum v0.0.0-20180709172653-0ec73abb067f.zip
80ed94360887f7496f56c2cdcbde7ae84369eeb8  v0.0.0-20180709172653-0ec73abb067f.zip

Macbook Pro:

$ shasum v0.0.0-20180709172653-0ec73abb067f.zip
450f9fde79cf9fefe6821d41bebebeb25da17430  v0.0.0-20180709172653-0ec73abb067f.zip

I then unpacked the zips and did a diff:

$ diff -r linux/k8s.io/client-go\@v0.0.0-20180709172653-0ec73abb067f/ macbook-pro/k8s.io/client-go\@v0.0.0-20180709172653-0ec73abb067f/
diff -r linux/k8s.io/client-go@v0.0.0-20180709172653-0ec73abb067f/pkg/version/base.go macbook-pro/k8s.io/client-go@v0.0.0-20180709172653-0ec73abb067f/pkg/version/base.go
58c58
<       gitVersion   string = "v0.0.0-master+0ec73abb"
---
>       gitVersion   string = "v0.0.0-master+0ec73ab"

This hash fragment is a format string in the source code but apparently gets replaced when git archive is run.

https://github.com/kubernetes/client-go/blob/3a923191144267df15e72beacff80dda1773fe87/pkg/version/base.go#L55-L58

Indeed when I use git log to generate this format it is different. Likely a change in git.

Linux Workstation

$ git log --pretty=%h 0ec73abb067f -1
0ec73abb

Macbook Pro

$ git log --pretty=%h 0ec73abb067f -1
0ec73ab

It's great that the hash is doing its job and pointing out source code differences. I will certainly upgrade the outdated git. I'm wondering whether this is the desired end-user behaviour, though. Shouldn't go get ...@hash get the same source code? It seems very odd to get different source code because different minor versions of git happen to be installed.

Is there a .gitignore file in the MacBook pro zip archive?

Nope, no .gitignore. The only difference is the gitVersion string that is modified by git archive which apparently the go module tooling invokes.
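
To illustrate why a one-character difference in a single file is enough to change the go.sum entry, here is a rough sketch of an "h1:"-style hash. It is modeled on my understanding of the scheme (a SHA-256 over a sorted list of per-file SHA-256 lines, base64-encoded); the real implementation hashes the files inside the module zip and may differ in detail. `hashFiles` is a hypothetical name and the file contents are stand-ins.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashFiles sketches an "h1:"-style hash over a set of files, given as a
// map from file name to file content. Each file contributes one line of the
// form "<hex sha256 of content>  <name>\n"; the lines are fed, in sorted
// name order, into an outer SHA-256 whose sum is base64-encoded.
func hashFiles(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256([]byte(files[name]))
		fmt.Fprintf(h, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(h.Sum(nil))
}

func main() {
	// Stand-in contents: only the gitVersion line from the diff above.
	linux := map[string]string{
		"pkg/version/base.go": `gitVersion   string = "v0.0.0-master+0ec73abb"`,
	}
	macbook := map[string]string{
		"pkg/version/base.go": `gitVersion   string = "v0.0.0-master+0ec73ab"`,
	}
	fmt.Println("linux:  ", hashFiles(linux))
	fmt.Println("macbook:", hashFiles(macbook))
}
```

The two printed hashes differ even though the inputs differ by a single character, which is the behaviour the go.sum check relies on.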

Perhaps this is a data point for #26746 (or perhaps this ends up being something solved with different flags passed to git, or _____).

Just adding a couple of snippets (slightly expanding out what @jasonkeene commented on):

https://github.com/kubernetes/client-go/blob/3a923191144267df15e72beacff80dda1773fe87/pkg/version/base.go#L55-L58

	// NOTE: The $Format strings are replaced during 'git archive' thanks to the
	// companion .gitattributes file containing 'export-subst' in this same
	// directory.  See also https://git-scm.com/docs/gitattributes
	gitVersion   string = "v0.0.0-master+$Format:%h$"
	gitCommit    string = "$Format:%H$" // sha1 from git, output of $(git rev-parse HEAD)
	gitTreeState string = ""            // state of git tree, either "clean" or "dirty"

And from https://git-scm.com/docs/gitattributes:

export-subst
If the attribute export-subst is set for a file then Git will expand several placeholders when adding this file to an archive. The expansion depends on the availability of a commit ID, i.e., if git-archive[1] has been given a tree instead of a commit or a tag then no replacement will be done. The placeholders are the same as those for the option --pretty=format: of git-log[1], except that they need to be wrapped like this: $Format:PLACEHOLDERS$ in the file. E.g. the string $Format:%H$ will be replaced by the commit hash.
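
The substitution described above can be simulated in a few lines. This is a simplified sketch, not git's implementation: it handles only the %H and %h placeholders, `expand` is a hypothetical name, and the commit value is the 12-character prefix from the pseudo-version standing in for the full commit ID.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// formatRE matches $Format:PLACEHOLDERS$ and captures PLACEHOLDERS.
var formatRE = regexp.MustCompile(`\$Format:([^$]*)\$`)

// expand simulates export-subst for the %H and %h placeholders only.
// abbrevLen is the abbreviation length the simulated git would pick for %h.
func expand(content, commit string, abbrevLen int) string {
	if abbrevLen > len(commit) {
		abbrevLen = len(commit)
	}
	replacer := strings.NewReplacer("%H", commit, "%h", commit[:abbrevLen])
	return formatRE.ReplaceAllStringFunc(content, func(match string) string {
		placeholders := formatRE.FindStringSubmatch(match)[1]
		return replacer.Replace(placeholders)
	})
}

func main() {
	const line = `gitVersion   string = "v0.0.0-master+$Format:%h$"`
	const commit = "0ec73abb067f" // stand-in for the full commit ID

	fmt.Println(expand(line, commit, 8)) // abbreviation seen on Linux and the iMac
	fmt.Println(expand(line, commit, 7)) // abbreviation seen on the Macbook Pro
}
```

The same input line yields two different archived files depending only on the abbreviation length, which matches the diff reported earlier in this thread.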

Just a heads up, I am going to be working on this issue today at Gophercon during the contributor's workshop.

I'm not sure that this is actually a problem for cmd/go to solve.

export-subst should be fine as long as the substitutions are deterministic, well-defined, and stable across platforms and git versions. If a particular repository has an export-subst configuration that is inherently not deterministic or not stable, that seems like an issue to file against the owner of that repository.

If the main source of instability is git's abbreviation algorithm, is there some way to specify abbreviation parameters explicitly, or to disable abbreviation altogether?

I'm not sure export-subst is something that go modules should enable.
Here is my reasoning:

export-subst is a git attribute that allows for replacing certain format
strings when git archive is invoked. The result is source code that is
different than what would be in the working directory of the repo. Something
like this:

gitVersion   string = "v0.0.0-master+$Format:%h$"

is turned into this:

gitVersion   string = "v0.0.0-master+0ec73abb"

This seems like behaviour that is undesirable. If I run:

go get module@sha1

I would expect to get the exact same source code for that SHA1. Instead I get
a mutated version of the source code. We have seen issues with the format
strings not being consistent between git versions. Even something like the
number of objects in the repo can change the results of export-subst.
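
As a toy model of why the abbreviated hash is not stable, consider a function that picks the shortest prefix that is unique among the objects in a repository, subject to a minimum length. This is an assumption-laden sketch rather than git's actual algorithm (which, as far as I know, in newer versions also scales the minimum length with the size of the repository); `shortestUnique` and the colliding object ID are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// shortestUnique returns the shortest prefix of id, at least minLen
// characters long, that no other ID in objects shares. If no such prefix
// exists, the full id is returned.
func shortestUnique(id string, objects []string, minLen int) string {
	if minLen < 1 {
		minLen = 1
	}
	for n := minLen; n < len(id); n++ {
		prefix := id[:n]
		unique := true
		for _, other := range objects {
			if other != id && strings.HasPrefix(other, prefix) {
				unique = false
				break
			}
		}
		if unique {
			return prefix
		}
	}
	return id
}

func main() {
	const commit = "0ec73abb067f" // 12-character stand-in for the full ID

	// Same commit, same minimum length, different object sets.
	small := []string{commit}
	large := []string{commit, "0ec73ab1c2d3"} // hypothetical colliding object

	fmt.Println(shortestUnique(commit, small, 7))
	fmt.Println(shortestUnique(commit, large, 7))

	// Same commit, same objects, different minimum length.
	fmt.Println(shortestUnique(commit, small, 8))
}
```

In this model the result changes both when another object happens to share a prefix and when the tool's minimum length changes, so neither the repository state nor the git version can be held fixed from the commit ID alone.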

The situation that I ran into was k8s.io/client-go populating version
information into the source code. This string can be populated by the builder
via ldflags. Alternatively, version information can be read from the binary
with something like rsc.io/goversion. export-subst is not needed to get
version information into the binary.
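
For reference, the ldflags approach looks roughly like this. It is a minimal sketch: the variable name `gitVersion` in package main is chosen for illustration and is not how k8s.io/client-go lays out its version package.

```go
package main

import "fmt"

// gitVersion holds a default that the builder can override at link time,
// for example:
//
//	go build -ldflags "-X main.gitVersion=v0.0.0-master+0ec73abb"
//
// Without the flag, the default below is used.
var gitVersion = "unknown"

// versionString reports whatever value was linked into the binary.
func versionString() string {
	return "client version: " + gitVersion
}

func main() {
	fmt.Println(versionString())
}
```

Because the value is supplied by whoever builds the binary, the checked-in source stays byte-for-byte identical to what is in the repository.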

Possible solutions:

Disable export-subst attribute

This can be achieved by adding the following to .git/info/attributes before
doing the git archive:

* -export-subst

This disables the option for the whole repo. Adding the attribute in this way
has the highest precedence and cannot be overridden by attributes in
.gitattributes files or global configuration. I have been working on a CL
that does this.

Disable all export attributes

This would include the export-ignore attribute. This attribute ignores
certain paths when creating the .zip. This can be done by adding the
following to .git/info/attributes.

* -export-subst -export-ignore

To be clear, export-ignore would likely not cause the ziphash to be
different but it would cause the contents of the zip file to be different from
the source code that is in the repo.
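
Either variant above amounts to appending one line to the repository's info/attributes file before running git archive. The following is a sketch under stated assumptions, not the code from any CL: `disableExportAttrs` and `demo` are hypothetical names, and the caller passes the git directory explicitly (".git" inside a normal checkout, or the repository directory itself for a bare repository).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// disableExportAttrs appends line to <gitDir>/info/attributes, creating the
// directory and file if they do not exist yet.
func disableExportAttrs(gitDir, line string) error {
	infoDir := filepath.Join(gitDir, "info")
	if err := os.MkdirAll(infoDir, 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(filepath.Join(infoDir, "attributes"),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(line + "\n"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo writes the attributes line into a throwaway directory that stands in
// for a repository and returns the resulting file contents.
func demo() (string, error) {
	repo, err := os.MkdirTemp("", "attr-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(repo)

	gitDir := filepath.Join(repo, ".git")
	if err := disableExportAttrs(gitDir, "* -export-subst -export-ignore"); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(gitDir, "info", "attributes"))
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	contents, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(contents)
}
```

Appending rather than overwriting keeps any attributes that are already present in the file; whether that is desirable for a tool-managed clone is a separate design question.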

Require a minimum version of git that is after the change to %h

This option is not sufficient, because the output of the export-subst format
strings depends on factors other than the git version, such as the number of
objects in the repo.

Do nothing

This would require all git repos that currently have these features enabled
(likely for valid reasons) to stop using these features if they want to get
consistent ziphashes. Naturally, folks will not know to do this until they run
into this issue, causing them to at best do the research to find why this is
occurring or, more likely, just ignore validation warnings. Training users to
ignore these warnings seems like a bad idea.

I think a decision needs to be made whether modules should use the contents of the
repo as they are or if modules should apply extra processing to the source code
via git archive and whatever other features are in other version control systems.

I agree that for the sake of reproducibility and consistency we should use the source as it is checked out, not depend on a post-processing step that is both uncommon and nondeterministic, so I'd favor * -export-subst -export-ignore. (And I believe this is a git misfeature we should opt out of.)

It's a bit unfortunate that the GitHub-generated zip files will differ, but it's my understanding that we don't use those anymore?

Thanks for the detailed analysis. I think you both make a good point that the source in the module archive should be the same as the source in the checked-out source tree.

I also favour the ignore option. Is there any way that Go programs can introspect version information? That would be ideal, because the version is often something that's served through an API.

Is there any way that Go programs can introspect version information?

That's #26404.

Change https://golang.org/cl/135175 mentions this issue: cmd/go: ensure git attributes are set when creating zips

@gopherbot, please backport to 1.11.2: this issue introduces invalid go.sum hashes, which will require manual intervention for future builds.

Backport issue(s) opened: #28094 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.