cmd/go: speed up 'go run' by caching binaries
eliasnaur opened this issue · 20 comments
What did you do?
With GO111MODULE=on and the Android SDK and NDK set up, this is the one-liner for creating an Android .apk app file for a Gio program:
$ go run gioui.org/cmd/gogio -target android gioui.org/apps/gophers
What did you expect to see?
go run running nearly as fast as a pre-installed command, that is:
$ go install gioui.org/cmd/gio
$ $GOBIN/gio -target android gioui.org/apps/gophers
What did you see instead?
@mvdan filed an issue that pointed out that, while convenient, the command above is slowed down by go run always re-linking gioui.org/cmd/gio.
This issue is about caching the binary from go run so it achieves nearly the same speed for its second and subsequent runs as a pre-installed command.
Some slowdown is expected because go run needs to know whether a cached version is the newest. I expect that delay to be minimal with a proper GOPROXY set.
Gio issue #15 also points out that teaching users to use go run is bad, but I believe there are valid reasons to use go run:
- It's more convenient. One line instead of (at least) two.
- The user doesn't need to know about $GOBIN ($GOPATH/bin). And even if they do know about it, go run doesn't pollute a user's $GOBIN if they just wanted to test or demo a command from a README.
- Experienced Go users can easily translate a go run command to its go install equivalent if they prefer.
- Binary always up to date. I still regularly change the Gio library such that an updated cmd/gio is required. With go run the latest version is always used.
- Avoids binary name clashes. It just so happens that gio already exists on my system (I believe it is a Gnome tool).
@eliasnaur Be advised that your Gio issue #15 links to #15 here in this repo. Gio issue 15 is, of course, here.
#25416 seems like the same issue. The rationale for closing it was that we'd prefer not to cache linked binaries, since they take up a lot of space, and in the case of go run and go test, the cache hit rate isn't all that high.
It might make sense to cache binaries if the cache eviction policy were more aggressive for binaries in particular. The cache would need to be a lot smarter though.
Yeah, it seems like instead of caching go run, one's readme should use go build && ./the-executable. As near as I can tell from go build -x, go build is smart enough to do nothing when nothing has changed. The binary is then right there to run again if you want it, and just as clearly taking up space if you want to delete it.
In the particular case of Gio's gio command, I think it makes sense to go install it, instead of go running it every time.
The resolution from last time was #25416 (comment) (amplified in #25416 (comment)).
Note that, even with binary caching, go run would still be substantially slower than running the installed binary directly from $GOBIN, since go run would still need to inspect all of the relevant files and directories to see whether the sources have changed.
> And even if they do know about it, go run doesn't pollute a user's $GOBIN if they just wanted to test or demo a command from a README.
Note that one can always set GOBIN=$(mktemp -d) to demo a command from a readme, or use go build -o and pass an explicit binary destination.
> Experienced Go users can easily translate a go run command to its go install equivalent if they prefer.
I think that point also runs in the opposite direction, and more strongly: experienced Go users can easily translate a go install command to a go run command too, and new users are already confused about when go run should or should not work. We should teach new users about go build, go install, and GOBIN as early as we reasonably can, and package install instructions should normalize the use of those rather than go run.
> I still regularly change the Gio library such that an updated cmd/gio is required. With go run the latest version is always used.
When working within a module in module mode, go run should produce a reproducible result, not always upgrade to the latest version. And when working outside of a module, it's not obvious whether go run of a specific package should work at all (see #32027).
(Also note that this point is in direct tension with binary caching: checking for the latest version is an expensive operation. If we assume that go run runs the latest version, then the relative speedup from caching the binary is substantially reduced.)
> Note that, even with binary caching, go run would still be substantially slower than running the installed binary directly from $GOBIN, since go run would still need to inspect all of the relevant files and directories to see whether the sources have changed.
What files and directories? The source files for gioui.org/cmd/gio and its dependencies are only stored in the cache, which is read-only and in a known state, right?
> Note that one can always set GOBIN=$(mktemp -d) to demo a command from a readme, or use go build -o and pass an explicit binary destination.
What can I tell Windows users?
Now that I think about it, perhaps what I like most about "go run" is that it is a simple cross platform way to run Go binaries regardless of environment variables. A "go run" variant for running (cached) binaries from $GOBIN would suffice. I could program my way out of checking version mismatches between the gio tool and the gio packages.
> The source files for gioui.org/cmd/gio and its dependencies are only stored in the cache, which is read-only and in a known state, right?
I don't think we've currently baked any assumptions about the pristineness of the module cache into the build-caching logic. You're right that we could, though.
But we'd still have to at least check the go.mod file to ensure that the module configuration hasn't changed, and that means checking for the go.mod file, which is a not-entirely-trivial directory walk.
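To make the cost of that walk concrete, here is a minimal sketch of looking for go.mod by climbing from a starting directory toward the filesystem root. It is not the cmd/go implementation: findModuleRoot and demo are names made up for this illustration, and the real command handles more cases than this does.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findModuleRoot (hypothetical helper) walks from dir toward the filesystem
// root and returns the first directory that contains a go.mod file, or ""
// if none is found. Each step costs one stat call.
func findModuleRoot(dir string) string {
	dir = filepath.Clean(dir)
	for {
		if fi, err := os.Stat(filepath.Join(dir, "go.mod")); err == nil && !fi.IsDir() {
			return dir
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "" // reached the root without finding go.mod
		}
		dir = parent
	}
}

// demo builds a throwaway tree (root/go.mod and root/a/b) and reports
// whether a search started in root/a/b ends at root.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "modroot")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	nested := filepath.Join(root, "a", "b")
	if err := os.MkdirAll(nested, 0o755); err != nil {
		return false, err
	}
	modFile := filepath.Join(root, "go.mod")
	if err := os.WriteFile(modFile, []byte("module example.com/demo\n"), 0o644); err != nil {
		return false, err
	}
	return findModuleRoot(nested) == filepath.Clean(root), nil
}

func main() {
	ok, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("found module root:", ok)
}
```

Even in this stripped-down form, every go run would pay one stat per directory level before it could decide whether a cached binary is still valid.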
> What can I tell Windows users?
You could give separate instructions for cmd.exe and for Unix-like shells. Or assume that they have their PATH configured appropriately (perhaps by reference to some other document) and tell everyone:
go install gioui.org/cmd/gio
gio -target android gioui.org/apps/gophers
> > I still regularly change the Gio library such that an updated cmd/gio is required. With go run the latest version is always used.
>
> When working within a module in module mode, go run should produce a reproducible result, not always upgrade to the latest version. And when working outside of a module, it's not obvious whether go run of a specific package should work at all (see #32027).
>
> (Also note that this point is in direct tension with binary caching: checking for the latest version is an expensive operation. If we assume that go run runs the latest version, then the relative speedup from caching the binary is substantially reduced.)
Ok, so if we drop the requirement of using the latest version, which I agree is a dubious choice anyway, and if we assume that for most actual uses of the gogio tool the user is operating inside a module, can go run be made fast? I think so:
The first go run inside a module records the version of the tool in go.mod. So subsequent go runs of the same tool use the recorded version, which means that go run can immediately use a cached version of the tool binary.
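A rough sketch of why a pinned version makes the lookup cheap: if the package path, version, and target platform are all known up front, they can be hashed into a key that names the cached binary directly. Everything here is an assumption for illustration (toolKey, cachedBinaryPath, the cache layout, and the placeholder version string); none of it is how cmd/go stores anything today.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// toolKey (hypothetical) derives a cache key from everything that pins the
// binary: the same package, version, and platform always yield the same key.
func toolKey(pkg, version, goos, goarch string) string {
	joined := strings.Join([]string{pkg, version, goos, goarch}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])
}

// cachedBinaryPath (hypothetical) maps a key to a location under cacheDir,
// using the first two hex digits as a subdirectory to keep directories small.
func cachedBinaryPath(cacheDir, key string) string {
	return filepath.Join(cacheDir, key[:2], key)
}

func main() {
	// "v0.0.0-example" is a placeholder, not a real gogio release.
	key := toolKey("gioui.org/cmd/gogio", "v0.0.0-example", "linux", "amd64")
	fmt.Println(cachedBinaryPath("/tmp/toolcache", key))
}
```

With a key like this, the second go run would need only the go.mod lookup plus one file existence check before executing the binary. A different recorded version produces a different key, so a stale binary is never picked up by mistake.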
I think this is interesting because, as I argue in #33518 (comment), go run seems to be the correct choice for running gogio, not go install.
FWIW, the points described in this issue and #33518 were the main reasons behind creating https://github.com/myitcv/gobin.
Also linking #30515 therefore, specifically this comment: #30515 (comment). The most recent discussion with @ianthehat on a golang-tools call was that "something like gobin" makes sense as a first cut (noting that, whatever the solution, it needs to be part of the Go distribution, else we move the problem to installing another tool).
Hey folks, hopefully this is relevant to the discussion. I've been following the linked issues and discussions trying to find an answer to:
- What's the recommended way to recompile a Go program during development?
Up until today, I thought that go run main.go was meant for development and go build was meant for production. I didn't realize that go run was meant for sporadic usage.
Could someone share the recommended way to recompile a Go program today? Maybe it's something like this?
go build && ./main
Should I be installing packages?
go build -i && ./main
What does it mean to cache binaries when your source code is changing?
Whether you use go build or go run, each compiled package will be stored in the build cache. So the only difference is whether the final binary is relinked when there are no changes.
go build checks whether the output binary exists and examines metadata stamped into the binary. If the binary is up to date, go build skips linking.
go run writes the binary to a temporary file, then deletes it when it's done. So there's no opportunity to skip linking and reuse it.
The -i flag is mostly obsolete. It copies compiled packages into $GOPATH/pkg. This used to speed up builds before the build cache was introduced in Go 1.10, but now it's only useful for installing packages (not usually needed anymore).
Thanks for the overview @jayconrod! One question:
> go build checks whether the output binary exists and examines metadata stamped into the binary. If the binary is up to date, go build skips linking.
Does this mean whenever there is a change to any package within your Go project, then go run and go build have the same performance?
Or is there some sort of up-to-date check with go build on the package-level?
> Does this mean whenever there is a change to any package within your Go project, then go run and go build have the same performance?
Yes, it should be nearly identical. If they're building the same packages in the same configuration, they should have the same hits and misses in the build cache.
> Or is there some sort of up-to-date check with go build on the package-level?
Each package and binary is stamped with a build id, which is used to check whether it's up-to-date. That's used by all build commands, not just go build. buildid.go explains how it works if you're curious.
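As a very simplified illustration of the idea, the sketch below stamps an artifact with an ID whose first half is a hash of the inputs, then decides whether a rebuild is needed by recomputing that hash. The two-part "inputs/output" shape is an assumption loosely modeled on the description in buildid.go. The names hashOf, buildID, and upToDate are invented for this example, and the real scheme carries more detail than shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashOf returns a short hex digest of its parts (illustrative only).
func hashOf(parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:8])
}

// buildID (hypothetical) forms a two-part ID: a hash of the inputs
// (sources, flags), a slash, then a hash of the produced output.
func buildID(inputs []string, output string) string {
	return hashOf(inputs...) + "/" + hashOf(output)
}

// upToDate (hypothetical) reports whether an artifact stamped with id was
// built from exactly these inputs, without looking at the output itself.
func upToDate(id string, inputs []string) bool {
	parts := strings.SplitN(id, "/", 2)
	return len(parts) == 2 && parts[0] == hashOf(inputs...)
}

func main() {
	inputs := []string{"main.go: package main", "-trimpath"}
	id := buildID(inputs, "compiled bytes")

	// Unchanged inputs: the stamped ID still matches, so work can be skipped.
	fmt.Println(upToDate(id, inputs))
	// An added source file changes the input hash, so a rebuild is needed.
	fmt.Println(upToDate(id, append(inputs, "extra.go: package main")))
}
```

The point of the split is that the check needs only the stamped ID and the current inputs. That is why go build can skip linking when the output file is still on disk, while go run, which deletes its temporary binary, has nothing left to compare against.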