golang/go

cmd/go: standard-library vendoring in module mode

bcmills opened this issue Β· 53 comments

This proposal is both a fix for #26924, and a means for users to explicitly upgrade the golang.org/x dependencies of the standard library (such as x/crypto and x/net).

Proposal

The standard library will ship with three go.mod files: src/go.mod (with module path std), src/cmd/go.mod (with module path cmd), and misc/go.mod (with module path misc).

The std and cmd modules will have vendor directories, containing modules.txt files in the usual format and managed using the standard Go 1.13 vendoring semantics (hopefully #30240).

The std and cmd modules differ somewhat from ordinary modules, in that they do not appear in the build list.

  • No module outside GOROOT/src may declare its own module path to begin with std or cmd.
  • No module may explicitly require, replace, or exclude any module whose path begins with std or cmd. (Their versions are fixed to the Go release in use.)

For files within GOROOT/src only, when we resolve an import of a package provided by an external module, we will first check the build list for that module.

  • If the build list does not contain that module, or if the version in the build list is strictly older than the one required by the standard-library module containing the file, then we will resolve the import to a directory in the standard library, and the effective import path will include the explicit prefix vendor or cmd/vendor.
    • The explicit prefix ensures that no two packages in the build will have the same import path but different source code.
    • It also ensures that any code loaded from a valid module path has a corresponding module in the build list.
  • If the version in the build list is equal to than the one required by std or cmd, that version is not replaced in the main module, and the package exists in the corresponding vendor directory, then we will load the package from vendor or cmd/vendor with its original import path.
    • This eliminates network lookups if the main module is std or cmd itself (that is, if the user is working within GOROOT/src).
    • We assume that the code in GOROOT/src/vendor and GOROOT/src/cmd/vendor will remain pristine and unmodified.
  • If the version in the build list is greater than the version required by std or cmd, or if the version is equal and the module is replaced, then we will load the package from the module cache in the usual way.
    • This may cause substantial portions of the library to be rebuilt, but will also reduce code duplication β€” which should result in smaller binaries.

I am strongly in favor of using modules for vendoring in the standard library, instead of ad-hoc copies, but I don't see the reason to let applications override standard library dependencies. Was this widely requested, or something we wished we had often?

Speaking for x/crypto, I actually don't want to have to figure out if a crypto/tls internal failure is due to a replaced chacha20poly1305.

Also, I don't see why we'd let users override vendored dependencies but not standard library packages. The difference between golang.org/x/crypto/chacha20poly1305 and crypto/aes should be an implementation detail of crypto/tls. Then why can one be overridden, and the other not? Feels arbitrary.

cc @ianthehat

I share some of @FiloSottile's concerns. Allowing users to override std dependencies, especially crypto dependencies, is scary. But not allowing this would be against the spirit of modules.

There are some safeguards here, but I'd love to solve this without making the module resolution and package loading logic more complicated.

Possibly dumb, low-tech idea: what if we vendor these modules without using vendoring? We could run go mod vendor, then rename the vendor directory to something else (internal/vendor_) and fix up the imports? That would fix #26924 and move complexity out of cmd/go, into tools.

what if we vendor these modules without using vendoring? We could run go mod vendor, then rename the vendor directory to something else (internal/vendor_) and fix up the imports?

We have that today, in internal/x. We could fix #26924 using the same strategy in cmd, if we're ok with code duplication for the x repos. (The goal of this proposal is to reduce that duplication.)

@FiloSottile

I don't see the reason to let applications override standard library dependencies. Was this widely requested, or something we wished we had often?

It has come up at Go team summits from time to time, and while we don't see many complaints about upgrading x repos explicitly, we do see fairly frequent complaints about binary sizes (#6853 is canonical).

The two are closely related, in that one way you can end up with an overly-large binary is by pulling in a redundant copy of an x repo from the standard library.

Per my favorite quote from C. A. R. Hoare's Hints on Programming Language Design:

[L]isten carefully to what language users say they want, until you
have an understanding of what they really want. Then find some way of
achieving the latter at a small fraction of the cost of the former. This is
the test of success in language design, and of progress in programming
methodology.

It's much easier for us to reduce code duplication in the standard library than to reduce the outputs generated from duplicated code.

Speaking for x/crypto, I actually don't want to have to figure out if a crypto/tls internal failure is due to a replaced chacha20poly1305.

We probably should start requesting the output of go list -m all in the issue template, for precisely that reason. (The same problem can occur when someone reports an issue against x/net, and that has ~nothing to do with the standard library.)

But consider the opposite direction, too: wouldn't it be nice for folks to be able to, say, test the interaction of TLS 1.3 with their dependencies written against 'net/http`, without having to also run a bleeding-edge version of the compiler and runtime (with any associated runtime bugs)?

Also, I don't see why we'd let users override vendored dependencies but not standard library packages. The difference between golang.org/x/crypto/chacha20poly1305 and crypto/aes should be an implementation detail of crypto/tls. Then why can one be overridden, and the other not? Feels arbitrary.

That seems like a great argument for making many of the standard-library packages thin forwarding shims around x/ packages. I may file that proposal, in fact, but it's separate from this one. πŸ™‚

I'm definitely sympathetic to reducing code duplication and binary size, but I'm also concerned about complexity and special cases.

I think we should resolve #26924 using the internal/x approach in order to unblock other module work for 1.13, and we should talk further about modularizing the standard library after 1.13. Just to confirm my understanding, converting cmd/vendor to cmd/internal/x would not significantly increase binary or SDK download size, right?

@bcmills @FiloSottile

We probably should start requesting the output of go list -m all in the issue template, for precisely that reason. (The same problem can occur when someone reports an issue against x/net, and that has ~nothing to do with the standard library.)

This is the same reaction I had when I saw the objection over updating the std lib versions. Make it easy and known how to list module versions and request it in issues. I also remember when Azure had to vendor a patched version of crypto/tls to access to hit their TLS endpoints in their SDK, then fork the world to make it reference the fork.

@jayconrod, in looking at the details of this, it seems to be by far the simplest way to make module-mode vendoring work for vendoring in the standard library.

go mod vendor uses the import graph to figure out which packages to copy, so it's much simpler to have the module-provided packages in the import graph than to try to build some other analysis (and vendoring tool) specifically for the standard library.

I don't disagree with including go list -m all in issue reports, but it still does not make me feel good about letting users swap chacha20poly1305 backends to crypto/tls. TLS already has too many joints, and joints in cryptography are where things break. I'd rather not to have to keep in my head an extra dimension of the compatibility matrix.

@bcmills I agree that this is probably the simplest way to make module-mode vendoring work in the standard library in terms of new code that needs to be written. My concern is that this makes package loading harder to reason about. In particular, this proposal means that you may get either one or two copies of some packages, depending on what version the main module requires or whether it has a replacement. We have the same risk of conflict with exported types or global state as we do today, but now the risk is more subtle.

Here's a possible alternative (with some drawbacks at the end):

  • The Go SDK ships with go.mod files for std, cmd, and misc, as described above.
    • Imports starting with std/, cmd/, or misc/ are resolved to those modules. This will require new logic, but it would replace existing GOROOT / GOPATH package resolution.
    • These modules may have normal requirements on x/tools, x/net, etc.
  • The Go SDK ships with a pkg/mod directory, which would contain dependencies of the standard modules. It would be in the normal module cache format. It would act as a fallback read-only cache before going to the network.
  • For packages that should not be user-replaceable (i.e., chacha20poly1305), we would create private copies in internal/x. I'm hoping there wouldn't be many of these, so we could still mostly avoid code duplication in binaries.
    • I think writing a tool to do this would be fairly straightforward. Maybe this is something gorelease should do.

I'm realizing some drawbacks to this approach though, as I'm typing this out.

  • This would increase SDK size, since pkg/mod would include sources for packages we don't depend on. Not sure how much this would add.
  • Our dependencies on x/tools, x/net would effectively set global minimum versions for those modules. MVS would override user requirements on older versions.
  • Our dependencies on x/tools, x/net would effectively set global minimum versions for those modules.

Not only that, but all of their transitive dependencies too: for example, various parts of x/net and std currently depend on x/text, and x/text has a dependency on x/tools via golang.org/x/text/cmd/gotext and golang.org/x/text/message/pipeline. x/tools has a fairly broad dependency graph (see #29981), so we would be setting minimums for a lot of arbitrary modules.

Note that everything in cmd should be a package main, so nothing can import it. That makes the cmd module a bit easier to handle than std, since it mitigates the code bloat from using a vendor directory: we know affirmatively that no binary that links against cmd will also link against a redundant x repo.

misc is similar to cmd: it does not have a globally-addressable import path, so the only time you can build anything in misc is if it's the main module.

@FiloSottile

TLS already has too many joints, and joints in cryptography are where things break. I'd rather not to have to keep in my head an extra dimension of the compatibility matrix.

I can certainly sympathize with that, but in that case, is there a fundamental reason for the implementation of crypto/tls to live in the standard library at all?

That is: if it's that tightly coupled to x/crypto, why not make it a forwarding faΓ§ade over an implementation in x/crypto? That would preserve the advantages of decoupling the release cycle and reducing code bloat, while keeping the compatibility dimensions roughly equivalent to x/crypto today.

Change https://golang.org/cl/162989 mentions this issue: go/analysis: allow overriding V flag without code patches

To summarize so far: it seems that there are some concerns about the module-upgrade behavior, but no objections to the remainder of the proposal.

Since a lot of this work needs to happen early in the cycle, I'm going to go ahead and implement a more limited form: for now, we'll always resolve dependencies of the standard library to the packages in the vendor directories, and we'll only treat them as part of the external module if the working directory is within the std or cmd module proper.

TLS already has too many joints, and joints in cryptography are where things break. I'd rather not to have to keep in my head an extra dimension of the compatibility matrix.

I can certainly sympathize with that, but in that case, is there a fundamental reason for the implementation of crypto/tls to live in the standard library at all?

That is: if it's that tightly coupled to x/crypto, why not make it a forwarding faΓ§ade over an implementation in x/crypto? That would preserve the advantages of decoupling the release cycle and reducing code bloat, while keeping the compatibility dimensions roughly equivalent to x/crypto today.

Making the implementation of crypto/tls swappable is interesting, and I had explored a different approach in #21753. However, I'm not sure we want to move something so core to most applications to x/crypto. For example, the security release process of x/crypto is far, far weaker.

Maybe we could still make main security releases which just bump the x/crypto dependency, but there's a lot of moving parts to this all, and I'd like to see it as a separate proposal.

(Also, I'm not convinced by the argument that a x/crypto dependency that I don't want to be a joint makes crypto/tls substantially different from any other standard library package.)

I'm not convinced by the argument that a x/crypto dependency that I don't want to be a joint makes crypto/tls substantially different from any other standard library package.

Agreed, but I think the point generalizes: any package in std that is tightly coupled to one in x could be replaced with a forwarding shim that isolates the coupling within the module.

I have two questions about this proposal.

The standard library will ship with three go.mod files: src/go.mod (with module path std), src/cmd/go.mod (with module path cmd), and misc/go.mod (with module path misc).

The standard library has a policy on which packages are allowed to import which others. It's codified in:

// pkgDeps defines the expected dependencies between packages in
// the Go source tree. It is a statement of policy.

Have you considered following the L0/L1/L2/L3/(L4+everything else) tiers there to guide std lib module boundaries (e.g., having stdl0, stdl1, stdl2, stdl3, stdl4 modules rather than a single std one)? Would there be benefits in doing so?

The std and cmd modules differ somewhat from ordinary modules, in that they do not appear in the build list.

What is the motivation to make these modules differ from ordinary modules?

Change https://golang.org/cl/163207 mentions this issue: misc: add go.mod file

Have you considered following the L0/L1/L2/L3/(L4+everything else) tiers there to guide std lib module boundaries (e.g., having stdl0, stdl1, stdl2, stdl3, stdl4 modules rather than a single std one)? Would there be benefits in doing so?

That's an interesting idea, but one of the nice properties of this proposal is that the module boundaries follow the same rules as for ordinary module boundaries: std encompasses the whole tree from src/go.mod up to cmd/go.mod. That allows us to use ordinary module-mode logic for things like vendoring when within that tree, with minimal tweaks to remove the std prefix.

What is the motivation to make these modules differ from ordinary modules?

std is tightly coupled to the compiler and runtime, so attempting to upgrade or replace it independent of the rest of the toolchain would be extremely error-prone.

Similarly, cmd is tightly coupled to the compiler: it's not at all clear what things like go tool vet should do if you've replaced cmd with something else.

Before proceeding with the vendoring changes, I would feel a lot better if @rsc and @ianthehat LGTM'd this proposal, and I want to make sure @FiloSottile is satisfied. What do you think about holding a feedback session?

Change https://golang.org/cl/163519 mentions this issue: cmd/go: allow "stdout" and "stderr" as inputs to script_test "cp" command

Change https://golang.org/cl/164623 mentions this issue: all: move internal/x to vendor/golang.org/x and revendor using 'go mod vendor'

Change https://golang.org/cl/164618 mentions this issue: cmd: refresh cmd/vendor to match 'go mod vendor'

Change https://golang.org/cl/164619 mentions this issue: go/build: change the search order for "vendor/" paths based on srcDir

Change https://golang.org/cl/164622 mentions this issue: cmd,std: add go.mod files

Change https://golang.org/cl/164625 mentions this issue: go/internal/srcimporter: set -mod=vendor before running tests

Change https://golang.org/cl/164626 mentions this issue: cmd/go: quote expanded shell variables used within regular expressions

Change https://golang.org/cl/165377 mentions this issue: cmd/go/internal/modload: do not fetch modules in searchPackages if -mod=vendor is set

Change https://golang.org/cl/165378 mentions this issue: all: add -mod=vendor to GOFLAGS in tests that execute 'go' commands within std or cmd

Change https://golang.org/cl/165497 mentions this issue: cmd/api: use 'go list' to locate transitive dependencies of std

Change https://golang.org/cl/165749 mentions this issue: cpu: define cacheLineSize and doinit for unknown CPUs

We plan to make the standard-library vendor directories work in 1.13, but will defer for a later release the decision on whether (or how) to allow user modules to upgrade them.

(That is: for 1.13, the std-vendored modules, viewed from outside std, will continue to be treated as packages in the standard library and not part of the corresponding external module.)

Change https://golang.org/cl/251159 mentions this issue: cmd/go/internal/modload: remove (*loader).forceStdVendor

Change https://golang.org/cl/254738 mentions this issue: cmd/api: omit outside dependencies when listing the packages in "std"

any progress for this proposal?

andig commented

We've recently had a case that touches @FiloSottile's comment. Working with a broken device containing borked certificate we needed to modify crptoybtes (https://groups.google.com/u/1/g/golang-nuts/c/wlhj5RFXh9g). Unfortunately it was a huge endeavour to apply this patch consistently throughout local, ci and docker build. We are aware that this patch is not suitable for upstream or any other purpose but it would have been convenient to have a simpler way of applying it.

Is there anything that could move this proposal into the active column of the proposals project?

rsc commented

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
β€” rsc for the proposal review group

rsc commented

It sounds like people are generally OK with this. There is some question about whether there are problems with x/crypto being too tied to the main repo, but if so we should find and fix those.

rsc commented

Are there any objections to making this change?

Over the past few years we actually brought a lot from x/crypto into std, so these days only chacha20poly1305, hkdf, and cryptobyte are vendored into the standard library. This makes upgrading x/crypto both less troublesome and less useful.

I find it a bit weird to expose upgrading these specific packages but not anything else in the stdlib to applications. Why can they upgrade ChaCha20Poly1305 but not AES-GCM? Why can a X.509 parsing bug be fixed with a replace if it's in cryptobyte but not if it's in the crypto/x509 caller? There is no answer because we never designed that boundary to be meaningful: vendoring a package was an implementation detail, not a visible decision. This change would expose it.

Looking at the full list of vendored packages, this doesn't feel arbitrary only for x/crypto.

# golang.org/x/crypto v0.0.0-20220722155217-630584e8d5aa
## explicit; go 1.17
golang.org/x/crypto/chacha20
golang.org/x/crypto/chacha20poly1305
golang.org/x/crypto/cryptobyte
golang.org/x/crypto/cryptobyte/asn1
golang.org/x/crypto/hkdf
golang.org/x/crypto/internal/poly1305
golang.org/x/crypto/internal/subtle
# golang.org/x/net v0.0.0-20220920203100-d0c6ba3f52d9
## explicit; go 1.17
golang.org/x/net/dns/dnsmessage
golang.org/x/net/http/httpguts
golang.org/x/net/http/httpproxy
golang.org/x/net/http2/hpack
golang.org/x/net/idna
golang.org/x/net/lif
golang.org/x/net/nettest
golang.org/x/net/route
# golang.org/x/sys v0.0.0-20220804214406-8e32c043e418
## explicit; go 1.17
golang.org/x/sys/cpu
# golang.org/x/text v0.3.8-0.20220722155301-d03b41800055
## explicit; go 1.17
golang.org/x/text/secure/bidirule
golang.org/x/text/transform
golang.org/x/text/unicode/bidi
golang.org/x/text/unicode/norm

In other words, it's nice if by luck a bug is fixable by replacing cryptobyte, but I think it would be more useful to talk about this in the broader context of a plan that involves carving out the runtime and compiler independent pieces of std that folks might want to upgrade, like the whole crypto/tls tree, net/http, etc. That would consistently relieve pressure on backport choices and make it easier to adapt to unsupported cases without forking the whole toolchain, which feels like the actual need here.

mvdan commented
  • If the version in the build list is equal to than the one required by std or cmd, that version is not replaced in the main module, and the package exists in the corresponding vendor directory, then we will load the package from vendor or cmd/vendor with its original import path.

Out of curiosity, shouldn't we be able to deduplicate code and get smaller binaries in this case as well? The benefits are only mentioned in the third case (build list has a newer version), but not in this second case (build list has the same version).

@FiloSottile, I agree that this proposal would be more useful if more of the standard library could be upgraded.

The process seems straightforward enough: for a given package in std, define a corresponding package in an x/ repo, and turn the std package into a thin forwarding shim around the x/ package. Then, upgrading the x/ package automatically upgrades the std package as well. We would just need a consensus as to which std libraries should become wrappers, and perhaps extend the main-repo API checks to also run for the relevant x/ packages.

rsc commented

To be clear, this change is not really being made just for x/crypto. It's a step toward a more rational relationship between std and x. We'll see what the next step is after we take this one.

rsc commented

Based on the discussion above, this proposal seems like a likely accept.
β€” rsc for the proposal review group

No module outside GOROOT/src may declare its own module path to begin with std or cmd

Searching for "module std" filename:go.mod on Github yields 339 results, for "module cmd" filename:go.mod there are 485 results.

If I understand correctly, those would break.

If we want to avoid breaking them, we could use for example golang.org/std and golang.org/cmd as the module names instead, as those do not seem to be used now.

std and cmd are already special names, as documented at https://pkg.go.dev/cmd/go#hdr-Package_lists_and_patterns. I'm a bit surprised that it works to use those as module names.

Looking at search result counts is inaccurate. Those are all copies of the Go source tree.

std and cmd are already special names, as documented at https://pkg.go.dev/cmd/go#hdr-Package_lists_and_patterns

Ah, missed that, thanks for the link! I was checking https://go.dev/ref/mod#module-path before I posted my comment and that page does not mention std nor cmd as special name.

Looking at search result counts is inaccurate. Those are all copies of the Go source tree.

Fair point. I haven't counted how many are copies of Go source tree. Could be that most of them are.

There are some examples that aren't copies of Go source tree:


To be clear - I'm fine with disallowing module names beginning with src and cmd, just wanted to point out that there are some existing modules like this in the wild.

rsc commented

No change in consensus, so accepted. πŸŽ‰
This issue now tracks the work of implementing the proposal.
β€” rsc for the proposal review group