golang/go

cmd/go: go mod -vendor prunes non-package directories

ainar-g opened this issue Β· 57 comments

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

go version devel +8a33045 Fri Jul 13 03:53:00 2018 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ainar/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/ainar/go"
GOPROXY=""
GORACE=""
GOROOT="/home/ainar/go/gotip"
GOTMPDIR=""
GOTOOLDIR="/home/ainar/go/gotip/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build865981061=/tmp/go-build -gno-record-gcc-switches"

What did you do?

#!/bin/sh
set -e
export GO111MODULE=on
mkdir foo
cd foo
cat <<EOF > main.go
package foo // import "foo"

import "github.com/gen2brain/aac-go/aacenc"

func f() { _ = aacenc.VoPidAacMdoule }
EOF
$GO mod -init
$GO mod -vendor
$GO build -getmode vendor

What did you expect to see?

Successful build.

What did you see instead?

# github.com/gen2brain/aac-go/aacenc
vendor/github.com/gen2brain/aac-go/aacenc/aacenc.go:4:19: fatal error: voAAC.h: No such file or directory
 //#include "voAAC.h"
                   ^
compilation terminated.

The problem is that go mod -vendor pruned the external/aacenc folder, which contains the C code needed to build this package.

The cause here seems to be that we prune out directories that are not Go packages.

There is no voAAC.h in github.com/gen2brain/aac-go/aacenc/, but I do see one in github.com/gen2brain/aac-go/aacenc/external/aacenc/include/.

Side note from govendor. By default govendor only picks up pkgs referenced, but I had to add an option to grab the whole sub tree for this very reason.

Mostly cgo deps, but sometimes other required resources.

...

Right now I'm considering making a tool that can be run after go mod -vendor to trim unused packages. But I can't leave what isn't there.

I wonder whether in the world of modules what some/most people are actually intending to do in this situation (namely where they think they want/need to go mod -vendor) is snapshot the current module dependencies as opposed to the packages. Doing so would seem to address all of these points?

So then a go mod -modvendor (command/flags to be 🚲-shed-ed) would create a directory (say ./modvendor) that could then be used by GOPROXY=file:///path/to/modvendor.

That said, I don't use go mod -vendor and likely never will. But I do maintain an append-only cache of the modules that my CI then uses via GOPROXY=file://...

I don't think anyone actually wants to vendor everything. There are two problems here, that I think should not be mixed:

  1. Non-Go code dependencies vendoring. This is what this issue is about. Whether we like it or not, Go has CGo. And C dependencies are hard. Some approaches to solve this problem (excluding the painful choice of teaching the go tool about C dependencies... the horror):

    • Let module authors declare their non-Go code dependencies in their go.mod. Something like

      external "github.com/gen2brain/aac-go/aacenc/external/aacenc"
      
    • Define and document a convention for such dependencies (e.g. external directory, similar how we have conventions for internal and testdata).

    • Explicitly give up and document that CGo is fundamentally incompatible with go mod -vendor. After all, even if you do have the C code in your vendor, there are no guarantees that that C code doesn't have some dependencies of its own, so hoping that /vendor will save you from build failures is rather naΓ―ve.

  2. Binary tool dependency vendoring. This may or may not be a problem Go modules want to solve. @rsc said it in #24051.

    It is not a goal to expand into "linux distribution" territory.

    But then this, too, should be documented explicitly. It would be a shame though, as AFAIK other systems, like Ruby's Bundler and JS's NPM support some version of it.

One way to solve the cgo problem is to require a dummy go file in the c folder that is then imported by another package that uses the c files with a "_" import.

I think this is only a problem because the c and h files live in a separate directory from the go files.

rsc commented

Sorry, but the model for go builds is that what's needed to build a package is in that directory, not subdirectories. The way gen2brain/aac-go is structured, modifying the C source code in the subdirectories will not trigger a rebuild. Instead your build will use stale cache entries. The C code needs to be moved up to the package directory to make it build properly, not to mention vendor properly.

If anyone wants to build a tool that adds to the vendor directory after go mod -vendor, you could do that by reading the modules.txt file and using commands like go list -f '{{.Dir}}' to find out where each package was copied from.

@rsc I see two opportunities for improvement here. Firstly, should we explicitly clarify in the cgo documentation that the go tool notices only changes in Go packages? Right now the wording doesn't mention other directories and local includes at all. Something like

<...> Any .h, .hh, .hpp, or .hxx files will not be compiled separately, but, if these header files are changed, the C and C++ files will be recompiled. Changes in files included from other directories will not trigger a rebuild. <...>

(Emphasis = added by me.)

Secondly, should such local includes be a vet warning? There is already -cgocall, why not -cgoinclude?

Change https://golang.org/cl/125297 mentions this issue: cmd/cgo: document that #including source files in subdirectories is a bad idea

this tool might be helpful to others: https://github.com/goware/modvendor β€” I wrote it to easily copy .c, .h, .s, .proto files from module pkgs into my local ./vendor. Feel free to improve :)

go mod vendor drops github.com/ethereum/go-ethereum/crypto/secp256k1/libsecp256k1 files πŸ˜”

I've just added a "proposal" under #27618 (not marked as a proposal because the modules stuff is experimental, so more a discussion issue than anything). Apologies for the modvendor name clash @pkieltyka, I just spotted that.

I can't believe the proposed solution to this issue is to write another tool to finish what go mod vendor starts. Either write a tool to replace it or fix it so it copies everything. What is the reason for only vendoring half of the repository?

@nomad-software Russ outlined a response to this in #26366 (comment).

If instead you want to "vendor" the module that will give you everything. That's the subject of #27618 and a proposed approach is outlined at https://github.com/myitcv/go-modules-by-example/blob/master/012_modvendor/README.md.

@nomad-software this issue here was initially reported when the original reporter was using cgo.

Are also hitting this with cgo, vs. are you hitting this without using cgo?

If you are hitting this with cgo, and if we set aside vendoring for a moment, do you think you might be hitting problems even just during non-vendor-based building/recompiling due to the behavior described here:

https://golang.org/cmd/cgo/

When the Go tool sees that one or more Go files use the special import "C", it will look for other non-Go files in the directory and compile them as part of the Go package.
...
Note that changes to files in other directories do not cause the package to be recompiled, so all non-Go source code for the package should be stored in the package directory, not in subdirectories.

(It is worth reading the full description at that link, but wanted to include at least a snippet for convenience)

go mod vendor drops github.com/ethereum/go-ethereum/crypto/secp256k1/libsecp256k1 files

@im-kulikov I've written a tool to fully copy the entire directory of all dependencies into the vendor folder. You can find it here:

https://github.com/nomad-software/vend

Any suggestions to improve it are welcome, just open an issue there.

https://github.com/nomad-software/vend

Yo @nomad-software this should be for of Dep tool itself, awesome work! solved my issue brilliantly πŸ‘

Cheers!

@nomad-software I wanted to briefly double-check on the approach for https://github.com/nomad-software/vend. It seems to be intended for use with CGO (if I am not mis-understanding the README there) including when C/C++ headers and files are outside of directories with .go files.

It seems that approach of having C/C++ headers and files outside of directories with .go files might be dangerous for build issues? (Forgetting about vendoring for a second; just wanted to ask about building).

Comments above from Russ seemed to indicate:

The C code needs to be moved up to the package directory to make it build properly, not to mention vendor properly.

And (with emphasis added):

modifying the C source code in the subdirectories will not trigger a rebuild. Instead your build will use stale cache entries.

Also, the CGO documentation now includes (also with emphasis added):

Note that changes to files in other directories do not cause the package
to be recompiled, so all non-Go source code for the package should be
stored in the package directory
, not in subdirectories.

Sorry if I am mis-understanding...

@thepudds It depends entirely on your build process. The goal of vend was just to copy everything to the vendor folder, just in case the remote repository goes offline. It's up to you to manage any stale CGO cache entries, etc.

go mod vendor command can not copy c and other files to the vendor directory, this makes a lot of packages have problems, maybe can add go mod vendor -a.

I just wonder if all these vendoring problems will disappear one day... Don't get me wrong, I like the vendoring concept way more than downloading dependencies over and over. Offline builds are great. I am just sick of changing tools with even worse tools than before. We had a bunch of tools in a pretty short period of time: Godeps, govendor, dep, go mod. When it will all end?!

We also run into problems with this as we have non-Go files in a repository that contains e.g. SQL scripts (for database migrations) or config files (for testing). Before trying to switch to go modules we used vendoring via git submodules, which worked well and allowed us to produce deterministic builds. Now we would like to switch to go modules (mainly to be able to easily pin versions of external dependencies as well). We tried to use go mod vendor to install the selected versions of our dependencies into the vendor folder and then use the test config from there, but since it only copies Go files all the config files are missing.

It would be great if there was a flag that one could set to copy the entire repository to the vendor folder. This should not be difficult to implement as go mod already stores the entire repositories in $GOPATH/pkg/mod. The alternative would be to manually check out the repository and initialize the required config files, which is cumbersome as we'd have to find a way to fetch the commit corresponding to the one referenced in the go.mod file to obtain the files belonging to the specific commit.

Software doesn't only consist of source code, config files are just as important and it should be possible to manage the two using the same tools whenever possible.

@adewes, why is it important that the SQL scripts and config files be in the vendor directory in particular? I don't know of any reason you couldn't combine Git submodules for those non-Go dependencies and Go modules for everything else.

Well the SQL scripts belong to a specific version of the Go code (they define the database tables for the models defined in Go), so they should be versioned together. Of course we could use submodules for that but this would introduce another type of versioning (go module version and git submodule version), so I really don't like it. The files don't need to be in the vendor directory but the go mod vendor would provide a convenient and predictable way to check out the repositories belonging to the Go code that's used (and before learning that only Go files are checked out I thought this could solve our problem). Most other "modern" languages (e.g. Python, Javascript, C#) support bundling non-code files when building packages, so I really don't understand why Go shouldn't.

How are package maintainers and package providers supposed to deal with config or resource files when using the new module mechanism? I couldn't find anything in the documentation. If I want to publish my SQL scripts together with my Go code how would I do it such that people can use go get to fetch my package and my package knows where it should look for config files?

@bcmills how about this case, I have a lib with a few packages but no root go file. The mod won't bring anything from the root, that includes the README, LICENSE and so on. Maybe I want to read the README without having to open my browser to go to the lib page on git, or check the lib license.

@adewes, the key thing to understand is that go mod vendor operates on packages, not on modules. Any files that are colocated with a package (transitively) imported by the main module will be included in go mod vendor.

If you want to unpack entire modules instead, try go mod download -json.

@bcmills thanks for that info, I understand that go mod vendor checks out packages, but (I think) that's what we need, as we want to check out the files belonging to a package with a specific version used by the module. So a -a flag that would allow us to check out all files (not only Go files) would be really great. Using go mod download -json seems quite cumbersome again, as we would need to parse out the exact file locations from the JSON, which I think is unnecessary: Go already downloads a specific version of a package, so why not provide the files of that package in an easily predictable path (so we could e.g. fetch config files from vendor/<package name>/.../config)? That would make working with non-code files so much easier.

Right now we solved the problem by merging our two dependent packages together (so that we don't need to go get the second one), which works but is not a viable strategy for third-party dependencies.

I don't see anyone mentioning text files intended for use with the template package.

How will that work?

@coolaj86 - see #26366 (comment); Russ' response about C files applies to template txt files too.

Another (in my view preferable) alternative to this is to use something like https://github.com/jteeuwen/go-bindata in order to make the contents of those text files effectively a first class dependency citizen.

Also, just to note that this issue is closed so I only saw this by chance; others will likely not see this exchange. Please create a new issue if you think something has been missed in this thread.

Russ' response about C files applies to template txt files too

Sorry, I replied too hastily there. Russ' comment about C files does not apply to other non-Go or C files. The second part of my response (using go-bindata or similar) should however be correct and considered the right way to fix this.

Sorry, but the model for go builds is that what's needed to build a package is in that directory, not subdirectories. The way gen2brain/aac-go is structured, modifying the C source code in the subdirectories will not trigger a rebuild. Instead your build will use stale cache entries. The C code needs to be moved up to the package directory to make it build properly, not to mention vendor properly.

If anyone wants to build a tool that adds to the vendor directory after go mod -vendor, you could do that by reading the modules.txt file and using commands like go list -f '{{.Dir}}' to find out where each package was copied from.

following @rsc's hint, I managed to make the following "hack" in my Makefile:

versionPath=$(shell go list -f {{.Dir}} github.com/gogo/protobuf/version)
gogoprotos=$(versionPath:version=protobuf)
gogoprotosTarget:=./vendor/github.com/gogo/protobuf/

.PHONY: vendor
vendor:
	@go mod tidy -v
	@go mod vendor -v
	@go mod verify
	# work around to save proto files in gogoproto
	@mkdir -p $(gogoprotosTarget)
	@cp -rf --no-preserve=mode $(gogoprotos) $(gogoprotosTarget)

Sorry, but the model for go builds is that what's needed to build a package is in that directory, not subdirectories.

Cute idea. I challenge you to use Tor inside a Go program without pre-building it and dumping the shared library inside pkg for Go to pick it up. This would be a nice example to what devs out in the wild might actually need.

I found this issue after strugging with go-goracle/goracle#155.

@bcmills the use case is github.com/go-goracle/goracle needs the odpi directory, which contains cgo header and implementation files, but it's not referenced as a package outside of the cgo comments requiring those header files. I'm not going to use git submodules + go modules, because then I have to keep multiple versions of things in sync, or use a custom build script, which I'm not interested in creating or maintaining.

The frustration on my end is that I feel like I'm being punished for using go mod -vendor. How am I supposed to keep my dependency's dependency in sync with my project when there's no clear, simple, concise way for that to happen, or even understand that it was missing in the first place?

boreq commented

The same problem exists if one tries to use Go modules for pulling in a dependency which specifies its interface as a set of *.proto files. There is no way to use those in your project as they will not be included in the vendor directory. Right now the solution seems to go back to writing bash scripts.

@boreq, the Go source files generated from .proto files are supposed to be committed to version control.

Per https://golang.org/s/go1.4-generate:

once things are settled, the author commits the generated files to the source repository, so that they are available to clients that use go get

boreq commented

@bcmills
Compiling .proto files can depend on .proto files from other sources though.

Comments above from Russ seemed to indicate:

The C code needs to be moved up to the package directory to make it build properly, not to mention vendor properly.

And (with emphasis added):

modifying the C source code in the subdirectories will not trigger a rebuild. Instead your build will use stale cache entries.

Also, the CGO documentation now includes (also with emphasis added):

Note that changes to files in other directories do not cause the package
to be recompiled, so all non-Go source code for the package should be
stored in the package directory
, not in subdirectories.

This is notably in conflict with how TensorFlow includes C headers from its Go bindings: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/go/attrs.go#L20

Since the C headers are shared, the only way to put them in the Go package directory would be to duplicate the header files, which seems like a poor solution. Is there a better way? I'm happy to fork TensorFlow to make it vendor properly if necessary.

@mxplusb @fedepaol and anyone else coming here from the future:
I've solved this problem for goracle / godror by adding dummy golang files to the directories containing external C source code.
Then to ensure that go mod tidy doesn't remove them, you require each of those packages from the root package.
Lastly, you put build tags on the dummy golang files so that you don't encounter the error C source files not allowed when not using cgo

You can see it done here: https://github.com/godror/godror/pull/38/files

@veqryn thanks! I do believe it still doesn't address the root of the problem but it does help. :)

I can't believe the proposed solution to this issue is to write another tool to finish what go mod vendor starts. Either write a tool to replace it or fix it so it copies everything. What is the reason for only vendoring half of the repository?

any news here ? this issue is blocking ...

This issue is now two and a half years old and it affects a lot of people. Has there been any movement whatsoever on this problem?

This issue is closed, and has been for almost 2.5 years. People don't generally follow closed issues, so please open a new issue if you feel there is more information available here. Thanks

The issue the way it is currently seems to be very well explained and valid. @rsc closed it back then asking for someone from the community to fix it ? but its quite bizarre, because go mod was supposed to fix dep, but this feature actually works well on dep, but there is no clear way to make it work with go mod. I dont understand what is the point to make everyone move to go mod, when it is actually not feature complete.
For now, I managed to solve it with the work around of using (https://github.com/nomad-software/vend) but in reality, this is something that should be fixed here.

Wow, two years have passed and still no proper vendoring with the go mod. At this point, I think it's a political decision from Google. People want full repo dependencies copy into the vendor directory, I think it's trivial, yet we still don't have it.

lol this is a 2.5 year old issue. why does everything have to suck?

this still an issue, its crazy how we still need to use vend to solve the package issue....

We use go mod vendor with a CGo module that references .C and .H files stored in its subdirectory, and they are not copied by the tool.

Perhaps, tool could support an option in go.mod file to list modules that should be fully copied, instead of stripped?

@hrissan Partially re-using an older comment:

If you are hitting this with cgo, and if we set aside vendoring for a moment, there is a decent chance you are also having a more fundamental problem even just during non-vendor-based building/recompiling due to the behavior described here:

https://golang.org/cmd/cgo/

When the Go tool sees that one or more Go files use the special import "C", it will look for other non-Go files in the directory and compile them as part of the Go package.
...
Note that changes to files in other directories do not cause the package to be recompiled, so all non-Go source code for the package should be stored in the package directory, not in subdirectories.

Does that seem like it applies to you?

We have a legal requirement to add the license file of some packages to their vendor directory. Unfortunately just copying go files and not having an option for other files can become a serious issue.

Any chance we can restart this discussion? A flag to declare assets for a package would make sense? I guess something similar happens when using the embed annotation.

Looking #26366 (comment), it seems that using a tree of C files will not work correctly.

But I still have to use code from a library that is organized as a tree. So I need to flatten that tree as files in my package directory.

Has anyone built a tool that will take a C file and resolve and inline all #include that are relative to a base directory? This is a subset of what a preprocessor does...

Here is a workaround that may be useful for other projects:

For package github.com/dolmen-go/hid (which embeds C libraries libusb and hidapi) I've implemented an ad-hoc C preprocessor (triggered by go generate) that concatenates all required C files from the subdirectories and produces 3 .h files (one for each supported platform). The cgo source just includes the the local .h file corresponding to GOOS.

All,

I'm a packager on Gentoo, and I'm hitting this with https://git.sr.ht/~rjarry/aerc.git.

I am required to vendor to build packages under our package manager, and it doesn't look like that is going to change any time soon. Our package manager team sees allowing me to reach out to the network during a build as a massive security hole they don't want to open, and they tell me that having builds that are required to reach out to the network breaks building on unconnected systems.

So, what is the best way to deal with this issue?

Thanks,

William

I have packaged the vend tool from @nomad-software on Gentoo, thanks for writing it.

IMO this is a pretty significant issue since go mod vendor creates vendor directories that are unusable in some situations.

We will use vend for now, but please count me in as someone requesting that this discussion be restarted.

Thanks much for your time,

William

Why is this closed? This is still a major issue. Go dep was able to handle this with "unused-packages = false" which means many cannot and will not switch over to go mod.

Why is this closed?

See #26366 (comment).