NVIDIA/nvidia-container-runtime

invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v3

anthonyrisinger opened this issue · 9 comments

I'm trying to update our GPU nodes to the newer version. I'm not sure whether this is an issue here or a problem with the way I am requesting the module, but the go.mod newly added to the root between v3.4.0 and v3.5.0 now causes this failure for an otherwise working installation:

Step 26/59 : RUN go install github.com/nvidia/nvidia-container-runtime/...@v$RUNTIME_VERSION /gpu/nvc &&       mv /gpu/nvc/src /gpu/nvc/nvidia-container-runtime
 ---> Running in 70e8dc8e96f4
go get github.com/nvidia/nvidia-container-runtime/...@v3.5.0: github.com/nvidia/nvidia-container-runtime@v3.5.0: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v3

I found https://golang.org/ref/mod and tried a few variations, but couldn't get any of them to work. All appear to parse properly and then fail with the same error:

If the module is released at major version 2 or higher, the module path must end with a major version suffix like /v2. This may or may not be part of the subdirectory name. For example, the module with path golang.org/x/repo/sub/v2 could be in the /sub or /sub/v2 subdirectory of the repository golang.org/x/repo

go install github.com/nvidia/nvidia-container-runtime/...@v3.5.0
go install github.com/nvidia/nvidia-container-runtime/v3@v3.5.0
go install github.com/nvidia/nvidia-container-runtime/cmd/nvidia-container-runtime/v3@v3.5.0

Any ideas?
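
For reference, the rule quoted above can be sketched as a small shell check. This is only an illustration, not how the go command is implemented: required_suffix and path_matches_version are hypothetical names, and gopkg.in paths (which use a .vN suffix) and +incompatible versions are ignored. The path that matters is the one declared on the module line of go.mod, which would explain why appending /v3 on the command line alone does not help.

```shell
# Hypothetical helper, not part of the go tool: print the suffix a module
# path needs for a given version tag ("/v3" for v3.5.0, nothing for v0 or v1).
required_suffix() {
    local version major
    version="${1#v}"        # drop the leading "v"
    major="${version%%.*}"  # keep the text before the first dot
    if [ "$major" -ge 2 ]; then
        printf '/v%s\n' "$major"
    fi
}

# Succeeds when a declared module path is consistent with a version tag.
path_matches_version() {
    local suffix
    suffix="$(required_suffix "$2")"
    if [ -n "$suffix" ]; then
        case "$1" in
            *"$suffix") return 0 ;;
            *) return 1 ;;
        esac
    fi
    # v0 and v1 modules must not carry a /vN suffix (N >= 2).
    ! printf '%s\n' "$1" | grep -Eq '/v([2-9]|[1-9][0-9]+)$'
}

# The combination from this issue: a path without /v3, tagged v3.5.0.
path_matches_version github.com/nvidia/nvidia-container-runtime v3.5.0 \
    || echo "v3.5.0 needs a module path ending in /v3"
```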

Hi @anthonyrisinger

Although the package is go-installable, the versioning doesn't adhere to the go module versioning spec. This is also not currently our supported mechanism for distributing the runtime -- especially since the other components of the stack (the container toolkit and libnvidia-container) are not covered by this process.

That said, you should be able to install from the commit SHA associated with the tag:

$ git describe --tags
v3.5.0
$ git rev-parse HEAD
573118615a100ef9d6fb1dc6aab53ea4dc06952a

This can be installed as follows:

docker run --rm -ti golang:1.16
go install github.com/nvidia/nvidia-container-runtime/cmd/nvidia-container-runtime@573118615a100ef9d6fb1dc6aab53ea4dc06952a

Note that I had to use golang 1.16 to get this to work, as 1.15 would complain about the path and version combination not being supported (at least out of the box).

Thanks! I can definitely fix it with a pseudo-version. I mostly wanted to make you aware that it won't work out of the box, so I likely won't be the last one to run into it. I don't know a ton about Go's version handling; it seems close to working properly, but I can't figure out exactly what it's looking for.
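
As a rough illustration of what a pseudo-version looks like, here is a minimal sketch that assembles the v0.0.0 form from a UTC commit timestamp and a full commit hash. pseudo_version is a hypothetical helper name; the timestamp used below is the one that appears in the go get output quoted further down in this thread. Other pseudo-version forms exist when an earlier tagged version serves as the base, and they are not covered here.

```shell
# Hypothetical helper: build the v0.0.0 form of a pseudo-version from a UTC
# commit timestamp (yyyymmddhhmmss) and a full commit hash.
pseudo_version() {
    # %.12s truncates the hash to its first 12 characters.
    printf 'v0.0.0-%s-%.12s\n' "$1" "$2"
}

pseudo_version 20210429152431 573118615a100ef9d6fb1dc6aab53ea4dc06952a
# prints v0.0.0-20210429152431-573118615a10
```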

I didn't put it in the original post, but for reasons, we aren't using go install directly. The thing I was calling is a tiny wrapper around go get that compiles things with a few flags and handles some past issues:

    # Complicated magic to allow proper modules-aware "go get".  If multiple
    # packages are supplied in one command, the dependencies in common will
    # only be downloaded once.
    mkdir go go-cache tmp dl
    cd dl
    for package in "${@:1:($#-1)}"; do
        echo "Building $package..."
        go mod init temporary-download
        # Leave -buildmode=pie enabled here; we don't care about the build times or size for third-party binaries,
        # and some of them have cgo which should be protected. The protection is potentially useful, but not enough
        # to make doubling test times worth it.
        GOPATH="$tmpdir/go" GOCACHE="$tmpdir/go-cache" GOTMPDIR="$tmpdir/tmp" GO111MODULE=on CGO_ENABLED=1 \
            go get -trimpath -tags=netgo,osusergo -buildmode=pie -ldflags="-s -w" "$package"
        echo "Built $package."
        echo ""
        rm -f go.mod go.sum
    done

It was this thing failing originally but the go install failure is identical. When I tried to use your suggestion, I ended up in another strange place (while proofreading this, I discovered I had not changed ... to cmd/nvidia-container-runtime yet, but it fails in the same confusing way):

Building github.com/NVIDIA/nvidia-container-runtime/cmd/nvidia-container-runtime@573118615a100ef9d6fb1dc6aab53ea4dc06952a...
go: creating new go.mod: module temporary-download
go: downloading github.com/NVIDIA/nvidia-container-runtime v1.0.3
go: downloading github.com/NVIDIA/nvidia-container-runtime v0.0.0-20210429152431-573118615a10
go get: github.com/NVIDIA/nvidia-container-runtime@none updating to
	github.com/NVIDIA/nvidia-container-runtime@v0.0.0-20210429152431-573118615a10: parsing go.mod:
	module declares its path as: github.com/nvidia/nvidia-container-runtime
	        but was required as: github.com/NVIDIA/nvidia-container-runtime

That could have something to do with our wrapper, but we are only calling go get. I tried with the full path to the cmd/nvidia-container-runtime binary, and the triple dot ..., and both fail in the same way.

Addendum: once I changed my installation path to reference the lowercase nvidia (even though that's not what a person sees in the URL), things started working with both ... and cmd/nvidia-container-runtime.
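
The mismatch above comes down to a case-sensitive comparison between the requested path and the path on the module line of go.mod. A minimal sketch of that comparison follows; declared_module_path and matches_declared_path are hypothetical helpers, not anything the go tool provides, and the sketch assumes the module path in go.mod is written without quotes.

```shell
# Hypothetical helper: print the path on the "module" line of a go.mod file.
declared_module_path() {
    awk '$1 == "module" { print $2; exit }' "$1"
}

# Succeeds only if the requested path is the declared module path, or a
# package below it, with exactly the same letter case.
matches_declared_path() {
    local declared
    declared="$(declared_module_path "$2")"
    case "$1" in
        "$declared" | "$declared"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Run against a checkout whose go.mod declares github.com/nvidia/nvidia-container-runtime, matches_declared_path github.com/NVIDIA/nvidia-container-runtime go.mod would fail, while the lowercase spelling would succeed.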

It's really for the other repo, but since I already have something open here and they are so related I'll drop a note. The nvidia-container-toolkit repo has the exact opposite problem:

Building github.com/nvidia/nvidia-container-toolkit/...@v1.5.0...
go: creating new go.mod: module temporary-download
go: downloading github.com/nvidia/nvidia-container-toolkit v1.5.0
go get: github.com/nvidia/nvidia-container-toolkit@v1.0.5 updating to
	github.com/nvidia/nvidia-container-toolkit@v1.5.0: parsing go.mod:
	module declares its path as: github.com/NVIDIA/nvidia-container-toolkit
	        but was required as: github.com/nvidia/nvidia-container-toolkit

I managed to get it all working, with an added note in the Dockerfile that the NVIDIA URLs were case-sensitive and intentionally different.

@anthonyrisinger, thanks for the investigation and glad you could get it working. We will look at proper versioning of the modules going forward -- especially if we decide that go install is something that we want to officially support (it's still a test at the moment).

If we were to "correct" the module names to be consistent in the module files, this would break your Dockerfile again. Would you be ok with this -- assuming that we state it explicitly in the release notes?

One question: how do you generate the package names that you are passing to go get? Is it just a list of packages edited by hand, or are they programmatically discovered?

Thank you! I definitely understand, go modules are tricky to say the least.

I could definitely deal with the small nvidia-to-NVIDIA breakage, especially if I know it's coming!

We have a few places that generate lists of module names while scanning for third-party go deps (priming build-time base-layer caches) and they use whatever is written in the scanned .go file, verbatim. The tool I was using is for direct use by developers in our Dockerfiles, so I copy/pasted the URL (NVIDIA) from browser to editor.
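
To show how a verbatim scan carries the letter case through, here is a crude sketch. It is not the tooling described above: scan_import_paths is a hypothetical name, and the text match simply lists any quoted string in a .go file that looks like host/path, so it can also pick up string literals that are not imports.

```shell
# Hypothetical scanner: list quoted strings in .go files under a directory
# that look like host/path import paths, preserving their letter case.
scan_import_paths() {
    find "$1" -name '*.go' -exec \
        grep -hoE '"[A-Za-z0-9.-]+\.[a-z]+/[^"]+"' {} + \
        | tr -d '"' | sort -u
}
```

Whatever spelling appears in the source file (NVIDIA or nvidia) is what ends up in the generated list, so it has to match the module line of the upstream go.mod.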

FWIW both seem to work already, it's just NVIDIA might be the "official one", per your org name?

https://github.com/NVIDIA/nvidia-container-runtime
https://github.com/nvidia/nvidia-container-runtime

@anthonyrisinger see https://gitlab.com/nvidia/container-toolkit/container-runtime/-/merge_requests/54 which reverts the commit that changed the case from NVIDIA --> nvidia.

Thanks for the note! I think this will work nicely, but I'll be sure to drop a note when I do the next round of updates, post-release.

elezar commented

@anthonyrisinger I'm closing this as I assume that the changes fixed things on your end. If not, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit.