golang/go

cmd/go/internal/get: VCS path regexp omits characters accepted by CheckImportPath

stub42 opened this issue · 15 comments

What version of Go are you using (go version)?

$ go version
go version go1.12.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/stub/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/stub/go"
GOPROXY=""
GORACE=""
GOROOT="/snap/go/3540"
GOTMPDIR=""
GOTOOLDIR="/snap/go/3540/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build922868437=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Attempted to fetch the git repo containing a go package, publicly available at:

Per go help importpath, the following should attempt a git download using https://, falling back to git+ssh://

go get -v git.launchpad.net/~stub/+git/go-eggs.git

Attempting to force ssh per git help environment does not help

GIT_ALLOW_PROTOCOL=ssh go get -v git.launchpad.net/~stub/+git/go-eggs.git

What did you expect to see?

go get attempt to clone the git repo at https://git.launchpad.net/~stub/+git/go-eggs, and if that failed, fall back to cloning the git repo at git+ssh://git.launchpad.net/~stub/+git/go-eggs.

Per go help importpath, because the import path has a version control qualifier (.git) then no attempt should be made to attempt to get the import over https: to look for a tag.

What did you see instead?

Go get attempts to parse meta tags from https://, which fails. No attempt is made to clone the git repo.

$ go get -v git.launchpad.net/~stub/+git/go-eggs.git
Fetching https://git.launchpad.net/~stub/+git/go-eggs.git?go-get=1
Parsing meta tags from https://git.launchpad.net/~stub/+git/go-eggs.git?go-get=1 (status code 200)
package git.launchpad.net/~stub/+git/go-eggs.git: unrecognized import path "git.launchpad.net/~stub/+git/go-eggs.git" (parse https://git.launchpad.net/~stub/+git/go-eggs.git?go-get=1: no go-import meta tags ())

Duplicate of #26134

The regular expression that looks for the .git suffix is here:

re: `^(?P<root>(?P<repo>([a-z0-9.\-]+\.)+[a-z0-9.\-]+(:[0-9]+)?(/~?[A-Za-z0-9_.\-]+)+?)\.(?P<vcs>bzr|fossil|git|hg|svn))(/~?[A-Za-z0-9_.\-]+)*$`,

Testing against that (playground) reveals that the problem here is that the + character fails to match the regular expression, which is more restrictive about + and ~ than our validation for import paths in general:

// pathOK reports whether r can appear in an import path element.
//
// NOTE: This function DIVERGES from module mode pathOK by accepting Unicode letters.
func pathOK(r rune) bool {
if r < utf8.RuneSelf {
return r == '+' || r == '-' || r == '.' || r == '_' || r == '~' ||
'0' <= r && r <= '9' ||
'A' <= r && r <= 'Z' ||
'a' <= r && r <= 'z'
}
return unicode.IsLetter(r)
}

This needs a fix, but it's not obvious to me whether the VCS path should be made more permissive, or CheckImportPath should be made more restrictive. I'll follow up with @rsc and @jayconrod.

rsc commented

I'm confused by whether the +git is essential here. The URL already starts and ends with git. Must it also have git in the middle? Somehow we have gotten this far without allowing + in go get paths, and I want to understand why.

When Launchpad code hosting added git support it was necessary to add +git as a path component to avoid namespace clashes with Bazaar branches. Launchpad code hosting is pretty much stuck with it, but yes, + will be very uncommon, unlike other special characters such as ~. The trailing .git is optional, and only in my example because I wanted to skip the https: VCS detection (probably necessary for private repositories, when I get that far).

I checked the new module index for other existing packages that use the + character, but haven't been able to locate any so far.

(That's not to say that they don't exist, but if they do they're rare.)

Perhaps we should disallow + from import paths entirely?

rsc commented

If this is not a breakage introduced after Go 1.12, I'm inclined to leave deciding what to do until after Go 1.13 is out.

@bcmills This issue is just rolling forward through release milestones. Should we just move it to the Backlog milestone? That's what Backlog is for. Thanks.

rsc commented

Unicode characters is #29101. Assuming we exclude that from this discussion, this discussion is only about whether to start using + in paths.

Technically the looser module restrictions mean someone could set up a non-VCS-backed module with + today, but all the VCS-backed stuff can't use it. There's nothing with + in the module index (index.golang.org).

rsc commented

It seems like we should reject + and wait for complaints.
There are no uses we can find.
It doesn't seem to be used anywhere.
If we drop it from pathOK on day 1 of Go 1.16, we will have time to put it back.

The original complaint is that Launchpad git hosting URLs such as https://git.launchpad.net/~stub/+git/go-eggs do not work with 'go get', and now Go modules. It is not used anywhere, because it doesn't work (and never has).

This issue is currently labeled as early-in-cycle for Go 1.16.
That time is now, so this is a friendly ping so the issue is looked at again.

Change https://golang.org/cl/250919 mentions this issue: module: reject "+" in CheckPath and CheckImportPath

This has affected me, but it's not a significant issue. I have a module containing a main package with the name etld+1, which is a tool for processing domain names. I can rename it to something less helpful, but it would be nice if I didn't have to.

Change https://golang.org/cl/282512 mentions this issue: content/static/doc: remove "+" from allowed module path characters