constabulary/gb

gb vendor fetch: does not tolerate HTML-style unclosed "meta" tag

Closed this issue · 4 comments

seh commented

When attempting to run gb vendor fetch against a remote import path without a VCS type qualifier, the XML parser rejects a meta tag supplied by the server that go get accepts. Specifically, if the response document contains a meta tag that's an XML start tag, but not an empty element or paired with a closing end tag, gb vendor fetch rejects the document as being invalid. It's true; the document is invalid XML, because it consists solely of an opened-but-never-closed element.

The go get documentation offers this meta tag as an example:

<meta name="go-import" content="import-prefix vcs repo-root">

Taken literally, that's valid HTML, but not valid XML unless it's followed by an end tag (</meta>). However, go get accepts it, because go get's parser tolerates premature EOF so long as it already parsed at least one go-import directive, whereas gb vendor fetch does not.

Note that go get concession came in commit 5b085482ae8792cf751371991488d29e7bd3762f, in response to issue 13683 ("cmd/go: go get uses XML parser to parse HTML"). gb vendor fetch's current code predates that fix by about five months. Would you consider accepting a port of that go get fix to gb vendor?

Thank you for reporting this issue.

On Mon, 12 Sep 2016, 22:42 Steven E. Harris notifications@github.com
wrote:

When attempting to run gb vendor fetch against a remote import path without
a VCS type qualifier
https://golang.org/cmd/go/#hdr-Remote_import_paths,
the XML parser rejects a meta tag supplied by the server that go get
accepts. Specifically, if the response document contains a meta tag
that's an XML start tag, but not an empty element or paired with a closing
end tag, gb vendor fetch rejects the document as being invalid
https://github.com/constabulary/gb/blob/master/internal/vendor/discovery.go#L44.
It's true; the document is invalid XML, because it consists solely of
an opened-but-never-closed element.

The go get documentation offers this meta tag as an example:

Taken literally, that's valid HTML, but not valid XML unless it's followed
by an end tag (). However, go get accepts it, because go get's
parser tolerates premature EOF so long as it already parsed at least on
go-import directive
https://github.com/golang/go/blob/master/src/cmd/go/discovery.go#L46,
whereas gb vendor fetch does not
https://github.com/constabulary/gb/blob/master/internal/vendor/discovery.go#L44
.

Note that go get concession came in commit
5b085482ae8792cf751371991488d29e7bd3762f
golang/go@5b08548,
in response to issue 13683 ("cmd/go: go get uses XML parser to parse
HTML") golang/go#13683. gb vendor fetch's
current code predates that fix by about five months. Would you consider
accepting a port of that go get fix to gb vendor?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#642, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAAcAyrnCttSMA_WrVrJl33LiLm527TVks5qpUeDgaJpZM4J6iXg
.

Does it fix the problem by picking up the fix from this new commit from Golang v1.7.1? golang/go@81a4bbf

Here's the Gerrit change: https://go-review.googlesource.com/#/c/14482/

seh commented

I prepared a fix, porting both the change that @shawnzhu mentions and the go get change I mentioned above (5b085482ae8792cf751371991488d29e7bd3762f), and wrote unit tests attempting to prove that both are necessary. I had a hard time finding any input that requires both changes.

When using an XML parser with Strict set to false, as both go get and gb vendor do, the parser tolerates unquoted attribute values, which is what I thought thought was the interesting part of this test case. Using Decoder.RawToken instead of Decoder.Token is enough to get that test to pass in gb vendor, because with RawToken the parsing never declares premature EOF for lack of an XML end element. It takes something like an unterminated CDATA section.

It's probably useful to accommodate such a malformed document; I'm just confused as to whether that test case in 5b085482ae8792cf751371991488d29e7bd3762f failed without that change.

I'll submit a PR for your consideration.

seh commented

Now that #643 is merged, I'm considering this problem to be fixed.