golang/go

encoding/xml: brittle support for matching a namespace by identifier or url

pkieltyka opened this issue · 5 comments

I believe a struct tag's namespace should be matchable by either the xmlns prefix or the namespace URL.

To shed some light on the issue, consider an RSS feed parser that deals with namespaces declared in a variety of ways. The same type of structure can arrive with several different xmlns definitions. For example, in mRSS feeds in the wild that use the "media" namespace, you will find:

  1. Xmlns wasn't defined, but the prefix was used anyway (e.g. mRSS elements with the media prefix)
  2. Xmlns was defined as xmlns:media="http://search.yahoo.com/mrss/"
  3. Xmlns was defined as xmlns:media="http://search.yahoo.com/mrss"

I noticed that encoding/xml tracks each xmlns prefix in a map from prefix to URL, and matches struct tags against the URL. The problem is with cases 2 and 3, where a single trailing "/" is enough to throw off the parser.
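To make the three cases concrete, here is a minimal sketch that prints the Name.Space the decoder reports for the same element under each kind of declaration. The helper name firstElementSpace and the sample documents are made up for illustration. As far as I can tell, an undeclared prefix is passed through unchanged, while a declared prefix is replaced by whatever URL string the document gave.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// firstElementSpace (hypothetical helper) returns the Name.Space that the
// decoder reports for the first start element in doc.
func firstElementSpace(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err
		}
		if start, ok := tok.(xml.StartElement); ok {
			return start.Name.Space, nil
		}
	}
}

func main() {
	// One sample document per case in the list above.
	docs := []string{
		`<media:title>t</media:title>`,
		`<media:title xmlns:media="http://search.yahoo.com/mrss/">t</media:title>`,
		`<media:title xmlns:media="http://search.yahoo.com/mrss">t</media:title>`,
	}
	for i, doc := range docs {
		space, err := firstElementSpace(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("case %d: Name.Space = %q\n", i+1, space)
	}
}
```

The three values differ as plain strings, which is why a single struct tag cannot match all three feeds.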

I wrote a fix (including tests) using Go 1.5.1's encoding/xml code: pkieltyka/xml@7ad1fab

Consider a partial parser for the media rss module:

type Media struct {
  Title Title `xml:"media title"`
  Description Description `xml:"media description"`
  Thumbnails []Thumbnail `xml:"media thumbnail"`
  Contents []Content `xml:"media content"`
  MediaGroups []Group `xml:"media group"`
}

Notice the use of the namespace prefix in the struct tag instead of the namespace URL. If xmlns:media="URL" is defined in the original document, the parser expects to match it by the URL, but IMO it should check both the prefix and the URL of the namespace. I'm reporting this issue and will submit the fix separately. Thanks for the consideration.
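A rough sketch of the behavior described above, using two illustrative structs (byPrefix and byURL are hypothetical names, as are the sample documents). It assumes the current matching rule, where the namespace part of the tag is compared against the resolved namespace of the element: the prefix tag should only match when the document never declares the prefix, and the URL tag should only match when it does.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// byPrefix names the namespace by its prefix, as in the Media struct above.
type byPrefix struct {
	Title string `xml:"media title"`
}

// byURL names the namespace by its full URL.
type byURL struct {
	Title string `xml:"http://search.yahoo.com/mrss/ title"`
}

// Sample documents (made up for this sketch): one without and one with
// an xmlns:media declaration.
const (
	undeclaredDoc = `<item><media:title>Hello</media:title></item>`
	declaredDoc   = `<item xmlns:media="http://search.yahoo.com/mrss/"><media:title>Hello</media:title></item>`
)

// titleByPrefix decodes doc into byPrefix and returns the Title field.
func titleByPrefix(doc string) (string, error) {
	var v byPrefix
	err := xml.Unmarshal([]byte(doc), &v)
	return v.Title, err
}

// titleByURL decodes doc into byURL and returns the Title field.
func titleByURL(doc string) (string, error) {
	var v byURL
	err := xml.Unmarshal([]byte(doc), &v)
	return v.Title, err
}

func main() {
	for _, doc := range []string{undeclaredDoc, declaredDoc} {
		p, errP := titleByPrefix(doc)
		u, errU := titleByURL(doc)
		fmt.Printf("prefix tag: %q (%v), URL tag: %q (%v)\n", p, errP, u, errU)
	}
}
```

An empty Title with a nil error means the element was silently skipped, which is what makes this easy to miss in a feed parser.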

CL https://golang.org/cl/14601 mentions this issue.

We need a general consensus on namespace support in the xml package before considering this change.

See #11496 and #6800.

The 1.5 cycle unnecessarily broke the existing behavior of the package in many cases, and the changes that went in to address the namespace bugs had to be reverted; see #11841.

iwdgo commented

This proposal is not in line with the XML namespace standard (https://www.w3.org/TR/xml-names/#NSNameComparison), which explicitly states that URIs are treated as strings and must be exactly identical, i.e. without escaping or any other manipulation.

https://go.dev/play/p/zBgGuTzbMoe?v=gotip contains the test provided.
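As a separate minimal sketch of the exact-string comparison (this is not the playground test linked above; the entry type and decodeTitle helper are hypothetical), the struct tag below carries the URL with a trailing slash, so only a document declaring that identical string is expected to populate the field.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// entry expects the namespace URL with a trailing slash.
type entry struct {
	Title string `xml:"http://search.yahoo.com/mrss/ title"`
}

// decodeTitle (hypothetical helper) builds a small document that declares
// the media prefix as nsURL and returns the decoded Title.
func decodeTitle(nsURL string) (string, error) {
	doc := `<item xmlns:media="` + nsURL + `"><media:title>Hello</media:title></item>`
	var e entry
	err := xml.Unmarshal([]byte(doc), &e)
	return e.Title, err
}

func main() {
	for _, u := range []string{
		"http://search.yahoo.com/mrss/",
		"http://search.yahoo.com/mrss",
	} {
		title, err := decodeTitle(u)
		fmt.Printf("xmlns:media=%q -> Title=%q, err=%v\n", u, title, err)
	}
}
```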