golang/go

encoding/xml: support QName values / expose namespace bindings

pdw-mb opened this issue · 6 comments

It's not uncommon for XML to contain QNames as element and attribute values, e.g.

  <my-document xmlns:foo="http://...">
    <my-element>foo:bar</my-element>
  </my-document>

In order to correctly unmarshal the value, you need to know the namespace bindings in effect for my-element, but Decoder doesn't appear to expose this information. A simple addition to encoding/xml of:

  func (d *Decoder) NamespaceBindings() map[string]string {
    return d.ns
  }

This allows unmarshalers to access the necessary information. For example, I can now write:

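  import (
    "encoding/xml"
    "errors"
    "strings"
  )

  type QName struct {
    Namespace string
    Local     string
  }

  func (qname *QName) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    // Copy the bindings before decoding: once DecodeElement consumes the
    // end tag, any xmlns declarations made on this element go out of scope.
    bindings := make(map[string]string)
    for p, uri := range d.NamespaceBindings() {
      bindings[p] = uri
    }
    var s string
    if err := d.DecodeElement(&s, &start); err != nil {
      return err
    }
    // Split "prefix:local"; an unprefixed value uses the default namespace ("").
    prefix, local := "", s
    if i := strings.Index(s, ":"); i >= 0 {
      prefix, local = s[:i], s[i+1:]
    }
    ns, ok := bindings[prefix]
    if !ok {
      return errors.New("unbound namespace prefix: " + prefix)
    }
    qname.Namespace, qname.Local = ns, local
    return nil
  }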
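The prefix splitting and lookup in that method don't depend on the Decoder at all, so they can be factored into a pure helper and tested in isolation. A minimal sketch; `resolveQName` is a hypothetical name, and the plain map stands in for whatever the proposed `NamespaceBindings` would return:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveQName splits a lexical QName such as "foo:bar" into prefix and
// local part, then looks the prefix up in bindings (prefix -> namespace URI).
// An unprefixed name is looked up under "", i.e. the default namespace.
func resolveQName(s string, bindings map[string]string) (string, string, error) {
	prefix, local := "", s
	if i := strings.Index(s, ":"); i >= 0 {
		prefix, local = s[:i], s[i+1:]
	}
	space, ok := bindings[prefix]
	if !ok {
		return "", "", fmt.Errorf("unbound namespace prefix %q", prefix)
	}
	return space, local, nil
}

func main() {
	bindings := map[string]string{"foo": "http://example.com/foo"}
	space, local, err := resolveQName("foo:bar", bindings)
	fmt.Println(space, local, err)
	_, _, err = resolveQName("baz:qux", bindings)
	fmt.Println(err)
}
```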

Arguably, something like the above, along with a corresponding attribute unmarshaler, could be provided on the standard xml.Name.
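For comparison, this is roughly what resolving such values takes with the existing API: walk the tokens yourself and keep a stack of the xmlns declarations in scope. A rough sketch, assuming the QName is the entire trimmed text of an element; `resolveText`, `scope` and `lookup` are hypothetical helpers, not part of encoding/xml:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// scope holds the prefix -> URI bindings declared on one element.
type scope map[string]string

// lookup searches the scopes from innermost to outermost.
func lookup(stack []scope, prefix string) (string, bool) {
	for i := len(stack) - 1; i >= 0; i-- {
		if uri, ok := stack[i][prefix]; ok {
			return uri, true
		}
	}
	return "", false
}

// resolveText maps each "prefix:local" element text found in doc to the
// namespace URI bound to that prefix at that point in the document.
func resolveText(doc string) (map[string]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var stack []scope
	out := make(map[string]string)
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// Token leaves xmlns attributes untranslated: xmlns:p has
			// Space "xmlns", and a default declaration has Local "xmlns".
			s := scope{}
			for _, a := range t.Attr {
				switch {
				case a.Name.Space == "xmlns":
					s[a.Name.Local] = a.Value
				case a.Name.Space == "" && a.Name.Local == "xmlns":
					s[""] = a.Value
				}
			}
			stack = append(stack, s)
		case xml.EndElement:
			stack = stack[:len(stack)-1] // leaving the element drops its declarations
		case xml.CharData:
			text := strings.TrimSpace(string(t))
			if i := strings.Index(text, ":"); i > 0 {
				if uri, ok := lookup(stack, text[:i]); ok {
					out[text] = uri
				}
			}
		}
	}
}

func main() {
	doc := `<my-document xmlns:foo="http://example.com/foo">
  <my-element>foo:bar</my-element>
</my-document>`
	bindings, err := resolveText(doc)
	fmt.Println(bindings["foo:bar"], err)
}
```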

More discussion of this issue here:

https://groups.google.com/forum/#!searchin/golang-nuts/QName/golang-nuts/DexmVLQOJxk/whBaKK9ntHsJ

go version go1.5 darwin/amd64

md5 commented

This could help with an issue I reported at hooklift/gowsdl#37

In that case, the gowsdl library is trying to parse SOAP envelopes that have variable body content, but the VirtualBox web service is putting the xmlns:vbox="http://www.virtualbox.org" declaration on the <SOAP-ENV:Envelope>. When the innerxml of the <SOAP-ENV:Body> is parsed, the xmlns:vbox="http://www.virtualbox.org" mapping is unavailable to the newly created xml.Decoder inside the second call to xml.Unmarshal.

Having access to the namespaces from the outer xml.Decoder and being able to pass them to a new xml.Decoder would be one way to deal with this issue. 👍

rsc commented

Thanks for the note. I think we may try one more time to get namespaces right in xml. And then we're going to give up and say "what we've got is what we've got."

md5 commented

Thanks @rsc.

In the gowsdl case, I ended up implementing xml.Unmarshaler so the tool could process the whole XML file in a single pass: hooklift/gowsdl#43

pdw-mb commented

Thanks @rsc. I think what's there is pretty close. QName values in XML documents are inherently problematic because you need the current namespace bindings to interpret them, but they are fairly widely used.

After filing this issue, I've realised that the fix for attributes is more problematic: UnmarshalXMLAttr(attr xml.Attr) isn't currently passed the Decoder, so addressing this would require a breaking change to the API rather than just the addition of a method.

rsc commented

Blocked on #13400.

I've implemented the proposed change in a fork that can be found here: https://code.blinkace.com/go/xml

The relevant changes are:

  • Add Decoder.NamespaceBindings to allow Unmarshalers to get access to current bindings
  • Alter UnmarshalXMLAttr to include the Decoder as a parameter (breaking change)
  • Add Encoder.GetPrefix to allow marshalers to insert prefixes needed for QName and other values that require them.

I've also implemented a QName package whose type implements both Marshaler and Unmarshaler. This might be better merged with XMLName.