golang/go

encoding/xml: support for XML namespace prefixes

jordan2175 opened this issue · 30 comments

Marshaling data back to XML does not seem to support namespace prefixes. For example:

https://play.golang.org/p/6CY71H7mb4

The input XML is <stix:STIX_Package>, but when it is written back out it becomes <STIX_Package xmlns="stix">.

Also, there do not appear to be any types like xml.Name for adding namespaces to a struct. Something like xml.NS, maybe?

Dup of #6800

I think what I am looking for, as I do more research on it, is for xml.Marshal to support prefixes

Here are some more details about this issue. If you run this code:

https://play.golang.org/p/EucDh59yiB

package main

import (
    "encoding/xml"
    "fmt"
)

type HouseType struct {
    XMLName   xml.Name `xml:"prefix11 House"`
    MessageId string   `xml:"message_id,attr"`
}

func main() {

    var tm HouseType
    tm.MessageId = "test1234"

    var data1 []byte
    data1, _ = xml.MarshalIndent(tm, "", "    ")

    fmt.Println("Marshal")
    fmt.Println(string(data1))

    rawxml := `<prefix11:House message_id="1466" in_response_to="1"></prefix11:House>`

    var tm2 HouseType
    xml.Unmarshal([]byte(rawxml), &tm2)
    fmt.Println("\nUnmarshal")
    fmt.Println("Message ID", tm2.MessageId)
}

It will print out:

Marshal
<House xmlns="prefix11" message_id="test1234"></House>

Unmarshal
Message ID 1466

You will see that Marshal does not use the prefix correctly; it should emit prefix11:House. If I change the struct to the following (note the ":" in the XMLName tag), Marshal works but Unmarshal does not. So I can either marshal or unmarshal prefixed XML, but not both.

https://play.golang.org/p/44CMHXb3YM

type HouseType struct {
    XMLName   xml.Name `xml:"prefix11:House"`
    MessageId string   `xml:"message_id,attr"`
}

Making this change changes the output to:

Marshal
<prefix11:House message_id="test1234"></prefix11:House>

Unmarshal
Message ID
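
The empty Message ID is consistent with xml.Unmarshal returning an error that the snippet above discards. Here is a minimal sketch that surfaces that error for the colon form and shows the space-separated form decoding the same input. The namespace URI urn:example:house and the names colonHouse, spaceHouse, and rawHouse are invented for illustration; unlike the original input, this one declares the prefix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// colonHouse uses the "prefix:local" tag form, which Marshal writes verbatim.
type colonHouse struct {
	XMLName   xml.Name `xml:"prefix11:House"`
	MessageId string   `xml:"message_id,attr"`
}

// spaceHouse uses the "namespace local" tag form, which Unmarshal compares
// against the resolved namespace and local name of the element.
type spaceHouse struct {
	XMLName   xml.Name `xml:"urn:example:house House"`
	MessageId string   `xml:"message_id,attr"`
}

// rawHouse is a hypothetical input that binds the prefix to a namespace URI.
const rawHouse = `<prefix11:House xmlns:prefix11="urn:example:house" message_id="1466"></prefix11:House>`

// decodeColon returns the error Unmarshal reports for the colon form.
func decodeColon() error {
	var h colonHouse
	return xml.Unmarshal([]byte(rawHouse), &h)
}

// decodeSpace returns the message ID read through the space form.
func decodeSpace() (string, error) {
	var h spaceHouse
	err := xml.Unmarshal([]byte(rawHouse), &h)
	return h.MessageId, err
}

func main() {
	fmt.Println("colon form error:", decodeColon())
	id, err := decodeSpace()
	fmt.Println("space form:", id, err)
}
```

Checking the error from Unmarshal makes the mismatch visible instead of silently leaving the struct empty.
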
rsc commented

See #11841.

CL https://golang.org/cl/12570 mentions this issue.

rsc commented

Blocked on #13400.

Just a note in support of this proposal: with encoding/xml as it stands, it is infeasible to interact with SOAP web services that use digitally signed requests. This is because XML digital signatures require the XML to be in canonical form, which requires a greater degree of control over namespaces than encoding/xml provides. Admittedly, calling a SOAP service from a Go program is kind of like towing a trailer behind your Ferrari, but sometimes interoperability with legacy systems is essential.

Just a workaround while waiting for a fix:

type Data ...

type Root struct {
    XMLName xml.Name `xml:"prefix:root"`
    XmlNS   string   `xml:"xmlns:prefix,attr"`
    Data    Data     `xml:"data"`
}

root := Root {
    XmlNS: "urn:test.example.com",
    Data: ...,
}

b, err := xml.MarshalIndent(root, "", "    ")

This will produce the following:

<prefix:root xmlns:prefix="urn:test.example.com">
    <data>...</data>
</prefix:root>

@vania-pooh Am I right in assuming that I can't unmarshal a namespaced tag into a namespaced struct? (I.e. go sees them as two different objects)

Here's an example of what I mean, here "urn:copyright" is not unmarshaled, but if the name is changed to "copyright" is: https://play.golang.org/p/lb1oZ0ATwz

@karl-gustav not sure, I played only with marshalling. But so far as I understand currently Go supports only prefixed attributes and does not support prefixed tag names.

uynap commented

@vania-pooh It's apparently not a valid "workaround": Marshal and Unmarshal do not work with the same tag.
In your example, you have to use XMLName xml.Name `xml:"prefix:root"` for Marshal and XMLName xml.Name `xml:"prefix root"` for Unmarshal. (Go version 1.8)

@uynap didn't test with unmarshal. This was marshalling only workaround.

Do we have a minimal working example of a full Marshal / UnMarshal workaround ?

uynap commented

@karl-gustav The issue with Go's "XML namespace prefixes" is that you cannot use one struct for both Marshal and Unmarshal. But it's fairly easy to write code that supports only Marshal or only Unmarshal.
For example, when marshaling, use the code below:

type XMLenvelop struct {
    XMLName xml.Name `xml:"soapenv:Envelope"`
}

for UnMarshal:

type XMLenvelop2 struct {
    XMLName xml.Name
}
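
To make the two-struct approach concrete, here is a minimal round-trip sketch. It is an illustration under assumptions, not a fix: the type names envelopeOut and envelopeIn, the Body field, and the choice of the SOAP 1.1 envelope namespace are mine and do not come from this thread. The outgoing struct writes the prefix literally and declares it through an ordinary attribute; the incoming struct matches on the namespace URI, so any prefix bound to that URI is accepted.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// soapNS is the SOAP 1.1 envelope namespace, used here only as an example.
const soapNS = "http://schemas.xmlsoap.org/soap/envelope/"

// envelopeOut is for marshaling only: the prefix is part of the tag names
// and the xmlns:soapenv declaration is emitted as a plain attribute.
type envelopeOut struct {
	XMLName xml.Name `xml:"soapenv:Envelope"`
	NS      string   `xml:"xmlns:soapenv,attr"`
	Body    string   `xml:"soapenv:Body"`
}

// envelopeIn is for unmarshaling only: the XMLName tag holds
// "namespace-URI local-name", so the prefix used in the input is irrelevant.
type envelopeIn struct {
	XMLName xml.Name `xml:"http://schemas.xmlsoap.org/soap/envelope/ Envelope"`
	Body    string   `xml:"Body"`
}

// roundTrip marshals with envelopeOut and reads the result back with
// envelopeIn, returning the serialized XML and the decoded body.
func roundTrip(body string) (string, string, error) {
	out, err := xml.Marshal(envelopeOut{NS: soapNS, Body: body})
	if err != nil {
		return "", "", err
	}
	var in envelopeIn
	if err := xml.Unmarshal(out, &in); err != nil {
		return string(out), "", err
	}
	return string(out), in.Body, nil
}

func main() {
	doc, body, err := roundTrip("ping")
	fmt.Println(doc)
	fmt.Println(body, err)
}
```

The cost of this approach is that every message type needs two parallel struct definitions that have to be kept in sync by hand.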

Hi,

since I really wanted this in my private project I started playing around and
I actually have a working version for me. Could someone with deeper knowledge of the lib take a look at the changes and tell me whether that actually makes sense (zauberstuhl@eb84a70)?

Here is an example of what that could look like:
https://play.golang.org/p/O4Ene-8GrVV

Cheers

@sophos this works for XML tags, but does not work for XML attributes

Change https://golang.org/cl/116056 mentions this issue: encoding/xml: fix printing of namespace prefix in tag names

iwdgo commented

With the submitted fix, prefix displays when defined in tag names, i.e. no made up prefix is taken into account. The URL of the name space is also returned for an End Token as documentation requires.
Translating the prefix is popping the NS which is unavailable for translate. Translate for an End Token has been moved inside the pop element part.

When a tag.Space has no prefix, it is the default space xmlns=".Space" according to documentation. Since there is no prefix the print remains <tag.Name.local … and not <Space:.Local…

The fix builds on the merged fixes for the issues listed in #13400, as the namespace standard needs to be enforced before prefix handling can be improved.

Regarding the workaround posted above (XMLName tagged `xml:"prefix:root"` plus a string field tagged `xml:"xmlns:prefix,attr"`): it only works when marshaling, and fails when you want to unmarshal :/

genez commented

I have the same issue, while interacting with an Italian government service.

This playground demonstrates the issue: https://play.golang.org/p/H5Hibbci81_n

For the record, the prefix shouldn't actually matter. If two prefixes resolve to the same namespace then they are identical, including the empty prefix.

https://www.w3.org/TR/xml-names/#NT-PrefixedName

Note that the prefix functions only as a placeholder for a namespace name. Applications SHOULD use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document.

The full solution here is for the package to support fully qualified names, and then have some notion of a map between namespaces and prefixes that is determined at marshal time.

<prefix:root xmlns:prefix="urn:test.example.com">
</prefix:root>

Is (or should be) treated identically to:

<root xmlns="urn:test.example.com">
</root>

And

<bort:root xmlns:bort="urn:test.example.com">
</bort:root>

Since the qualified name of the element is {urn:test.example.com}root

At least that's how libxml2 and python seem to handle it.

thoro commented

As nimish said, it shouldn't actually matter, but there are so many broken XML implementations out there that support for custom prefix naming would be great. As suggested, it should be a marshal-time configuration.

A possible implementation could look like:

namespaces := map[string]string{
    "bort": "urn:test.example.com",
}

encoder := xml.NewEncoder(memWriter)
encoder.SetNamespaces(namespaces)
encoder.Encode(response)

Any namespaces defined this way would be declared once at the topmost element and not repeated on any child element. This could fix a lot of issues where you have to interface with bad implementations.

Edit: Interestingly the xmlns handling is implemented for attributes but not for elements! (see https://golang.org/src/encoding/xml/marshal.go createAttrPrefix)
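
Note that SetNamespaces above is a proposed method, not part of encoding/xml. Until something like it exists, a rough approximation of "declare the prefix once at the top" is possible with the token-level API that does exist (Encoder.EncodeToken, Encoder.EncodeElement, Encoder.Flush). The sketch below is only an illustration: the function name encodeWithPrefix and the element names root and item are invented, and the prefix is written literally into the names, so nothing verifies that it is actually bound.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// encodeWithPrefix writes <prefix:root> with a single xmlns:prefix
// declaration and one prefixed child element holding value.
func encodeWithPrefix(prefix, ns, value string) (string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)

	// The prefix is embedded in Local; Space is left empty so the encoder
	// does not add its own xmlns="..." attribute.
	root := xml.StartElement{
		Name: xml.Name{Local: prefix + ":root"},
		Attr: []xml.Attr{{Name: xml.Name{Local: "xmlns:" + prefix}, Value: ns}},
	}
	if err := enc.EncodeToken(root); err != nil {
		return "", err
	}

	// Child elements reuse the prefix without redeclaring the namespace.
	child := xml.StartElement{Name: xml.Name{Local: prefix + ":item"}}
	if err := enc.EncodeElement(value, child); err != nil {
		return "", err
	}

	if err := enc.EncodeToken(root.End()); err != nil {
		return "", err
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeWithPrefix("bort", "urn:test.example.com", "x")
	fmt.Println(out, err)
}
```

This trades struct tags for hand-written tokens, so it is practical mainly for envelopes and other small wrappers around otherwise ordinary marshaled content.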

Yeah, unfortunately the real world is full of bad xml handling :(
Also, if two elements share a namespace, and one is a child of the other, it'll repeat the xmlns="" declaration leading to a lot of redundancy. Would be better to omit, or assign a prefix and use that.
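
The redundancy is easy to reproduce. In this sketch, the type name parent, the field Child, and the function marshalNested are invented for the example; both the element and its child are tagged with the same namespace URI.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// parent and its child element are both tagged with the same namespace.
type parent struct {
	XMLName xml.Name `xml:"urn:test.example.com parent"`
	Child   string   `xml:"urn:test.example.com child"`
}

// marshalNested returns the serialized form of a parent with one child.
func marshalNested() (string, error) {
	out, err := xml.Marshal(parent{Child: "v"})
	return string(out), err
}

func main() {
	out, err := marshalNested()
	// The default-namespace declaration appears on both elements.
	fmt.Println(out, err)
}
```

The child repeats the declaration even though it would inherit the default namespace from its parent.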

any updates?

Any updates will be reported here. Please don't ask for updates here. Ask on a forum instead. See https://golang.org/wiki/Questions. Thanks.

Issue opened in 2015 and still the same problem in 2020.

Quoting the earlier comment:

"For the record, the prefix shouldn't actually matter. If two prefixes resolve to the same namespace then they are identical, including the empty prefix." (https://www.w3.org/TR/xml-names/#NT-PrefixedName)

This is incorrect. See https://www.w3.org/TR/xml-infoset/#infoitem.element

All three parts of a QName for an element (local name, namespace URI, prefix) are part of the model, and are needed for, say, proper evaluation of XSLT or XML DSig.

This is unfortunately because of four flagpole reasons:

  1. They defined initial namespace usage without DOM support, so documents were parsed without the namespace axis
  2. The xmlns and prefix were static in the document type definition (DOCTYPE) until other schema definitions came along
  3. XSLT and XPath (along with a few other systems of XML-based processing definitions) used their place in the document to inherit context for expressions. So e.g. an XPath selector in an XSLT can use namespaces declared within the XSLT document in that scope. This means that the prefix is an essential part of evaluating this document, unless you have a model where XPath is a first class type in your DOM - which is extremely difficult since XPath isn't an XML infoset type like element or attribute - it's just text.
  4. XML Digital Signatures, having to cope with all of this, described a default system of canonicalizing XML into a predictable binary form. It had to further codify requirements in the XML systems for compatibility where prefixes were maintained, in order to support signed documents which used technology like XPath.

If you are marshalling, the entire purpose is to translate from the infoset model into a restricted model represented by the struct. But for things like reading or manipulating elements, attributes, text, etc., you want to use the infoset model.
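
For reading, encoding/xml does expose enough to recover all three parts of an element name. Decoder.RawToken is documented as not translating prefixes, so Name.Space holds the prefix as written, while Decoder.Token resolves it to the URI. The sketch below pairs the two. The qname type and the helper names are my own, and this falls well short of a full infoset model.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// qname holds the three parts of an element name discussed above.
type qname struct {
	Prefix, Namespace, Local string
}

// firstStart returns the name of the first start element in doc, read with
// RawToken (prefix left as written) or Token (prefix resolved to its URI).
func firstStart(doc string, raw bool) (xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		var tok xml.Token
		var err error
		if raw {
			tok, err = dec.RawToken()
		} else {
			tok, err = dec.Token()
		}
		if err != nil {
			return xml.Name{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Name, nil
		}
	}
}

// rootQName combines the raw and resolved views of the root element.
func rootQName(doc string) (qname, error) {
	rawName, err := firstStart(doc, true)
	if err != nil {
		return qname{}, err
	}
	resolved, err := firstStart(doc, false)
	if err != nil {
		return qname{}, err
	}
	return qname{Prefix: rawName.Space, Namespace: resolved.Space, Local: resolved.Local}, nil
}

func main() {
	q, err := rootQName(`<bort:root xmlns:bort="urn:test.example.com"></bort:root>`)
	fmt.Printf("%+v %v\n", q, err)
}
```

None of this survives a trip through Unmarshal and Marshal, which is the gap that matters for signed or XPath-bearing documents.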

I would agree that it shouldn't have really mattered. Namespaces were deferred and never quite bolted on perfectly, while XPath and XSLT knocked them further loose. C'est la vie.

I am in awe that such a well-documented bug, one that hinders such useful features, has survived this long without a fix.

I hope somebody fixes this. For now, I found a replacement to encoding/xml using nbio/xml which handles namespaces properly.

Hey any update on this?
This is still a major issue 9 years later, and it would be nice to have a reliable XML parser in Golang one day.