zeux/pugixml

Parses self-closing tags differently than the "old way"?

dmatsumoto opened this issue · 1 comment

I'm using the following XML string for some pugixml unit tests:

	string buffer =
		"<?xml version='1.0' encoding='ASCII' ?>\n"
		"<Parent file='test' md5sum='8e5db7acffe6249ba3351f50c7e9eb5f' version='1.0' >\n"
		"\t<Child1 child1_attribute='child1_attribute_value' />\n"
		"</Parent>";

When I load this buffer with the following code:

	pugi::xml_document doc;
	doc.load_string(buffer.c_str());

The resulting DOM object has the right data. However, if I use the old-style tag pair with an explicit closing tag, or if I add text to the Child1 element, e.g.:

"\t<Child1 child1_attribute='child1_attribute_value' ></Child1>\n"
or
"\t<Child1 child1_attribute='child1_attribute_value' >child1_text</Child1>\n"

the DOM object is incorrect: the Child1 node unexpectedly gets an extra child node with a null (empty) name and a value of "Child1". I've tried to make my code ignore this extra node by comparing its type to node_null, but that doesn't appear to work.

Since I'm new to pugixml, I could be doing something wrong. However, I don't think that's the case, given the difference in behavior between the self-closing and non-self-closing tags.

zeux commented

In your second example, you should expect the Child1 node to have a child of type node_pcdata with the value child1_text. See https://pugixml.org/docs/manual.html#node_pcdata.

In your first example, there's no text between the opening and closing tags, so there should be no children; the tree representation should be exactly the same for <Child1/> and <Child1></Child1>.