zeux/pugixml

Empty text children increase XML file size

Mai1er opened this issue · 5 comments

Mai1er commented

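// C++ with pugixml (was pseudo code)
#include <pugixml.hpp>
#include <iostream>

int main() {
    pugi::xml_document doc;
    pugi::xml_node parentNode = doc.append_child("Root");
    pugi::xml_node node = parentNode.append_child("NewChildNode");
    node.text().set(""); // << creates a new PCDATA child that contains no text
    doc.save(std::cout); // save with the default flags
}
// after saving the document we get:
<NewChildNode></NewChildNode>
// but with the default output format it should be:
<NewChildNode />
// the "invisible" empty text child prevents the short (self-closing) form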
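The reported behavior can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not pugixml's code: the `Element` struct and `serialize` function are hypothetical, and the only rule modeled is the one described above, namely that the short `<name />` form is written only when the element has no children, with an empty PCDATA child still counting as a child.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of an element; NOT pugixml's real types.
struct Element {
    std::string name;
    std::vector<std::string> text_children;  // PCDATA children; may hold ""
};

// Models the rule described in the issue: the short form "<name />" is used
// only when the element has no children at all. An empty PCDATA child still
// counts as a child, so it forces the expanded "<name></name>" form.
std::string serialize(const Element& e) {
    if (e.text_children.empty())
        return "<" + e.name + " />";
    std::string out = "<" + e.name + ">";
    for (const std::string& t : e.text_children) out += t;
    return out + "</" + e.name + ">";
}
```

An element built with `{"NewChildNode", {}}` serializes to `<NewChildNode />`, while `{"NewChildNode", {""}}` serializes to `<NewChildNode></NewChildNode>`, even though both carry no text.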

zeux commented

Sure... why is this a problem? It's not clear to me if changing this is worth the benefit, as the application can just not create empty text children.

Mai1er commented

My text fields are assigned from a template, without checking whether they are empty or filled.
There are many fields in the file and often most of them are empty.
This increases the size of the files by 20-30%, and there are... a lot of files on the disk.

Besides, why leave a mistake when you can do better?

zeux commented

Ok, and it's impractical for your application to check if the field is empty before assigning? What if the field value is purely white space?
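One way an application could act on this suggestion is to decide, per field, whether to assign text at all. The sketch below assumes hypothetical `is_blank` and `should_write_text` helpers; whether whitespace-only values count as empty is left as an explicit policy parameter, since that is exactly the open question raised here.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if `value` is empty or contains only whitespace.
bool is_blank(const std::string& value) {
    return std::all_of(value.begin(), value.end(), [](unsigned char c) {
        return std::isspace(c) != 0;
    });
}

// Hypothetical guard: returns false when the field should be left without a
// text child. Empty values are always skipped; whitespace-only values are
// skipped only if the caller chooses to treat them as empty.
bool should_write_text(const std::string& value, bool treat_whitespace_as_empty) {
    if (value.empty()) return false;
    if (treat_whitespace_as_empty && is_blank(value)) return false;
    return true;
}
```

With pugixml, such a guard would wrap the existing call, e.g. `if (should_write_text(value, true)) node.text().set(value.c_str());`.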

Additionally, if your data has a lot of empty fields and the output size matters, do you need to add the nodes corresponding to empty fields at all? <NewChildNode /> still takes space and you might be able to omit the node entirely.

Overall I'm not entirely sure where the "mistake" is here. Maybe set() shouldn't even create a PCDATA node, or maybe the library works fine; it's an odd corner case.

Mai1er commented
  1. That's not the application's responsibility.
  2. The empty fields are required by the format.

This behavior is confusing because:

  1. You create an XML file with empty nodes and save it with the default flags: the empty nodes are expanded.
  2. You load that file and save it again with the default flags: the empty nodes are now compressed.
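The round-trip asymmetry in the two steps above can be reproduced with a simplified model. The `Element`, `save`, and `load` names are hypothetical, and the load rule (empty content between the start and end tags yields no PCDATA child) is an assumption inferred from the behavior reported in this thread rather than taken from pugixml's source.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model; not pugixml's actual data structures.
struct Element {
    std::string name;
    std::vector<std::string> text_children;  // PCDATA children
};

// Save rule as described in the issue: short form only when there are no children.
std::string save(const Element& e) {
    if (e.text_children.empty()) return "<" + e.name + " />";
    std::string out = "<" + e.name + ">";
    for (const auto& t : e.text_children) out += t;
    return out + "</" + e.name + ">";
}

// Assumed load rule: the content between the tags becomes a PCDATA child
// only if it is non-empty, so "<name></name>" loads as a childless element.
Element load(const std::string& name, const std::string& content) {
    Element e{name, {}};
    if (!content.empty()) e.text_children.push_back(content);
    return e;
}
```

Step 1 corresponds to saving `Element{"NewChildNode", {""}}`, which gives `<NewChildNode></NewChildNode>`; step 2 corresponds to saving `load("NewChildNode", "")`, which gives `<NewChildNode />`. Both describe the same logical content, yet the text differs.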

I found this out while comparing files with a diff tool.
Because the same empty node is written differently depending on how the tree was built, comparing those files with a diff tool is impractical.

So if one wants expanded empty nodes, one should use the existing format flag format_no_empty_element_tags.
Without that flag, one would expect empty nodes to always be written in the compressed form.
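The rule proposed here can be sketched on the same kind of hypothetical model. `kNoEmptyElementTags` is a stand-in playing the role of pugixml's `format_no_empty_element_tags`, not the real constant, and this `save` treats an element whose PCDATA children are all empty the same as an element with no children, so only the flag decides between the expanded and the compressed form.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model; not pugixml's actual data structures.
struct Element {
    std::string name;
    std::vector<std::string> text_children;  // PCDATA children; may hold ""
};

// Hypothetical flag mirroring the role of format_no_empty_element_tags.
constexpr unsigned kNoEmptyElementTags = 1u;

// Proposed rule: concatenate the PCDATA children; if the result is empty,
// the element is "empty" and the flag alone selects the output form.
std::string save(const Element& e, unsigned flags) {
    std::string text;
    for (const auto& t : e.text_children) text += t;
    if (text.empty() && !(flags & kNoEmptyElementTags))
        return "<" + e.name + " />";
    return "<" + e.name + ">" + text + "</" + e.name + ">";
}
```

Under this rule, a freshly built tree with an empty text child and the same tree after a reload would serialize identically with either flag setting.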