zeux/pugixml

I would like to be able to serialize strings containing the null character

olowo726 opened this issue · 1 comment

First, if this is already possible and I just haven't found out how, please let me know.

I need to serialize a C array of chars, say char buf[10], which might contain the null character '\0'. The format is established, standardized and governed by a rather major standardization organization, so it's not something you can change. My issue is that as soon as a null character is encountered, pugi aborts. So, what I would like the output to contain is

<VT xml:space="preserve">&#0;&#48;</VT>

for the string "\08"

Can this be accomplished?

zeux commented

You can accomplish this by escaping the data yourself - storing &#0; in the string data you put in the tree and using the format_no_escapes flag when writing the document to a file. Of course, this requires escaping all of your data in the same fashion, since pugixml will no longer escape anything for you.
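A minimal sketch of the "escape it yourself" step, using only the standard library. The helper name `escape_for_xml` is hypothetical and not part of pugixml. It assumes the buffer is destined for element text (so quotes are left alone) and that bytes at or above 0x20 are already valid in the document's encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a pugixml function): converts a raw buffer of
// known length into pre-escaped XML text. Control characters below 0x20,
// including '\0', become decimal character references such as "&#0;".
std::string escape_for_xml(const char* buf, std::size_t len)
{
    std::string out;
    for (std::size_t i = 0; i < len; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(buf[i]);
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:
            if (c < 0x20)
            {
                // Numeric character reference for control characters.
                out += "&#";
                out += std::to_string(static_cast<unsigned>(c));
                out += ';';
            }
            else
            {
                // Assumption: everything else is passed through unchanged.
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}
```

The result contains no embedded null, so it can be stored in the tree as an ordinary C string, for example with `node.text().set(escaped.c_str())`, and written out with something like `doc.save_file(path, "\t", pugi::format_default | pugi::format_no_escapes)`.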

When parsing files like this, you'd also need to disable unescaping by using the parse_default & ~parse_escapes option mask. Similarly, this requires unescaping the data yourself during processing in your application.
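The reverse direction could look roughly like this. Again, `unescape_from_xml` is a hypothetical helper rather than a pugixml API. As a simplifying assumption it only handles the five predefined entities plus decimal and hexadecimal character references whose values fit in one byte, and it does not validate malformed references.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a pugixml function): expands entities and
// character references in text that was loaded with escapes left intact.
// The returned std::string may contain embedded '\0' bytes.
std::string unescape_from_xml(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        // Look for a terminating ';' only when we are at an '&'.
        const std::size_t end =
            (in[i] == '&') ? in.find(';', i) : std::string::npos;
        if (end == std::string::npos) { out += in[i++]; continue; }

        const std::string name = in.substr(i + 1, end - i - 1);
        if (name == "amp")       out += '&';
        else if (name == "lt")   out += '<';
        else if (name == "gt")   out += '>';
        else if (name == "quot") out += '"';
        else if (name == "apos") out += '\'';
        else if (name.size() > 1 && name[0] == '#')
        {
            // "&#48;" is decimal, "&#x30;" is hexadecimal.
            const bool hex = (name[1] == 'x');
            const unsigned long value =
                std::strtoul(name.c_str() + (hex ? 2 : 1), nullptr, hex ? 16 : 10);
            out += static_cast<char>(value & 0xff); // assumption: single byte
        }
        else { out += in[i++]; continue; } // unknown entity: keep '&' as-is

        i = end + 1;
    }
    return out;
}
```

With a document loaded via something like `doc.load_file(path, pugi::parse_default & ~pugi::parse_escapes)`, the raw node text can be passed through this helper. Keep the result in a `std::string` and use its `size()`, since any C-string function would stop at the first null.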

Note that the resulting file is not valid per the XML specification, as any character escaped with &# must match the Char production (https://www.w3.org/TR/xml/#NT-Char), which excludes the null character - but I understand that you can't change this.
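For reference, the Char production from XML 1.0 can be written as a small predicate. The function name `is_xml_char` is made up for this illustration. It can be used to see which code points a conforming parser is expected to reject.

```cpp
// Hypothetical helper: returns true if the code point matches the XML 1.0
// Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool is_xml_char(char32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Under this rule `&#0;` is rejected while `&#48;` is fine, so other XML tools may refuse to read a file produced this way.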