How to read values if possible CDATA
ejurgensen opened this issue · 2 comments
First of all thanks for making and maintaining mxml. It's been used by OwnTone (previously forked-daapd) for many years, and has been really solid.
I take care of OwnTone, where this issue came recently. It involves mxml parsing RSS XML with a tag that has CDATA. This creates a bug, since OwnTone just reads values with mxmlGetOpaque, which doesn't return anything when it's CDATA. So now I am seeking advice on what might be the best way to read a value, given that it is not known whether the input is CDATA or not.
Right now I'm thinking I should add a convience function like the below getStringValue(). However, it seems a bit crude to me. Is there a better way to do this?
#define CDATAXML "<?xml version=\"1.0\" ?><foo>FOO</foo><bar><![CDATA[BAR]]></bar>"
const char *
getStringValue(mxml_node_t *xml, const char *key)
{
mxml_node_t *parent = mxmlFindElement(xml, xml, key, NULL, NULL, MXML_DESCEND);
mxml_node_t *child = mxmlGetFirstChild(parent);
mxml_type_t type = mxmlGetType(child);
if (type == MXML_ELEMENT)
return mxmlGetCDATA(child);
return mxmlGetOpaque(child);
}
int
main(int argc, char *argv[])
{
mxml_node_t *xml = mxmlLoadString(NULL, CDATAXML, MXML_OPAQUE_CALLBACK);
printf("foo is '%s'\n", getStringValue(xml, "foo"));
printf("bar is '%s'\n", getStringValue(xml, "bar"));
return 0;
}
🤷♂️ That's probably as clean as you'll be able to make it.
Fwiw I ended up with these wrappers:
mxml_node_t *
xml_get_node(mxml_node_t *top, const char *path)
{
mxml_node_t *node;
mxml_type_t type;
// This example shows why we can't just return the result of mxmlFindPath:
// <?xml version="1.0""?><rss>
// <channel>
// <title><![CDATA[Tissages]]></title>
// mxmlFindPath(top, "rss/channel") will return an OPAQUE node where the
// opaque value is just the whitespace. What we want is the ELEMENT parent,
// because that's the one we can use to search for children nodes ("title").
node = mxmlFindPath(top, path);
type = mxmlGetType(node);
if (type == MXML_ELEMENT)
return node;
return mxmlGetParent(node);
}
// Walks through the children of the "path" node until it finds one that is
// not just whitespace and returns a trimmed value (except for CDATA). Means
// that these variations will all give the same result:
//
// <foo>FOO FOO</foo><bar>\nBAR BAR \n</bar>
// <foo>FOO FOO</foo><bar><![CDATA[BAR BAR]]></bar>
// <foo>\nFOO FOO\n</foo><bar>\n<![CDATA[BAR BAR]]></bar>
const char *
xml_get_val(mxml_node_t *top, const char *path)
{
mxml_node_t *parent;
mxml_node_t *node;
mxml_type_t type;
const char *s = "";
parent = xml_get_node(top, path);
if (!parent)
return NULL;
for (node = mxmlGetFirstChild(parent); node; node = mxmlGetNextSibling(node))
{
type = mxmlGetType(node);
if (type == MXML_OPAQUE)
s = trim(mxmlGetOpaque(node));
else if (type == MXML_ELEMENT)
s = mxmlGetCDATA(node);
if (s && *s != '\0')
break;
}
return s;
}