matthieu-labas/sxmlc

Tag by Tag

embbo opened this issue · 4 comments

embbo commented

Hi, I am using the sxmlc parser, which works really well on my microcontroller. My question: is it possible to parse tag by tag instead of the whole XML? My microcontroller has very little memory.

For example, can it first copy just these tags:
<UAObject NodeId="i=20002" BrowseName="0:Control" ParentNodeId="i=85"> ..... ...... .... </UAObject>

and then the next tags:

... ... ...

You'll probably want to use the SAX part instead of DOM: you provide callbacks that will be called whenever a new "XML event" occurs. See the SAX_Callbacks struct in sxmlc.h and the HOWTO.

If the structure of the string you want to parse is fixed and it is a "single node" (e.g. <node attr="val" attr2="val2" />), you can use XML_parse_1string(str, &node) to parse its definition into node. But if it has text (e.g. <node attr="val" attr2="val2">some text</node>), then you can use XMLDoc_parse_buffer_SAX(), providing a set of callbacks.

embbo commented

I have multiple tags:

<node attr="val1" attr2="val1" />
<node attr="val2" attr2="val2" />
<node attr="val3" attr2="val3" />

I don't want to read the whole data at once because I am using a microcontroller.
I just want to read and parse 1 by 1

What do you mean by "whole data"?

There are two modes working with sxmlc:

  • DOM needs to have the whole XML text loaded in memory and parses it into a tree structure. That's more convenient to use but requires more memory because the parsing occurs on the whole XML text.
  • SAX needs to have only a single node loaded in memory and calls a function when a new XML token is discovered (tag and attributes, text). That is easier on memory because only one node is needed at once. If you have memory constraints, you should use SAX, not DOM.

But even using SAX, you still need to load the whole node text into memory, e.g. <node attr="val" attr2="val2" /> or <UAObject NodeId="i=20002" BrowseName="0:Control" ParentNodeId="i=85"> ..... ...... .... </UAObject>.

So two questions:

  • How much memory do you have available?
  • Where do you receive the data from? A socket? A file? At some point the data has to be stored somewhere; you might want to use memory mapping techniques so you don't need to re-allocate a buffer when its data is already available somewhere...

N.B. if the data comes from a socket, you can plug its output (socket receiving file descriptor) directly to sxmlc for reading (see https://stackoverflow.com/a/1941472/1098603).

I just want to read and parse 1 by 1

If by "one by one" you mean "first the first node, then the second node, then the third node", then you should do the following:

  1. Read one item (e.g. one line) into a buffer str (or memory-map it)
  2. Call XML_parse_1string(str, &node) to parse it into an XMLNode node variable
  3. Read another item
  4. Call XML_parse_1string(str, &node) to parse the new node
  5. etc.

Note that:

  • you need some kind of separator that you can detect, so that you can read an "item" up to that separator. A \n is a good example, but a \0 is best of course, because the item is then already a terminated C string
  • you need to make sure you handle your memory allocations properly. sxmlc will allocate new buffers for each piece of data (i.e. one for the tag, one for each attribute name and value, one for the text), so you'll end up using twice as much memory at peak, but you can free the buffer memory after the parsing is done (or reuse it later)
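
The "read one item, parse it, read the next" loop described above can be sketched in plain C as follows. This is only a minimal illustration, not part of sxmlc: the names read_item, for_each_item, item_handler and count_item are hypothetical, the data is assumed to arrive through a FILE* (on a microcontroller, fgetc would be replaced by whatever reads one byte from your UART, socket or flash), and count_item merely stands in for the place where XML_parse_1string(str, &node) would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-item handler. In the setup discussed in this thread,
 * this is where XML_parse_1string(item, &node) would be called.
 * Return 0 to stop the loop early, non-zero to continue. */
typedef int (*item_handler)(const char *item, void *user);

/* Read one item from `in` into `buf` (capacity `cap`), stopping at the
 * separator `sep` or at end of input. The separator is not stored.
 * Returns the item length (>= 0), -1 at end of input with nothing read,
 * or -2 if the item does not fit in the buffer. */
long read_item(FILE *in, int sep, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    assert(in != NULL && buf != NULL && cap > 0);
    while ((c = fgetc(in)) != EOF && c != sep) {
        if (len + 1 >= cap)
            return -2;              /* keep one byte for the terminator */
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return -1;
    buf[len] = '\0';
    return (long)len;
}

/* Pass every non-empty item to `handle`, reusing the same fixed buffer.
 * Returns the number of items passed to the handler, or -2 on overflow. */
int for_each_item(FILE *in, int sep, char *buf, size_t cap,
                  item_handler handle, void *user)
{
    int count = 0;
    long n;

    while ((n = read_item(in, sep, buf, cap)) != -1) {
        if (n == -2)
            return -2;
        if (n == 0)
            continue;               /* skip empty items such as blank lines */
        count++;
        if (!handle(buf, user))
            break;
    }
    return count;
}

/* Stand-in handler: counts the items that look like the start of a tag. */
int count_item(const char *item, void *user)
{
    int *seen = (int *)user;

    if (item[0] == '<')
        (*seen)++;
    return 1;
}
```

Only one item is held in the buffer at a time, so the buffer has to be sized for the longest single node you expect, not for the whole document. Calling for_each_item(in, '\n', buf, sizeof buf, count_item, &seen) walks newline-separated nodes; passing '\0' as the separator works the same way.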