lexborisov/myhtml

incomplete parsing using myhtml_node_next and myhtml_node_text


Hi,
I am using myhtml to parse the following HTML code:

```HTML
<html>
<span class="c3">
<span class="sonne" title="Sonnenscheindauer"><img width="20" height="20" src="whatever1.img" alt="sun" />0.0 h</span>
<span class="regen" title="Niederschlagsmenge"><img width="20" height="20" src="whatever2.img" alt="rain" />0 mm</span>
</span>
</html>
```

I expect to get the texts "0.0 h" and "0 mm".

My understanding of the tree is:
tag: span, class c3

  • tag: span
    • attrib class
    • attrib title
  • tag: img
    • attrib width
    • attrib height
    • attrib src
  • 0 mm

I use:

  • node: the span with class c3
  • subNode1: its children, i.e. the span and img tags and the required text

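Simplified code:

```C
size_t len;
myhtml_tree_node_t *subNode1 = myhtml_node_child(node);
while (subNode1 != NULL) {
    /* myhtml_node_text may return NULL when the node has no text */
    const char *text = myhtml_node_text(subNode1, &len);
    printf("child of %lu -> %s\n", (unsigned long)myhtml_node_tag_id(node), text ? text : "(null)");
    subNode1 = myhtml_node_next(subNode1);
}
```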

The complete source code is attached, as well as the HTML file.

I am able to parse, e.g., the span tag with class="regen" and the img tag with its attributes, but not the text "0 mm".
Do you have a suggestion?

@parser12 Hi!
The input data is hard to read. Please use Markdown code fences in your comments. For HTML:

```HTML
<html>In this place HTML tags</html>
```

and for C code:
```C
subNode1 = myhtml_node_child(node);
```

See "Creating and highlighting code blocks" in the GitHub documentation.

Thanks!

@parser12

After parsing, you get this tree:

```
<html>
  <head>
  <body>
    <span class="c3">
      "
      "
      <span class="sonne" title="Sonnenscheindauer">
        <img width="20" height="20" src="whatever1.img" alt="sun">
        "0.0 h"
      "
      "
      <span class="regen" title="Niederschlagsmenge">
        <img width="20" height="20" src="whatever2.img" alt="rain">
        "0 mm"
      "
      "
    "
    "
```

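Each pair of lines with a single `"` is a text node that contains only a line break. For example, this is the newline after a `<span>`:

```
      "
      "
```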
```C
subNode1 = myhtml_node_child(node);

while (subNode1 != NULL) {
    printf("child: %s\n", myhtml_tag_name_by_id(subNode1->tree, myhtml_node_tag_id(subNode1), NULL));

    if (myhtml_node_tag_id(subNode1) == MyHTML_TAG__TEXT) {
        printf("Text: %s\n", myhtml_node_text(subNode1, NULL));
    }

    subNode1 = myhtml_node_next(subNode1);
}
```

Output:

```
child: -text
Text: \n
child: span
child: -text
Text: \n
child: span
child: -text
Text: \n
```

I hope the general idea is clear.

For your task, see the functions for searching nodes and the accompanying example.

P.S.: You can also use Modest and its CSS selectors for this.

Thank you for the really fast response.
I thought I already had a solution, but I assume I was looping at the `node` level rather than at the `subNode1` level.