incomplete parsing using myhtml_node_next and myhtml_node_text
Hi
I am using myhtml to parse the following HTML code:
```HTML
<html>
<span class="c3">
<span class="sonne" title="Sonnenscheindauer"><img width="20" height="20" src="whatever1.img" alt="sun" />0.0 h</span>
<span class="regen" title="Niederschlagsmenge"><img width="20" height="20" src="whatever2.img" alt="rain" />0 mm</span>
</span>
</html>
```
I expect to get the texts "0.0 h" and "0 mm".
My understanding of the tree is:
tag: span class c3
- tag: span
  - attrib class
  - attrib title
  - tag: img
    - attrib width
    - attrib height
    - attrib src
  - 0 mm
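I use:
- node: the span with class c3
- subNode1: its children, i.e. the span and img tags and the required text

Code excerpt:
```C
size_t len;
myhtml_tree_node_t *subNode1 = myhtml_node_child(node);
while (subNode1 != NULL) {
    /* myhtml_node_text() may return NULL (e.g. for element nodes such as <span>) */
    const char *text = myhtml_node_text(subNode1, &len);
    printf("child of %zu -> %s\n", (size_t)myhtml_node_tag_id(node), text ? text : "(null)");
    subNode1 = myhtml_node_next(subNode1);
}
```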
The complete source code is attached, as well as the HTML file.
I am able to parse, e.g., the span tag with class=regen and the img with its attributes, but not the text "0 mm".
Do you have a suggestion?
@parser12 hi!
The input data is not clear. Please use Markdown code blocks in your comments, for HTML:
```HTML
<html>In this place HTML tags</html>
```
and for C code:
```C
subNode1 = myhtml_node_child(node);
```
See Creating and highlighting code blocks
Thanks!
After parsing, you get this tree:
```
<html>
  <head>
  <body>
    <span class="c3">
      "\n"
      <span class="sonne" title="Sonnenscheindauer">
        <img width="20" height="20" src="whatever1.img" alt="sun">
        "0.0 h"
      "\n"
      <span class="regen" title="Niederschlagsmenge">
        <img width="20" height="20" src="whatever2.img" alt="rain">
        "0 mm"
      "\n"
    "\n"
```
Each "\n" here is a text node holding only the newline that follows a `<span>` tag.
```C
subNode1 = myhtml_node_child(node);
while (subNode1 != NULL) {
    printf("child: %s\n", myhtml_tag_name_by_id(subNode1->tree, myhtml_node_tag_id(subNode1), NULL));
    if (myhtml_node_tag_id(subNode1) == MyHTML_TAG__TEXT) {
        printf("Text: %s\n", myhtml_node_text(subNode1, NULL));
    }
    subNode1 = myhtml_node_next(subNode1);
}
```
Output:
```
child: -text
Text: \n
child: span
child: -text
Text: \n
child: span
child: -text
Text: \n
```
I think the general meaning is clear?
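The `Text: \n` lines in the output above come from the whitespace-only text nodes between the spans. A minimal sketch of a helper for skipping them, assuming you only care about text nodes that contain something other than HTML whitespace (space, tab, LF, FF, CR); `is_whitespace_text` is a hypothetical name, not part of myhtml:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a myhtml function): returns true if text[0..len)
   consists only of HTML whitespace (space, tab, LF, FF, CR), i.e. it is one of
   the "\n" text nodes that sit between the <span> elements. */
bool is_whitespace_text(const char *text, size_t len)
{
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        switch (text[i]) {
        case ' ': case '\t': case '\n': case '\f': case '\r':
            break;              /* whitespace: keep scanning */
        default:
            return false;       /* found real content such as "0 mm" */
        }
    }
    return true;                /* empty or whitespace-only */
}
```

In the loop above you could then print a text node only when `myhtml_node_text(subNode1, &len)` returns non-NULL and `!is_whitespace_text(text, len)` holds.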
For your task, see the functions for searching nodes and the accompanying example.
P.S.: You can also use Modest and its CSS selectors for this.
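The idea behind searching for a node can be sketched without myhtml: walk the tree depth-first, stop at the first element whose class attribute matches, then read that element's text child. The sketch uses a hypothetical `node_t` struct (its `child`/`next` links mirror `myhtml_node_child()`/`myhtml_node_next()`, and a single `cls` field stands in for the class attribute) instead of myhtml's real types and search functions, so it only illustrates the traversal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a myhtml node, NOT the real myhtml type. */
typedef struct node {
    const char *tag;     /* "-text" for text nodes, as in the output above */
    const char *cls;     /* value of the class attribute, NULL if absent */
    const char *text;    /* content of a text node, NULL otherwise */
    struct node *child;  /* first child (like myhtml_node_child) */
    struct node *next;   /* next sibling (like myhtml_node_next) */
} node_t;

/* Pre-order depth-first search of n and its descendants for the first
   element whose class equals cls; returns NULL if there is none. */
const node_t *find_by_class(const node_t *n, const char *cls)
{
    assert(cls != NULL);
    if (n == NULL)
        return NULL;
    if (n->cls != NULL && strcmp(n->cls, cls) == 0)
        return n;
    for (const node_t *c = n->child; c != NULL; c = c->next) {
        const node_t *hit = find_by_class(c, cls);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Returns the content of the first text-node child of el, or NULL. */
const char *first_text_child(const node_t *el)
{
    for (const node_t *c = el->child; c != NULL; c = c->next)
        if (strcmp(c->tag, "-text") == 0)
            return c->text;
    return NULL;
}
```

A search like this returns the `<span class="regen">` element itself; its text is one level further down, next to the `<img>`.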
Thank you for the really fast response.
I was of the opinion that I already had this as a solution, but I assume I was looping on the node level, not on the subNode1 level.
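A minimal sketch of what looping on the subNode1 level means: for each child of the `c3` span, descend once more and visit that child's own children, which is where the text nodes "0.0 h" and "0 mm" live. It uses a hypothetical `node_t` stand-in rather than myhtml's types; with myhtml you would use `myhtml_node_child()` and `myhtml_node_next()` and compare `myhtml_node_tag_id()` with `MyHTML_TAG__TEXT`, as in the loop above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a myhtml node, NOT the real myhtml type. */
typedef struct node {
    const char *tag;     /* "-text" for text nodes, as in the output above */
    const char *text;    /* content of a text node, NULL otherwise */
    struct node *child;  /* first child (like myhtml_node_child) */
    struct node *next;   /* next sibling (like myhtml_node_next) */
} node_t;

static int is_text(const node_t *n)
{
    return strcmp(n->tag, "-text") == 0;
}

/* Two-level loop: visits the children of parent and, for each child element,
   that element's children. Text nodes found on the second level are stored in
   out[0..max) in document order; returns how many were stored. */
size_t collect_grandchild_texts(const node_t *parent, const char **out, size_t max)
{
    size_t count = 0;
    assert(out != NULL || max == 0);
    for (const node_t *child = parent->child; child != NULL; child = child->next) {
        if (is_text(child))
            continue;  /* the "\n" between the spans: not what we want */
        for (const node_t *sub = child->child; sub != NULL; sub = sub->next) {
            if (is_text(sub) && count < max)
                out[count++] = sub->text;  /* e.g. "0.0 h", "0 mm" */
        }
    }
    return count;
}
```

Looping only over `parent->child` (the first level) yields the `<span>` elements and the "\n" nodes, which is why "0 mm" never showed up.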