lexborisov/myhtml

NULL ptr dereference in tree node remove callback

RKX1209 opened this issue · 4 comments

POC HTML code is here

hexdump -C ./nullderef_myhtml.html
00000000  3c 55 52 3e 3c 55 3e 3c  50 3e 3c 2f 55 20        |<UR><U><P></U |
0000000e
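To recreate the PoC file without copying it by hand, the 14 bytes from the hexdump above can be written out directly. This is only a minimal sketch: `poc_bytes` and `write_poc` are hypothetical helper names, not part of myhtml.

```c
#include <stdio.h>

/* The 14 bytes from the hexdump: "<UR><U><P></U " (note the trailing space). */
const unsigned char poc_bytes[] = {
    0x3c, 0x55, 0x52, 0x3e, 0x3c, 0x55, 0x3e, 0x3c,
    0x50, 0x3e, 0x3c, 0x2f, 0x55, 0x20
};

/* Hypothetical helper: writes the PoC bytes to `path`.
 * Returns 0 on success, -1 if the file could not be opened or fully written. */
int write_poc(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    size_t written = fwrite(poc_bytes, 1, sizeof poc_bytes, fp);

    if (fclose(fp) != 0)
        return -1;
    return written == sizeof poc_bytes ? 0 : -1;
}
```

Calling `write_poc("nullderef_myhtml.html")` should produce a file that matches the hexdump.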

While parsing the HTML above, myhtml tries to remove the <P> node because the <U> tag is not closed correctly: "</U " is not a valid closing tag.

Here is the output of example/callback_tree_node_high_level.c:

./myhtml/bin/myhtml/callback_tree_node_high_level ./nullderef_myhtml.html
Insert node 0x5639b06a2108(parent: 0x5639b06a20a8)
Insert html to parent -undef
Insert node 0x5639b06a2168(parent: 0x5639b06a2108)
Insert head to parent html
Insert node 0x5639b06a21c8(parent: 0x5639b06a2108)
Insert body to parent html
Insert node 0x5639b06a2228(parent: 0x5639b06a21c8)
Insert ur to parent body
Insert node 0x5639b06a2288(parent: 0x5639b06a2228)
Insert u to parent ur
Insert node 0x5639b06a22e8(parent: 0x5639b06a2288)
Insert p to parent u
Remove node 0x5639b06a22e8(parent: (nil))
Segmentation fault (core dumped)

It tries to remove the <P> node (0x5639b06a22e8), whose parent is already NULL, and this causes a NULL pointer dereference at myhtml_node_tag_id( myhtml_node_parent(node) ); in callback_node_remove().

void callback_node_remove(myhtml_tree_t* tree, myhtml_tree_node_t* node, void* ctx)
{
    printf("Remove node %p(parent: %p)\n", node, myhtml_node_parent(node));
    const char *tag_name = myhtml_tag_name_by_id(tree, myhtml_node_tag_id(node), NULL);
    const char *tag_name_parent = myhtml_tag_name_by_id(tree, myhtml_node_tag_id( myhtml_node_parent(node) ), NULL);
    printf("Remove %s from parent \n", tag_name);
}
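One way the callback could avoid the crash is to check the parent pointer before asking for its tag. The sketch below shows that guard using stand-in types so it compiles on its own: `demo_node_t`, `demo_parent_tag_name`, and `demo_format_remove` are hypothetical names, not myhtml API. In the real callback, the same check would presumably wrap the result of myhtml_node_parent(node) before it is passed to myhtml_node_tag_id(). This is not necessarily how the library itself was fixed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for myhtml_tree_node_t: only what the guard needs. */
typedef struct demo_node {
    struct demo_node *parent;   /* NULL once the node is detached from the tree */
    const char *tag_name;
} demo_node_t;

/* Returns the parent's tag name, or a placeholder when there is no parent.
 * This is the check missing from the callback above. */
const char *demo_parent_tag_name(const demo_node_t *node)
{
    if (node == NULL || node->parent == NULL)
        return "(none)";
    return node->parent->tag_name;
}

/* Formats the "Remove ..." log line without dereferencing a NULL parent.
 * `node` itself must be non-NULL. Returns the value from snprintf. */
int demo_format_remove(char *buf, size_t size, const demo_node_t *node)
{
    return snprintf(buf, size, "Remove %s from parent %s",
                    node->tag_name, demo_parent_tag_name(node));
}
```

With this guard, a detached <P> node like the one in the log would be reported with a placeholder parent name instead of crashing.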

Here is the crash log from gdb:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  myhtml_node_tag_id (node=0x0) at source/myhtml/./myhtml.c:728
728         return node->tag_id;
[Current thread is 1 (Thread 0x7f22d80c8700 (LWP 31615))]
(gdb) bt
#0  myhtml_node_tag_id (node=0x0) at source/myhtml/./myhtml.c:728
#1  0x000055bd21fd41c3 in callback_node_remove (tree=0x55bd22909b50,
    node=0x55bd22970448, ctx=<optimized out>)
    at myhtml/callback_tree_node_high_level.c:89
#2  0x000055bd21fe0a0f in myhtml_tree_node_remove (node=node@entry=0x55bd22970448)
    at source/myhtml/./tree.c:465
#3  0x000055bd21fe2473 in myhtml_tree_adoption_agency_algorithm (
    tree=0x55bd22909b50, token=token@entry=0x55bd229189d8, subject_tag_idx=138)
    at source/myhtml/./tree.c:1842
#4  0x000055bd21fd72fb in myhtml_insertion_mode_in_body (tree=0x55bd22909b50,
    token=0x55bd229189d8) at source/myhtml/./rules.c:1087
#5  0x000055bd21fda168 in myhtml_rules_tree_dispatcher (tree=0x55bd22909b50,
    token=0x55bd229189d8) at source/myhtml/./rules.c:3922
#6  0x000055bd21fea184 in myhtml_parser_stream (thread_id=<optimized out>,
    ctx=0x7f22d88ca0b8) at source/myhtml/./parser.c:28
#7  0x000055bd21fe6306 in mythread_function_queue_stream (arg=0x55bd22909668)
    at source/mycore/./thread_queue.c:605
#8  0x00007f22d84c16db in start_thread (arg=0x7f22d80c8700) at pthread_create.c:463
#9  0x00007f22d81ea88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Hi
Does anyone see my report?
Any questions are welcome!
Thanks

Hi Ren (@RKX1209 ),

Sorry for not responding for a long time.
I will definitely deal with this within a few days.

Thanks for the report!

Sure, thanks!

@RKX1209
Fixed.
Thanks!