Dynamic memory problems with nested "source"
leres opened this issue · 2 comments
Riza Dindir emailed me a private bug report:
Here is the problem. I have a file "~/.nexrc", which sources "docs/dot.nexrc". The dot.nexrc on the other hand sources "docs/settings.rc" and "docs/macros.rc" files.
src/dot.nexrc has these lines
source ~/docs/settings.rc
source ~/docs/macros.rc
When I remove one of the lines above (or comment one line out), vi starts up and sets up macros. But if both lines are in there it dumps core.
I played around in a 13.1 poudriere jail and was able to reproduce via:
unsetenv EXINIT
cd
mkdir docs
echo 'source /dev/null' > docs/dot.nexrc
echo 'source /dev/null' >> docs/dot.nexrc
echo 'source ~/docs/dot.nexrc' > .nexrc
nvi docs
I built a nvi binary with -g and -O0 and tried valgrind:
zinc 192 @ valgrind --leak-check=yes --log-file=/tmp/valgrind.log /usr/bin/vi docs
zinc 193 @ cat /tmp/valgrind.log
==32399== Memcheck, a memory error detector
==32399== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==32399== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==32399== Command: /usr/bin/vi docs
==32399== Parent PID: 89767
==32399==
==32399== Invalid free() / delete / delete[] / realloc()
==32399== at 0x484E3BC: free (vg_replace_malloc.c:876)
==32399== by 0x13AD3F: ??? (in /usr/bin/nview)
==32399== by 0x145C66: ??? (in /usr/bin/nview)
==32399== by 0x130ED3: ??? (in /usr/bin/nview)
==32399== by 0x125693: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399== Address 0x5524af0 is 0 bytes inside a block of size 21 free'd
==32399== at 0x484E3BC: free (vg_replace_malloc.c:876)
==32399== by 0x13AD3F: ??? (in /usr/bin/nview)
==32399== by 0x145C66: ??? (in /usr/bin/nview)
==32399== by 0x130ED3: ??? (in /usr/bin/nview)
==32399== by 0x125693: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399== Block was alloc'd at
==32399== at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399== by 0x13A12C: ??? (in /usr/bin/nview)
==32399== by 0x14614A: ??? (in /usr/bin/nview)
==32399== by 0x14B3BC: ??? (in /usr/bin/nview)
==32399== by 0x13C32B: ??? (in /usr/bin/nview)
==32399== by 0x145C66: ??? (in /usr/bin/nview)
==32399== by 0x130ED3: ??? (in /usr/bin/nview)
==32399== by 0x125693: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399==
==32399==
==32399== HEAP SUMMARY:
==32399== in use at exit: 180,118 bytes in 49 blocks
==32399== total heap usage: 493 allocs, 445 frees, 830,020 bytes allocated
==32399==
==32399== 100 bytes in 1 blocks are possibly lost in loss record 25 of 47
==32399== at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399== by 0x13A190: ??? (in /usr/bin/nview)
==32399== by 0x14611D: ??? (in /usr/bin/nview)
==32399== by 0x14B3BC: ??? (in /usr/bin/nview)
==32399== by 0x146044: ??? (in /usr/bin/nview)
==32399== by 0x145C14: ??? (in /usr/bin/nview)
==32399== by 0x130ED3: ??? (in /usr/bin/nview)
==32399== by 0x125693: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399==
==32399== 148 bytes in 3 blocks are definitely lost in loss record 27 of 47
==32399== at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399== by 0x13A190: ??? (in /usr/bin/nview)
==32399== by 0x14611D: ??? (in /usr/bin/nview)
==32399== by 0x14B3BC: ??? (in /usr/bin/nview)
==32399== by 0x13C32B: ??? (in /usr/bin/nview)
==32399== by 0x145C66: ??? (in /usr/bin/nview)
==32399== by 0x130ED3: ??? (in /usr/bin/nview)
==32399== by 0x125693: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399==
==32399== 817 bytes in 1 blocks are definitely lost in loss record 35 of 47
==32399== at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399== by 0x4982122: cgetset (in /lib/libc.so.7)
==32399== by 0x48DC321: _nc_read_termcap_entry (in /lib/libncursesw.so.9)
==32399== by 0x48CCB57: _nc_read_entry2 (in /lib/libncursesw.so.9)
==32399== by 0x48C5912: _nc_setupterm (in /lib/libncursesw.so.9)
==32399== by 0x1255DE: ??? (in /usr/bin/nview)
==32399== by 0x12408C: ??? (in /usr/bin/nview)
==32399== by 0x4823007: ???
==32399==
==32399== LEAK SUMMARY:
==32399== definitely lost: 965 bytes in 4 blocks
==32399== indirectly lost: 0 bytes in 0 blocks
==32399== possibly lost: 100 bytes in 1 blocks
==32399== still reachable: 179,053 bytes in 44 blocks
==32399== suppressed: 0 bytes in 0 blocks
==32399== Reachable blocks (those to which a pointer was found) are not shown.
==32399== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==32399==
==32399== For lists of detected and suppressed errors, rerun with: -s
==32399== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Some things I noticed along the way:
- My recipe didn't work outside of the jail, I suspect the crash only occurs when the "right" pointer gets clobbered
- Once I was able to cause a crash I could cause it with both /usr/local/bin/nvi and (the 13.1 base) /usr/bin/vi -- I suspect this is a long standing bug
- Looking at ex_run_str() I didn't see how the memory allocated in v_wstrdup() would ever be freed
- I did not fully investigate all of the valgrind reports
- I think giving the docs directory as a argument is important because it causes a "... is not a regular file" warning
The first crash I attempted to debug involved two calls to vs_msgsave(), for the first call the msg queue was empty and the next pointer was null. But on the second call the next pointer had been clobbered with ascii. When I put a watchpoint on the memory location it was setenv() that was setting TERM from cl_vi_init() and somehow writing on top of the msg queue.
I think the problem might be with ex_source() which calls ex_run_str(). The ex_run_str() is the same compared with nvi version 1.8 (version 1.8 does work when sourcing from more than one files in an rc script). But the code for ex_source() is different (compared to nvi 1.8).
Can reproduce. I observed the following:
Breakpoint 4, cl_vi_init (sp=0x8007bd000) at /home/zy/devel/nvi2/cl/cl_screen.c:224
224 cl_putenv("TERM", ttype, 0);
10: *gp->msgq->slh_first = {q = {sle_next = 0x0}, mtype = M_ERR, buf = 0x800788270 "Warning: docs is not a regular file\n", len = 36}
(gdb) s
cl_putenv (name=0x204092 "TERM", str=0x800787010 "xterm-256color", value=0) at /home/zy/devel/nvi2/cl/cl_screen.c:565
565 if (str == NULL) {
(gdb)
569 return (setenv(name, str, 1));
(gdb) s
570 }
(gdb) s
cl_vi_init (sp=0x8007bd000) at /home/zy/devel/nvi2/cl/cl_screen.c:225
225 o_lines = getenv("LINES");
10: *gp->msgq->slh_first = {q = {sle_next = 0x61762f3d4c49414d}, mtype = 1634545522, buf = 0x800780079 "", len = 36}
Right after setting the TERM
environment variable, the gp->msgq
points to a bad data structure. But so far, gdb wasn't able to tell what callback was running to make this change.