lichray/nvi2

Dynamic memory problems with nested "source"

leres opened this issue · 2 comments

leres commented

Riza Dindir emailed me a private bug report:

Here is the problem. I have a file "~/.nexrc", which sources "docs/dot.nexrc". The dot.nexrc on the other hand sources "docs/settings.rc" and "docs/macros.rc" files.

src/dot.nexrc has these lines

    source ~/docs/settings.rc
    source ~/docs/macros.rc

When I remove one of the lines above (or comment one line out), vi starts up and sets up macros. But if both lines are in there it dumps core.

I played around in a 13.1 poudriere jail and was able to reproduce via:

unsetenv EXINIT
cd
mkdir docs
echo 'source /dev/null' > docs/dot.nexrc
echo 'source /dev/null' >> docs/dot.nexrc
echo 'source ~/docs/dot.nexrc' > .nexrc
nvi docs

I built a nvi binary with -g and -O0 and tried valgrind:

zinc 192 @ valgrind --leak-check=yes --log-file=/tmp/valgrind.log /usr/bin/vi docs
zinc 193 @ cat /tmp/valgrind.log 
==32399== Memcheck, a memory error detector
==32399== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==32399== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==32399== Command: /usr/bin/vi docs
==32399== Parent PID: 89767
==32399== 
==32399== Invalid free() / delete / delete[] / realloc()
==32399==    at 0x484E3BC: free (vg_replace_malloc.c:876)
==32399==    by 0x13AD3F: ??? (in /usr/bin/nview)
==32399==    by 0x145C66: ??? (in /usr/bin/nview)
==32399==    by 0x130ED3: ??? (in /usr/bin/nview)
==32399==    by 0x125693: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399==  Address 0x5524af0 is 0 bytes inside a block of size 21 free'd
==32399==    at 0x484E3BC: free (vg_replace_malloc.c:876)
==32399==    by 0x13AD3F: ??? (in /usr/bin/nview)
==32399==    by 0x145C66: ??? (in /usr/bin/nview)
==32399==    by 0x130ED3: ??? (in /usr/bin/nview)
==32399==    by 0x125693: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399==  Block was alloc'd at
==32399==    at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399==    by 0x13A12C: ??? (in /usr/bin/nview)
==32399==    by 0x14614A: ??? (in /usr/bin/nview)
==32399==    by 0x14B3BC: ??? (in /usr/bin/nview)
==32399==    by 0x13C32B: ??? (in /usr/bin/nview)
==32399==    by 0x145C66: ??? (in /usr/bin/nview)
==32399==    by 0x130ED3: ??? (in /usr/bin/nview)
==32399==    by 0x125693: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399== 
==32399== 
==32399== HEAP SUMMARY:
==32399==     in use at exit: 180,118 bytes in 49 blocks
==32399==   total heap usage: 493 allocs, 445 frees, 830,020 bytes allocated
==32399== 
==32399== 100 bytes in 1 blocks are possibly lost in loss record 25 of 47
==32399==    at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399==    by 0x13A190: ??? (in /usr/bin/nview)
==32399==    by 0x14611D: ??? (in /usr/bin/nview)
==32399==    by 0x14B3BC: ??? (in /usr/bin/nview)
==32399==    by 0x146044: ??? (in /usr/bin/nview)
==32399==    by 0x145C14: ??? (in /usr/bin/nview)
==32399==    by 0x130ED3: ??? (in /usr/bin/nview)
==32399==    by 0x125693: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399== 
==32399== 148 bytes in 3 blocks are definitely lost in loss record 27 of 47
==32399==    at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399==    by 0x13A190: ??? (in /usr/bin/nview)
==32399==    by 0x14611D: ??? (in /usr/bin/nview)
==32399==    by 0x14B3BC: ??? (in /usr/bin/nview)
==32399==    by 0x13C32B: ??? (in /usr/bin/nview)
==32399==    by 0x145C66: ??? (in /usr/bin/nview)
==32399==    by 0x130ED3: ??? (in /usr/bin/nview)
==32399==    by 0x125693: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399== 
==32399== 817 bytes in 1 blocks are definitely lost in loss record 35 of 47
==32399==    at 0x484C884: malloc (vg_replace_malloc.c:385)
==32399==    by 0x4982122: cgetset (in /lib/libc.so.7)
==32399==    by 0x48DC321: _nc_read_termcap_entry (in /lib/libncursesw.so.9)
==32399==    by 0x48CCB57: _nc_read_entry2 (in /lib/libncursesw.so.9)
==32399==    by 0x48C5912: _nc_setupterm (in /lib/libncursesw.so.9)
==32399==    by 0x1255DE: ??? (in /usr/bin/nview)
==32399==    by 0x12408C: ??? (in /usr/bin/nview)
==32399==    by 0x4823007: ???
==32399== 
==32399== LEAK SUMMARY:
==32399==    definitely lost: 965 bytes in 4 blocks
==32399==    indirectly lost: 0 bytes in 0 blocks
==32399==      possibly lost: 100 bytes in 1 blocks
==32399==    still reachable: 179,053 bytes in 44 blocks
==32399==         suppressed: 0 bytes in 0 blocks
==32399== Reachable blocks (those to which a pointer was found) are not shown.
==32399== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==32399== 
==32399== For lists of detected and suppressed errors, rerun with: -s
==32399== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)

Some things I noticed along the way:

  • My recipe didn't work outside of the jail, I suspect the crash only occurs when the "right" pointer gets clobbered
  • Once I was able to cause a crash I could cause it with both /usr/local/bin/nvi and (the 13.1 base) /usr/bin/vi -- I suspect this is a long standing bug
  • Looking at ex_run_str() I didn't see how the memory allocated in v_wstrdup() would ever be freed
  • I did not fully investigate all of the valgrind reports
  • I think giving the docs directory as a argument is important because it causes a "... is not a regular file" warning

The first crash I attempted to debug involved two calls to vs_msgsave(), for the first call the msg queue was empty and the next pointer was null. But on the second call the next pointer had been clobbered with ascii. When I put a watchpoint on the memory location it was setenv() that was setting TERM from cl_vi_init() and somehow writing on top of the msg queue.

I think the problem might be with ex_source() which calls ex_run_str(). The ex_run_str() is the same compared with nvi version 1.8 (version 1.8 does work when sourcing from more than one files in an rc script). But the code for ex_source() is different (compared to nvi 1.8).

Can reproduce. I observed the following:

Breakpoint 4, cl_vi_init (sp=0x8007bd000) at /home/zy/devel/nvi2/cl/cl_screen.c:224
224             cl_putenv("TERM", ttype, 0);
10: *gp->msgq->slh_first = {q = {sle_next = 0x0}, mtype = M_ERR, buf = 0x800788270 "Warning: docs is not a regular file\n", len = 36}
(gdb) s
cl_putenv (name=0x204092 "TERM", str=0x800787010 "xterm-256color", value=0) at /home/zy/devel/nvi2/cl/cl_screen.c:565
565             if (str == NULL) {
(gdb)
569                     return (setenv(name, str, 1));
(gdb) s
570     }
(gdb) s
cl_vi_init (sp=0x8007bd000) at /home/zy/devel/nvi2/cl/cl_screen.c:225
225             o_lines = getenv("LINES");
10: *gp->msgq->slh_first = {q = {sle_next = 0x61762f3d4c49414d}, mtype = 1634545522, buf = 0x800780079 "", len = 36}

Right after setting the TERM environment variable, the gp->msgq points to a bad data structure. But so far, gdb wasn't able to tell what callback was running to make this change.