Crashes in perl_destruct() of multi-threaded perl 5.24.4
agentzh opened this issue · 11 comments
Hi there,
We've noted that for perl 5.24.4 compiled with the -Dusethreads
option, the latest perlcc
(tried both the 1.57 CPAN release and the master branch of this github repo, commit 862bdf2)'s generated binary executable would crash inside the perl_destruct()
C function call. The minimal reproducer is like this (thanks @xiaocang for preparing it):
my %unescapes = (
"\\" => "\\",
"'" => "'",
'"' => '"',
'n' => "\n",
't' => "\t",
'r' => "\r",
);
To run it:
$ perlcc -O2 --staticxs --Wc=-g -S -o a.out a.pl
$ ./a.out
free(): invalid pointer
Aborted (core dumped) ./a.out
Use gdb
to get more details of the assertion failure crash:
free(): invalid pointer
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 return ret;
=> 0x00007ffff6e5aefb <__GI_raise+267>: 48 8b 8c 24 08 01 00 00 mov rcx,QWORD PTR [rsp+0x108]
0x00007ffff6e5af03 <__GI_raise+275>: 64 48 33 0c 25 28 00 00 00 xor rcx,QWORD PTR fs:0x28
0x00007ffff6e5af0c <__GI_raise+284>: 44 89 c0 mov eax,r8d
0x00007ffff6e5af0f <__GI_raise+287>: 75 1d jne 0x7ffff6e5af2e <__GI_raise+318>
0x00007ffff6e5af11 <__GI_raise+289>: 48 81 c4 10 01 00 00 add rsp,0x110
0x00007ffff6e5af18 <__GI_raise+296>: 5b pop rbx
0x00007ffff6e5af19 <__GI_raise+297>: c3 ret
0x00007ffff6e5af1a <__GI_raise+298>: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff6e455b9 in __GI_abort () at abort.c:79
#2 0x00007ffff6e9d7b7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6fa71ac "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff6ea3c0c in malloc_printerr (str=str@entry=0x7ffff6fa53c2 "free(): invalid pointer") at malloc.c:5337
#4 0x00007ffff6ea53a4 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:4165
#5 0x00000000004c979f in Perl_sv_clear (my_perl=my_perl@entry=0x7e0260, orig_sv=orig_sv@entry=0x7dd0f8 <sv_list+216>) at sv.c:6652
#6 0x00000000004c9a7e in Perl_sv_free2 (my_perl=0x7e0260, sv=0x7dd0f8 <sv_list+216>, rc=<optimized out>) at sv.c:6954
#7 0x000000000046f6fd in S_SvREFCNT_dec (sv=<optimized out>, my_perl=0x7e0260) at inline.h:166
#8 Perl_pad_swipe (my_perl=my_perl@entry=0x7e0260, po=3, refadjust=refadjust@entry=true) at pad.c:1575
#9 0x000000000041f745 in Perl_op_clear (my_perl=my_perl@entry=0x7e0260, o=o@entry=0x7dc9b0 <svop_list+48>) at op.c:922
#10 0x000000000041fbb3 in Perl_op_free (my_perl=my_perl@entry=0x7e0260, o=<optimized out>) at op.c:794
#11 0x000000000041fd3d in Perl_op_free (my_perl=my_perl@entry=0x7e0260, o=<optimized out>) at op.c:777
#12 0x000000000043df21 in perl_destruct (my_perl=0x7e0260) at perl.c:792
#13 0x000000000041ccdd in main (argc=1, argv=0x7fffffffc508, env=0x7fffffffc518) at a.out.c:758
Using valgrind, we can get a bit more information:
$ valgrind ./a.out
...
==15581== Invalid free() / delete / delete[] / realloc()
==15581== at 0x4C3003C: free (vg_replace_malloc.c:540)
==15581== by 0x4C979E: Perl_sv_clear (sv.c:6652)
==15581== by 0x4C9A7D: Perl_sv_free2 (sv.c:6954)
==15581== by 0x46F6FC: S_SvREFCNT_dec (inline.h:166)
==15581== by 0x46F6FC: Perl_pad_swipe (pad.c:1575)
==15581== by 0x41F744: Perl_op_clear (op.c:922)
==15581== by 0x41FBB2: Perl_op_free (op.c:794)
==15581== by 0x41FD3C: Perl_op_free (op.c:777)
==15581== by 0x43DF20: perl_destruct (perl.c:792)
==15581== by 0x41CCDC: main (a.out.c:758)
==15581== Address 0x7dd370 is 0 bytes inside data symbol "pv2"
...
Seems like some static SVs allocated inside the global C array variable sv_list
had an sv_refcnt
of 1, and here it reaches 0 and perl_destruct()
tries to free the SVs (along with their associated PV memory).
I tried to increment such static SVs with 1 sv_refcnt
in the initialization process with the following pathc for B::C:
diff --git a/lib/B/C.pm b/lib/B/C.pm
index ee1f8f72..3fa565c6 100644
--- a/lib/B/C.pm
+++ b/lib/B/C.pm
@@ -7957,6 +7957,20 @@ EOT
# XXX maybe we need dl_init for a module, esp. when it's XS loading.
print " dl_init(aTHX);\n";
print " perl_init2(aTHX);\n" if $init2->index >= 0;
+
+ print <<'EOT';
+#if 1
+{
+ /* added by OpenResty Inc. */
+ unsigned i = 0;
+ for (i = 0; i < sizeof(sv_list) / sizeof(sv_list[0]); i++) {
+ if (sv_list[i].sv_refcnt == 1) {
+ SvREFCNT_inc_simple_NN(&sv_list[i]);
+ }
+ }
+}
+#endif
+EOT
}
sub output_local_destruct {
The good news is that the reproducer no longer crashes. The bad news is that some DESTROY tests in B::C are now failing:
===( 271;3 4/50 3/50 3/106 2/3 7/87 2/4 2/3 0/3... )===
# Failed test 'missing package DESTROY \#197'
# at t/TestBC.pm line 1298.
===( 277;4 5/50 3/50 3/106 2/3 8/87 2/4 2/3 0/3... )===
# Failed test 'missing -O3 package DESTROY \#197, \#280'
# at t/TestBC.pm line 1298.
t/cc_last.t ....... ok
===( 283;5 5/50 4/50 4/106 8/87 2/4 3/3 0/3 0/2... )===
# Failed test '\#208 missing DESTROY call at DESTRUCT time'
# at t/TestBC.pm line 1298.
# Failed test '\#208 -ffast-destruct'
# at t/TestBC.pm line 1298.
# Looks like you failed 2 tests of 2.
t/issue208.t ...... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/2 subtests
===( 284;5 5/50 4/50 4/106 8/87 2/4 3/3 0/3 0/2... )===
# Failed test 'missing our DESTROY \#208'
# at t/TestBC.pm line 1298.
t/issue143.t ...... ok
===( 289;5 6/50 4/50 4/106 9/87 3/4 0/3 0/2 0/1... )===
# Failed test 'missing our -O3 DESTROY \#208, \#280'
# at t/TestBC.pm line 1298.
t/issue229.t ...... skipped: Only for author
t/issue212.t ...... ok
===( 299;6 8/50 5/50 5/106 10/87 4/4 0/3 0/2 0/1... )===# Looks like you failed 4 tests of 5.
t/issue197.t ...... Dubious, test returned 4 (wstat 1024, 0x400)
Failed 4/5 subtests
...
Test Summary Report
-------------------
t/issue208.t (Wstat: 512 Tests: 2 Failed: 2)
Failed tests: 1-2
Non-zero exit status: 2
t/issue197.t (Wstat: 1024 Tests: 5 Failed: 4)
Failed tests: 1-4
Non-zero exit status: 4
t/issue306.t (Wstat: 0 Tests: 4 Failed: 0)
TODO passed: 1-4
Given these new test regressions, I guess my fix is not correct after all.
Any suggestions on a proper fix will be highly appreciated!
Thanks in advance!
BTW, the single-threaded build of perl 5.24.4 works fine.
OK, I finally found a way to make all the tests that were already passing on my side still pass after applying my patch:
diff --git a/lib/B/C.pm b/lib/B/C.pm
index ee1f8f72..5d8f07fc 100644
--- a/lib/B/C.pm
+++ b/lib/B/C.pm
@@ -7948,6 +7948,44 @@ EOT1
}
sub output_init {
+ print <<'EOT';
+#if defined(USE_ITHREADS)
+ {
+ /* added by OpenResty Inc. */
+ unsigned i = 0;
+ for (i = 0; i < sizeof(sv_list) / sizeof(sv_list[0]); i++) {
+ SV *sv = &sv_list[i];
+ if (SvREFCNT(sv) == 1) {
+ char *pvx;
+ XPV *xpv;
+ size_t len;
+
+ pvx = SvPVX_mutable(sv);
+ if (!pvx) {
+ continue;
+ }
+ xpv = (XPV *) SvANY(sv);
+ if (!xpv
+ || xpv < xpv_list
+ || (char *) xpv + sizeof(XPV) > (char *) xpv_list + sizeof(xpv_list))
+ {
+ continue;
+ }
+
+ len = SvLEN(sv);
+ if (len) {
+ char *p;
+ p = malloc(len + 1);
+ assert(p != NULL);
+ memcpy(p, pvx, len + 1);
+ SvPVX(sv) = p;
+ }
+ }
+ }
+ }
+#endif
+EOT
+
print <<'EOT';
/* our special compiled init */
perl_init(aTHX);
Does it look good? Thanks!
The question is which SV exactly does the double free. looks like a string
@rurban Yes, it's a string SV (with a PV part). Gdb shows it's sv_list[9]
, as in the quoted C source generated by perlcc:
Static SV sv_list[28] = {
{ NULL, 28, SVTYPEMASK|0x01000000, {0} }, /* sv_list[0] */
{ &xpvio_list[0], 2, 0x10000f, {0} }, /* sv_list[1] */
{ &xpvav_list[0], 5, 0x40c0000b, {0} }, /* sv_list[2] */
{ &xpvmg_list[0], 1, 0x10404407, {0} }, /* sv_list[3] */
{ &xpvmg_list[1], 1, 0x10404407, {0} }, /* sv_list[4] */
{ &xpvmg_list[2], 1, 0x10404407, {0} }, /* sv_list[5] */
{ &xpvav_list[1], 1, 0x4000000b, {0} }, /* sv_list[6] */
{ &xpvhv_list[0], 1, 0x2000000c, {0} }, /* sv_list[7] */
{ &xpv_list[0], 1, 0x18014403, {.svu_pv=(char*)pv2} }, /* sv_list[8] */
{ &xpv_list[1], 1, 0x18014403, {.svu_pv=(char*)pv2} }, /* sv_list[9] */
...
And sv_list[9]
is indeed referencing pv2
as its pvx
component. It also matches the valgrind output.
My second patch above dynamically allocates the .svu_pv
strings during the init
phase at runtime (for SVs with refcnt of 1 only and also for multi-threaded builds of perl only). This way, those PV strings can survive the free()
call initiated by the perl_destruct()
function. Freeing a static memory piece like pv2
led to the crashes and assertion failures inside glibc's free()
.
It's not pretty but it seems to work fine.
BTW, to reproduce this issue, the -O2
option passed to perlcc
is important while the --staticxs
option is irrelevant. Without -O2
or using other optimization levels like 1 or 3, it won't get crashed.
I'm using a Fedora x86_64 Linux.
So it would better on threads with O2 to skip the static string optimization.
@rurban Thanks for the direction! I just came up with this patch instead:
diff --git a/lib/B/C.pm b/lib/B/C.pm
index ee1f8f72..66aa7c02 100644
--- a/lib/B/C.pm
+++ b/lib/B/C.pm
@@ -9389,6 +9389,11 @@ OPTION:
$B::C::save_data_fh = 1 if $] >= 5.008 and (($] < 5.009004) or $MULTI);
$B::C::destruct = 1 if $] < 5.008 or $^O eq 'MSWin32'; # skip -ffast-destruct there
+ # skip -fcow for threaded perl since it might crash the executable
+ # in perl_destruct().
+ # see https://github.com/rurban/perl-compiler/issues/443
+ $B::C::cow = 0 if $ITHREADS or $MULTI;
+
init_sections();
foreach my $i (@eval_at_startup) {
$init2->add_eval($i);
It works fine for me.
Yes, this is better. Care for a PR?
I've put it into my 536 branch
In master