`GC_collect` tries to realloc (oops)
Closed this issue · 4 comments
Here is an interesting stacktrace I just got:
#0 []= () at /usr/share/crystal/src/pointer.cr:130
#1 push () at /usr/share/crystal/src/deque.cr:480
#2 0x000055555558a3af in lock_slow () at /usr/share/crystal/src/mutex.cr:70
#3 0x000055555558d882 in lock () at /usr/share/crystal/src/mutex.cr:51
#4 collect () at /usr/share/crystal/src/mutex.cr:132
#5 0x0000555555591e5c in GlobalAllocator_tryCollect (self=0x5555557a2290) at src/global_allocator.c:151
#6 GC_GlobalAllocator_nextBlock (self=0x5555557a2290) at src/global_allocator.c:175
#7 0x0000555555592a3d in LocalAllocator_initCursor (self=0x5555557a2530) at src/local_allocator.c:10
#8 GC_LocalAllocator_allocateSmall (self=0x5555557a2530, size=32, atomic=0) at src/local_allocator.c:135
#9 0x000055555556328c in malloc () at /home/github/immix.cr/src/immix.cr:23
#10 __crystal_malloc64 () at /usr/share/crystal/src/gc.cr:24
#11 malloc () at /usr/share/crystal/src/primitives.cr:164
#12 increase_capacity () at /usr/share/crystal/src/deque.cr:587
#13 push () at /usr/share/crystal/src/deque.cr:477
#14 0x000055555558a3af in lock_slow () at /usr/share/crystal/src/mutex.cr:70
#15 0x000055555558d882 in lock () at /usr/share/crystal/src/mutex.cr:51
#16 collect () at /usr/share/crystal/src/mutex.cr:132
#17 0x0000555555591e5c in GlobalAllocator_tryCollect (self=0x5555557a2290) at src/global_allocator.c:151
#18 GC_GlobalAllocator_nextBlock (self=0x5555557a2290) at src/global_allocator.c:175
#19 0x0000555555592a3d in LocalAllocator_initCursor (self=0x5555557a2530) at src/local_allocator.c:10
#20 GC_LocalAllocator_allocateSmall (self=0x5555557a2530, size=24, atomic=0) at src/local_allocator.c:135
#21 0x000055555556f4b2 in malloc () at /home/github/immix.cr/src/immix.cr:23
#22 __crystal_malloc64 () at /usr/share/crystal/src/gc.cr:24
#23 new () at /usr/share/crystal/src/deque.cr:25
#24 wait_readable () at /usr/share/crystal/src/io/evented.cr:125
#25 wait_readable () at /usr/share/crystal/src/io/evented.cr:119
#26 unbuffered_read () at /usr/share/crystal/src/io/evented.cr:59
#27 0x000055555556f349 in fill_buffer () at /usr/share/crystal/src/io/buffered.cr:261
#28 read () at /usr/share/crystal/src/io/buffered.cr:82
#29 0x00005555555700e0 in read_fully? () at /usr/share/crystal/src/io.cr:537
#30 read_fully () at /usr/share/crystal/src/io.cr:520
#31 0x0000555555587aa5 in decode () at /usr/share/crystal/src/io/byte_format.cr:123
#32 from_io () at /usr/share/crystal/src/int.cr:759
#33 read_bytes () at /usr/share/crystal/src/io.cr:913
#34 -> () at /usr/share/crystal/src/signal.cr:244
#35 0x00005555555865e9 in run () at /usr/share/crystal/src/fiber.cr:146
#36 0x0000000000000000 in ?? ()
We can see that we try to allocate (#23: Deque.new
) but the HEAP has been filled, so the GC starts a collection, and calls GC_collect
to collect all the stacks to walk, but... the Deque is too small and we try to increase it's capacity (#12
), which tries to allocate but fails because the HEAP is filled, and so we try to collect (oops).
AFAIK: the problem is relying on the Crystal stdlib (Mutex
) when we should rely on libc instead (pthread_mutex
). The former relies on the GC when the latter won't.
Using It doesn't even work, and crashes with an infinite recursion.Thread::Mutex
may be fine as a quick workaround, but it won't do in the long run, because it raises on errors which will allocate an exception class.
I tried to call GC_lock
and GC_unlock
but I get a reentrant/concurrent call into GC_collect
:
GC.collect: GC_lock (fiber=main)
GC.collect: GC_lock (fiber=Signal Loop)
GC.collect: GC_unlock (fiber=Signal Loop)
<hangs forever>
It then hangs forever because the second GC.collect
overwrote @@pending
so the main fiber won't be resumed ever again.
It looks like an Atomic::Flag can workaround the issue.