Unwinding cores that stack overflowed can take forever
pablogsal opened this issue · 3 comments
This program:
max_iters = 1000000
i = filter(bool, range(max_iters))
for _ in range(max_iters):
    i = filter(bool, i)
del i
Can generate a core that takes forever to unwind on some machines with gigantic stack size limits. For example, this can generate 58854+ frames. We may want to limit the number of frames we unwind. This may also be some kind of bug where we don't advance the IP when unwinding, so it needs a bit more investigation.
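To make the frame-limit idea concrete, here is a rough sketch in Python. It is not pystack's unwinder: `unwind`, `step`, `max_frames` and the `(pc, sp)` frame tuple are all hypothetical names made up for illustration. Note that a repeating PC by itself is expected under deep recursion, so the "stuck" check below compares both PC and stack pointer.

```python
from typing import Callable, List, Optional, Tuple

Frame = Tuple[int, int]  # hypothetical frame: (program counter, stack pointer)


def unwind(
    start: Frame,
    step: Callable[[Frame], Optional[Frame]],
    max_frames: int = 10_000,
) -> Tuple[List[Frame], str]:
    """Walk caller frames, stopping early at max_frames or on no progress.

    `step` is a hypothetical callback returning the caller's frame,
    or None once the outermost frame has been reached.
    """
    frames: List[Frame] = []
    frame: Optional[Frame] = start
    while frame is not None:
        if len(frames) >= max_frames:
            return frames, "truncated"  # cap hit, more frames remain
        frames.append(frame)
        nxt = step(frame)
        if nxt == frame:
            # Neither PC nor SP changed: the unwinder is not advancing.
            return frames, "stuck"
        frame = nxt
    return frames, "complete"
```

With a cap in place, a core like the one above would yield a truncated trace instead of a walk over every frame; the right default for the cap is an open question.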
Example of using eu-stack from elfutils:
...
#82404 0x00007f0888f7c93c filter_dealloc
#82405 0x00007f0888f7c93c filter_dealloc
#82406 0x00007f0888f7c93c filter_dealloc
#82407 0x00007f0888f7c93c filter_dealloc
#82408 0x00007f0888f7c93c filter_dealloc
#82409 0x00007f0888f7c93c filter_dealloc
#82410 0x00007f0888f7c93c filter_dealloc
#82411 0x00007f0888f7c93c filter_dealloc
#82412 0x00007f0888f7c93c filter_dealloc
#82413 0x00007f0888f7c93c filter_dealloc
#82414 0x00007f0888f7c93c filter_dealloc
#82415 0x00007f0888f7c93c filter_dealloc
#82416 0x00007f0888f7c93c filter_dealloc
#82417 0x00007f0888f7c93c filter_dealloc
#82418 0x00007f0888f7c93c filter_dealloc
#82419 0x00007f0888f7c93c filter_dealloc
#82420 0x00007f0888f7c93c filter_dealloc
#82421 0x00007f0888f7c93c filter_dealloc
#82422 0x00007f0888f7c93c filter_dealloc
#82423 0x00007f0888f7c93c filter_dealloc
#82424 0x00007f0888f7c93c filter_dealloc
#82425 0x00007f0888f7c93c filter_dealloc
#82426 0x00007f0888f7c93c filter_dealloc
#82427 0x00007f0888f7c93c filter_dealloc
#82428 0x00007f0888f7c93c filter_dealloc
#82429 0x00007f0888f7c93c filter_dealloc
#82430 0x00007f0888f7c93c filter_dealloc
#82431 0x00007f0888f7c93c filter_dealloc
#82432 0x00007f0888f7c93c filter_dealloc
#82433 0x00007f0888f7c93c filter_dealloc
#82434 0x00007f0888f7c93c filter_dealloc
#82435 0x00007f0888f7c93c filter_dealloc
#82436 0x00007f0888f7c93c filter_dealloc
#82437 0x00007f0888f7c93c filter_dealloc
#82438 0x00007f0888f7c93c filter_dealloc
#82439 0x00007f0888f7c93c filter_dealloc
#82440 0x00007f0888f7c93c filter_dealloc
#82441 0x00007f0888f7c93c filter_dealloc
#82442 0x00007f0888f7c93c filter_dealloc
#82443 0x00007f0888f7c93c filter_dealloc
#82444 0x00007f0888f7c93c filter_dealloc
...
Well, gdb on WSL successfully unwinds it. It ends at #1047054 for Python 3.10.10, given a stack size limit of 8 MiB. So 1 million 8-byte frames. 😬
I let pystack run, and it finished in a mere 21.5 minutes. A huge portion of the time was spent in dwfl_module_addrinfo (judging by periodically pystack'ing the running pystack to see what it was up to). Looks like adding a PC->symbol name cache would help tremendously.
Adding a cache around dwfl_module_addrinfo lets it complete in 30 seconds instead of 1290 seconds.
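For anyone curious why the cache helps so much here, a minimal Python sketch of the idea follows. It is not the actual change (that lives in pystack's native code and is not shown in this thread); `slow_symbol_lookup` is a hypothetical stand-in for the expensive address-to-symbol query, and `symbol_for_pc` and `symbolize` are made-up names. Since nearly every frame in this core has the same PC, memoizing by PC reduces a million lookups to a handful.

```python
from functools import lru_cache
from typing import Iterable, List

FILTER_DEALLOC_PC = 0x00007F0888F7C93C  # PC seen in the eu-stack output above


def slow_symbol_lookup(pc: int) -> str:
    """Hypothetical stand-in for an expensive PC -> symbol name query."""
    if pc == FILTER_DEALLOC_PC:
        return "filter_dealloc"
    return f"sym_{pc:#x}"  # placeholder name for any other address


@lru_cache(maxsize=None)
def symbol_for_pc(pc: int) -> str:
    # Each distinct PC reaches the slow lookup only once.
    return slow_symbol_lookup(pc)


def symbolize(pcs: Iterable[int]) -> List[str]:
    """Resolve a sequence of frame PCs to symbol names via the cache."""
    return [symbol_for_pc(pc) for pc in pcs]
```

Calling `symbolize([FILTER_DEALLOC_PC] * 1_000_000)` performs a single slow lookup; `symbol_for_pc.cache_info()` reports the hit and miss counts.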