Optimize core file reading
ivonastojanovic opened this issue · 0 comments
Pystack can print the stack trace of a Python core dump by running pystack core $CORE_FILE
. During the execution of this command, Pystack needs to copy memory from the core file and/or several shared libraries. In order to read something from a file one needs to open a file, change the current read position and then read the data from the file. All of these require system calls and these operations take time. By running the strace -cw -- python3 -m pystack core CORE_FILE
we can get some statistics on how expensive system calls are when we execute pystack
with a core file. In the table below we can see the system calls that are issued when we run pystack
over a core together with the number of calls per system call and how much wall time it takes in total per system call. What we want to do is to reduce the total time we spend in the open
/close
/read
system calls when operating over files:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
15.20 0.066126 55 1188 392 stat
14.50 0.063094 50 1238 45 openat
13.38 0.058219 48 1199 3 close
11.82 0.051429 50 1015 read
9.08 0.039498 45 863 3 lseek
7.63 0.033201 52 632 fstat
6.75 0.029364 57 507 mmap
5.59 0.024325 60 405 munmap
4.24 0.018435 48 380 fcntl
4.20 0.018264 44 413 329 readlink
3.31 0.014414 52 273 173 ioctl
1.47 0.006394 96 66 write
0.94 0.004084 60 68 rt_sigaction
0.53 0.002298 95 24 getdents64
0.45 0.001950 67 29 brk
0.36 0.001555 59 26 mprotect
0.12 0.000508 50 10 lstat
0.09 0.000402 50 8 pread64
0.06 0.000245 244 1 execve
0.04 0.000185 46 4 futex
0.04 0.000173 57 3 sigaltstack
0.03 0.000151 75 2 getcwd
0.03 0.000149 49 3 dup
0.02 0.000102 50 2 1 arch_prctl
0.02 0.000084 83 1 sysinfo
0.02 0.000068 68 1 gettid
0.01 0.000055 55 1 set_robust_list
0.01 0.000053 52 1 getrandom
0.01 0.000052 51 1 1 access
0.01 0.000051 50 1 rt_sigprocmask
0.01 0.000050 50 1 prlimit64
0.01 0.000043 43 1 set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00 0.435018 8367 947 total
When a program reads data from a file, then a location in that file is accessed and transferred from the hard drive to a buffer in memory. In order to try to reduce these system calls we want to use memory mapped files. The idea of memory mapping is to map a core file directly into a virtual memory pages. One of the reasons we think this is going to be faster is because accessing directly the memory mapped by mmap
doesn't require to go to kernel mode and back (skips the syscall cost). To read more about mmap: https://man7.org/linux/man-pages/man2/mmap.2.html