Optimize performance with caching
ivonastojanovic opened this issue · 0 comments
Pystack makes a lot of system calls, some of which are very expensive, and that is what makes it slow. One of the most expensive is copying memory from the remote process into the local process: each time the local process needs some portion of the remote process's memory, it issues a system call to copy it. This shows up when running Pystack under strace, which reports statistics on the system calls it traced:
strace -c -- python3 -m pystack remote PID --locals
In the output below, the process_vm_readv syscall accounts for the largest share of the time (35.36% across 2390 calls).
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.36    0.005828           2      2390           process_vm_readv
 12.80    0.002109           2       716        92 stat
 10.55    0.001738           2       751           read
  8.53    0.001405           3       360        31 open
  7.32    0.001206         603         2           wait4
  3.57    0.000589           3       171           mmap
  3.18    0.000524           0       570           fstat
  2.79    0.000460         460         1           clone
  2.74    0.000451           1       356           close
  2.40    0.000395           0       518         2 lseek
  2.35    0.000387          19        20           munmap
  1.76    0.000290           0       453       447 ioctl
  1.66    0.000274           4        57           mprotect
  1.31    0.000216          11        19           openat
  0.91    0.000150           0       236           write
  0.64    0.000106           2        36           getdents
  0.39    0.000064           0        74           brk
  0.34    0.000056           5        10        10 access
  0.30    0.000049          12         4         1 connect
  0.22    0.000036           4         9         1 readlink
  0.14    0.000023           3         7           poll
  0.14    0.000023           5         4           socket
  0.10    0.000017           8         2           ptrace
  0.10    0.000017           2         8           futex
  0.09    0.000015           3         5           sendto
  0.04    0.000007           1         6           fcntl
  0.04    0.000006           3         2           recvmsg
  0.03    0.000005           0        68           rt_sigaction
  0.03    0.000005           5         1           execve
  0.03    0.000005           5         1           epoll_create1
  0.02    0.000004           1         3           dup
  0.02    0.000004           1         3           getuid
  0.01    0.000002           2         1           rt_sigprocmask
  0.01    0.000002           2         1           getrlimit
  0.01    0.000002           2         1           getgid
  0.01    0.000002           2         1           geteuid
  0.01    0.000002           2         1           getegid
  0.01    0.000002           2         1           arch_prctl
  0.01    0.000002           2         1           set_tid_address
  0.01    0.000002           2         1           set_robust_list
  0.00    0.000000           0        10           lstat
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           getsockopt
  0.00    0.000000           0         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.016480                  6887       584 total
Adding a cache should be a good way to reduce the number of these system calls, which will make Pystack faster. Pystack would first look in the cache for the requested portion of process memory, and only if it is not there would it issue the system call that copies the memory from the process. The cache would be used both when analyzing a remote process and when analyzing a core file.
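To make the idea concrete, here is a rough sketch of what such a cache could look like. It is not Pystack's actual code: the class name CachedMemoryReader, the read_remote callable (standing in for whatever performs the real copy, for example a process_vm_readv wrapper or a core file reader), the 4096-byte chunk size and the LRU eviction are all assumptions made for illustration. It also assumes the target's memory does not change while the analysis runs.

```python
from collections import OrderedDict
from typing import Callable

PAGE_SIZE = 4096  # assumed chunk size for illustration, not taken from Pystack


class CachedMemoryReader:
    """Hypothetical wrapper: serve reads from cached chunks, copy only on a miss."""

    def __init__(self, read_remote: Callable[[int, int], bytes], max_pages: int = 256) -> None:
        # read_remote(address, size) is a stand-in for the expensive copy.
        self._read_remote = read_remote
        self._max_pages = max_pages
        self._pages: "OrderedDict[int, bytes]" = OrderedDict()

    def _page(self, base: int) -> bytes:
        page = self._pages.get(base)
        if page is None:
            # Cache miss: this is the only place the expensive copy happens.
            page = self._read_remote(base, PAGE_SIZE)
            self._pages[base] = page
            if len(self._pages) > self._max_pages:
                self._pages.popitem(last=False)  # drop the least recently used chunk
        else:
            self._pages.move_to_end(base)  # mark the chunk as recently used
        return page

    def read(self, address: int, size: int) -> bytes:
        out = bytearray()
        end = address + size
        while address < end:
            base = address - (address % PAGE_SIZE)  # start of the containing chunk
            offset = address - base
            take = min(end - address, PAGE_SIZE - offset)
            out += self._page(base)[offset:offset + take]
            address += take
        return bytes(out)


# Usage with a fake "remote" memory that records every expensive read.
fake_memory = bytes(range(256)) * 64
expensive_reads = []


def fake_read_remote(address: int, size: int) -> bytes:
    expensive_reads.append((address, size))
    return fake_memory[address:address + size]


reader = CachedMemoryReader(fake_read_remote)
reader.read(100, 8)
reader.read(104, 16)  # same chunk as the previous read, served from the cache
print(len(expensive_reads))
```

Reading whole chunks means nearby small reads share a single copy, which is where the saving would come from. A real implementation would also have to handle chunks that are only partially mapped, where copying the full chunk can fail even though the originally requested range is readable.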