bloomberg/pystack

Optimize performance with caching

ivonastojanovic opened this issue

Pystack makes a large number of system calls, some of which are expensive, and these are a major reason Pystack is slow. One of the most expensive operations is copying memory from the remote process into the local process: every time Pystack needs a portion of the remote process's memory, it issues a separate system call to copy it. Running `strace -c -- python3 -m pystack remote PID --locals`, which prints summary statistics for the traced program, shows that `process_vm_readv` accounts for the largest share of system-call time (35.36%, from 2390 calls), even though each individual call is cheap (about 2 µs). The cost comes from the sheer number of calls.
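To make the "one system call per read" pattern concrete, here is a minimal Python sketch (Linux only) that calls glibc's `process_vm_readv` through `ctypes`. The helper name `read_remote` is hypothetical and is not Pystack's API. Every call to it performs exactly one `process_vm_readv` system call, which is why reading many small objects one at a time adds up to thousands of calls.

```python
import ctypes
import ctypes.util
import os


# struct iovec from <sys/uio.h>: a base pointer and a length in bytes.
class IOVec(ctypes.Structure):
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.process_vm_readv.argtypes = [
    ctypes.c_int,                            # pid_t pid
    ctypes.POINTER(IOVec), ctypes.c_ulong,   # local_iov, liovcnt
    ctypes.POINTER(IOVec), ctypes.c_ulong,   # remote_iov, riovcnt
    ctypes.c_ulong,                          # flags (must be 0)
]
_libc.process_vm_readv.restype = ctypes.c_ssize_t


def read_remote(pid: int, address: int, size: int) -> bytes:
    """Hypothetical helper: copy `size` bytes at `address` in process `pid`.

    Each call issues one process_vm_readv system call; the kernel may return
    fewer bytes than requested, so only the bytes actually copied are returned.
    """
    buf = ctypes.create_string_buffer(size)
    local = IOVec(ctypes.addressof(buf), size)   # destination in this process
    remote = IOVec(address, size)                # source in the target process
    n = _libc.process_vm_readv(pid, ctypes.byref(local), 1,
                               ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

Reading from another process requires ptrace-level permission on the target; reading from the calling process itself is normally allowed, which makes it a convenient way to try the helper.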

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.36    0.005828           2      2390           process_vm_readv
 12.80    0.002109           2       716        92 stat
 10.55    0.001738           2       751           read
  8.53    0.001405           3       360        31 open
  7.32    0.001206         603         2           wait4
  3.57    0.000589           3       171           mmap
  3.18    0.000524           0       570           fstat
  2.79    0.000460         460         1           clone
  2.74    0.000451           1       356           close
  2.40    0.000395           0       518         2 lseek
  2.35    0.000387          19        20           munmap
  1.76    0.000290           0       453       447 ioctl
  1.66    0.000274           4        57           mprotect
  1.31    0.000216          11        19           openat
  0.91    0.000150           0       236           write
  0.64    0.000106           2        36           getdents
  0.39    0.000064           0        74           brk
  0.34    0.000056           5        10        10 access
  0.30    0.000049          12         4         1 connect
  0.22    0.000036           4         9         1 readlink
  0.14    0.000023           3         7           poll
  0.14    0.000023           5         4           socket
  0.10    0.000017           8         2           ptrace
  0.10    0.000017           2         8           futex
  0.09    0.000015           3         5           sendto
  0.04    0.000007           1         6           fcntl
  0.04    0.000006           3         2           recvmsg
  0.03    0.000005           0        68           rt_sigaction
  0.03    0.000005           5         1           execve
  0.03    0.000005           5         1           epoll_create1
  0.02    0.000004           1         3           dup
  0.02    0.000004           1         3           getuid
  0.01    0.000002           2         1           rt_sigprocmask
  0.01    0.000002           2         1           getrlimit
  0.01    0.000002           2         1           getgid
  0.01    0.000002           2         1           geteuid
  0.01    0.000002           2         1           getegid
  0.01    0.000002           2         1           arch_prctl
  0.01    0.000002           2         1           set_tid_address
  0.01    0.000002           2         1           set_robust_list
  0.00    0.000000           0        10           lstat
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           getsockopt
  0.00    0.000000           0         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.016480                  6887       584 total

Adding a cache should reduce the number of these system calls and make Pystack faster. When Pystack needs a portion of process memory, it will first look for it in the cache; only if it is not there will it issue the system call that copies that memory from the process. The cache will be used both when analyzing a remote process and when analyzing a core file.
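The lookup-then-copy behavior described above is a read-through cache. The Python sketch below illustrates one way to build it; it is an assumption, not Pystack's implementation. The class `CachedMemoryReader`, the aligned `block_size` granularity, and the LRU eviction with `max_blocks` are all hypothetical design choices. The sketch wraps any `read_fn(address, size)` that performs the expensive copy, whether that is `process_vm_readv` for a live process or a file read for a core file, and calls it at most once per cached block.

```python
from collections import OrderedDict
from typing import Callable

ReadFn = Callable[[int, int], bytes]  # (address, size) -> bytes


class CachedMemoryReader:
    """Hypothetical read-through LRU cache over an expensive memory reader.

    Memory is fetched in aligned blocks of `block_size` bytes, so many small
    reads that land in the same block cost a single underlying copy.
    """

    def __init__(self, read_fn: ReadFn, block_size: int = 4096,
                 max_blocks: int = 1024) -> None:
        self._read_fn = read_fn
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.misses = 0  # number of calls forwarded to read_fn

    def _block(self, base: int) -> bytes:
        block = self._blocks.get(base)
        if block is not None:
            self._blocks.move_to_end(base)  # hit: mark as most recently used
            return block
        self.misses += 1
        block = self._read_fn(base, self._block_size)  # miss: the costly copy
        self._blocks[base] = block
        if len(self._blocks) > self._max_blocks:
            self._blocks.popitem(last=False)  # evict least recently used block
        return block

    def read(self, address: int, size: int) -> bytes:
        if size <= 0:
            return b""
        bs = self._block_size
        out = bytearray()
        # Walk every aligned block overlapping [address, address + size).
        for base in range(address - address % bs, address + size, bs):
            block = self._block(base)
            start = max(address, base) - base
            end = min(address + size, base + bs) - base
            out += block[start:end]
        return bytes(out)
```

Using the page size as the block size fits the fact that memory mappings are page-aligned, so a block read never straddles a mapped and an unmapped region. A cache like this is only correct while the target's memory does not change, for example while the process is stopped or when reading a core file.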