Binaries can be downloaded via Releases, or Repository -> Tags
A tool to analyze raw strace
data.
Similar to output provided by the -c
flag, but with more detail and capable of handling multiple PIDs.
It can generate the following metrics:
- A summary of the top processes, including total syscalls and time spent
- Statistics on each type of syscall made for a given number of proccesses
- Details of a process, including the statistics, files opened, and related PIDS
- A list of all programs executed in session
- A list of all files opened in session
- A list of the duration spent opening files in a directory and its children
- A list of all read/write calls made in session
- A histogram showing the quantized distribution of execution times for a given syscall
- A
pstree
style graph of traced processes
The parser will attempt to find all sibling threads of traced processes. For a command executed with or after
the start of the trace this will always be accurate. On existing processes some threads may not be found.
The parser checks for futex
calls with a *_PRIVATE
flag that share an uaddr
; if these do not occur the
parser cannot relate the processes.
NOTE: strace
must be run with the at least the -tt -T -f -o <FILE>
flags for
required data to be captured. Including -yyy
will provide file details in the io
subcommand.
WARNING: Because strace
may slow down the target system by up to 10x,
it is not recommended for use in a production environment
except as a last resort. See this article
for more information on the performance impacts of strace
.
I recommend using the following default flags:
strace -fttTyyy -s 1024 -o <FILE>
strace -fttTyyy -s 1024 -o <FILE>
strace flag | man's description |
---|---|
-f |
Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls. |
-t |
Prefix each line of the trace with the wall clock time. If given twice (-tt ), the time printed will include the microseconds. |
-T |
Show the time spent in system calls. This records the time difference between the beginning and the end of each system call. |
-y |
Print paths associated with file descriptor arguments. |
-yy |
Print protocol specific information associated with socket file descriptors. |
-s |
Specify the maximum string size to print (the default is 32). Note that filenames are not considered strings and are always printed in full. |
-o <FILE> |
Write the trace output to the file filename rather than to stderr. |
Aside from these default flags, you will also want to attach strace
to a PID
with the -p
flag.
The PID
to strace
for depends on what you want to trace. For reference, you can use the following values:
trace | PID |
---|---|
Sidekiq | $(pgrep -fd, sidekiq) |
For example:
# Strace Sidekiq
sudo strace -fttTyyy -s 1024 -o /tmp/sidekiq_trace -p $(pgrep -fd, sidekiq)
You'll need the Rust compiler 1.42 or above, which can be obtained at https://rustup.rs/.
Build with cargo build --release
, the binary will be located at target/release/strace-parser
.
strace-parser <INPUT> <SUBCOMMAND> [FLAGS] [OPTIONS]
Args:
<INPUT>
- strace output file to analyze
Subcommands:
exec
- List programs executedfiles
- List files openeddirectories
- List total duration ofopen
andopenat
calls performed in a directory and its childrenhelp
- Print a brief help messageio
- Show details of I/O syscalls:read
,recv
,recvfrom
,recvmsg
,send
,sendmsg
,sendto
, andwrite
list-pids
- List of PIDs and their syscall statspid
- Details of PID(s) including syscalls stats, exec'd process, and slowestopen
callsquantize
- Prints a log₂ scale histogram of the quantized execution times in μsecs for a syscallsummary
- Overview of PIDs in sessiontree
- pstree-style view of traced processes
Note that all subcommands can be arbritrarily abbreviated.
For example, strace-parser trace.txt s
goes to summary, while strace-parser trace.txt fi
goes to files.
By default results are sorted by time the process was active, can be changed with -s, --sort
strace-parser <INPUT> summary [OPTIONS]
Options:
-c, --count <COUNT>
- The number of PIDs to print, defaults to 25-s, --sort <SORT_BY>
- Field to sort results by, defaults to active time. Options:active_time
children
pid
syscalls
total_time
user_time
$ strace-parser trace.txt summary --count 2
Top 2 PIDs by Active Time
-----------
pid actv (ms) wait (ms) total (ms) % of actv syscalls children
------- ---------- ---------- ---------- --------- --------- ---------
18741 374.363 10112.698 10487.062 65.85% 4098 0
17021 67.277 11131.771 11199.049 11.83% 1473 0
PIDs 101
real 1m0.609s
user 0m36.305s
sys 12m17.512s
Print a list of the syscall stats of the top PIDs.
strace-parser <INPUT> list-pids [OPTIONS]
Options:
-c, --count <COUNT>
- The number of PIDs to print, defaults to 25-s, --sort <SORT_BY>
- Field to sort results by, defaults to active time. Options:active_time
children
pid
syscalls
total_time
user_time
$ strace-parser trace.txt list-pids --count 2 --sort syscalls
Details of Top 2 PIDs by Syscall Count
-----------
PID 18741
4098 syscalls, active time: 374.363ms, total time: 10487.062ms
syscall count total (ms) max (ms) avg (ms) min (ms) errors
--------------- -------- ---------- ---------- ---------- ---------- --------
select 42 9893.016 4806.976 235.548 0.012
clock_gettime 2550 208.378 3.959 0.082 0.008
futex 79 206.069 101.633 2.608 0.009 EAGAIN: 3 ETIMEDOUT: 5
PID 17021
1473 syscalls, active time: 67.277ms, total time: 11199.049ms
syscall count total (ms) max (ms) avg (ms) min (ms) errors
--------------- -------- ---------- ---------- ---------- ---------- --------
select 42 11115.210 4814.914 264.648 0.013
clock_gettime 860 20.842 2.273 0.024 0.009
fcntl 121 13.360 7.037 0.110 0.011
Details of PID(s) including syscalls stats, processes executed, sibling threads, exit code, and slowest open
and openat
calls.
strace-parser <INPUT> pid [FLAGS] <PIDS>...
Args:
<PIDS>...
- PID(s) to analyze
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
$ strace-parser trace.txt pid 16747
PID 28912
349 syscalls, active time: 5.746ms, user time: 10.892ms, total time: 66.577ms
start time: 21:16:56.521660 end time: 21:16:56.588237
syscall count total (ms) max (ms) avg (ms) min (ms) errors
----------------- -------- ---------- ---------- ---------- ---------- --------
wait4 2 49.738 49.713 24.869 0.025
fcntl 265 2.447 0.060 0.009 0.008 EBADF: 252
execve 1 1.196 1.196 1.196 1.196
clone 3 0.543 0.325 0.181 0.092
---------------
Program Executed: /bin/sh -c "/opt/gitlab/bin/gitlab-psql -d template1 -c 'SELECT datname FROM pg_database' -A | grep -x gitlabhq_production"
Time: 21:16:56.533040
Exit code: 0
Parent PID: 28898
Threads: 28914
Child PIDs: 28915, 28916
Slowest file open times for PID 28912:
dur (ms) timestamp error file name
---------- --------------- --------------- ---------
0.026 21:16:56.534623 - /etc/ld.so.cache
0.024 21:16:56.534879 - /lib/x86_64-linux-gnu/libc.so.6
0.018 21:16:56.524736 - /proc/self/status
List sums of durations of open
and openat
calls in directories and their child directories.
For example, the row for /
will sum all calls made in /var
and /opt
, as well as all calls
performed in /
itself.
strace-parser <INPUT> directories [FLAGS] [OPTIONS]
Options:
-p, --pid <PIDS>...
- Limit results to one or more PIDs-s, --sort <SORT_BY>
- Field to sort results by, defaults to timestamp. Options:count
duration
pid
time
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
Directories accessed for files
pid dur (ms) first time last time open ct directory name
------- ---------- --------------- --------------- ---------- --------------
1071655 72399.930 02:43:35.288223 02:47:18.785449 1682782 /
1071655 0.075 02:43:35.288223 02:43:35.355015 3 /proc
1071655 0.052 02:43:35.288223 02:43:35.355015 2 /proc/self
1071655 0.149 02:43:35.302629 02:46:54.763037 9 /etc
1071655 0.255 02:43:35.302831 02:46:54.763379 15 /lib64
Print a list of all programs executed in session via execve
strace-parser <INPUT> exec [FLAGS] [OPTIONS]
Options:
-p, --pid <PIDS>...
- Limit results to one or more PIDs
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
$ strace-parser trace.txt exec --pid 28912 --related
Programs Executed
pid exit time program
------ ---- --------------- -------
28898 1 21:16:52.375031 /opt/gitlab/embedded/bin/omnibus-ctl gitlab /opt/gitlab/embedded/service/omnibus-ctl* replicate-geo-database --host=primary.geo.example.com --slot-name=secondary_geo_example_com --backup-timeout=21600
28912 0 21:16:56.533040 /bin/sh -c "/opt/gitlab/bin/gitlab-psql -d template1 -c 'SELECT datname FROM pg_database' -A | grep -x gitlabhq_production"
28915 0 21:16:56.537770 /opt/gitlab/bin/gitlab-psql -d template1 -c "SELECT datname FROM pg_database" -A
28915 0 21:16:56.558860 /opt/gitlab/embedded/bin/chpst -u gitlab-psql -U gitlab-psql /usr/bin/env PGSSLCOMPRESSION=0 /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/gitlab/postgresql -d gitlabhq_production -d template1 -c "SELECT datname FROM pg_database" -A
28915 0 21:16:56.564387 /usr/bin/env PGSSLCOMPRESSION=0 /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/gitlab/postgresql -d gitlabhq_production -d template1 -c "SELECT datname FROM pg_database" -A
28915 0 21:16:56.566690 /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/gitlab/postgresql -d gitlabhq_production -d template1 -c "SELECT datname FROM pg_database" -A
28916 0 21:16:56.538270 /bin/grep -x gitlabhq_production
Processes with non-0 exits or terminated by a signal may indicate an error.
Print a list of all files opened in session via open
and openat
strace-parser <INPUT> files [FLAGS] [OPTIONS]
Options:
-p, --pid <PIDS>...
- Limit results to one or more PIDs-s, --sort <SORT_BY>
- Field to sort results by, defaults to timestamp. Options:duration
pid
time
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
$ strace-parser trace.txt files --pid 2913
Files Opened
pid open (ms) timestamp error file name
------- ------------ --------------- --------------- ---------
2913 0.553 11:35:02.902746 - /dev/null
2913 0.355 11:35:11.658594 - /proc/stat
Print details of all read
, write
, recv
, recvfrom
, recvmsg
, send
, sendto
, and sendmsg
calls in session
strace-parser <INPUT> io [FLAGS] [OPTIONS]
Options:
-p, --pid <PIDS>...
- Limit results to one or more PIDs-s, --sort <SORT_BY>
- Field to sort results by, defaults to timestamp. Options:duration
pid
time
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
I/O Performed
pid dur (ms) timestamp syscall bytes error file name
------- ---------- --------------- -------- -------- --------------- ---------
20212 0.076 11:26:27.294812 read 0 - pipe:[2645699502]
20212 0.080 11:26:27.295020 read 0 - pipe:[2645699502]
20212 0.108 11:26:27.295187 read 0 - pipe:[2645699502]
20212 0.170 11:26:27.392784 write 30 - UNIX:[2645216608->2645215442]
Prints a log₂ scale histogram of the quantized execution times in μsecs for a given syscall.
strace-parser <INPUT> quantize [FLAGS] [OPTIONS] <SYSCALL>
Args:
<SYSCALL>
- Syscall to analyze
Options:
-p, --pid <PIDS>...
- Limit results to one or more PIDs
Flags:
-r, --related
- Include parent and child PIDs of in results-t, --threads
- Include sibling threads of in results
$ strace-parser trace.txt quantize write --pid 2993 28861 --related
syscall: write
pids: 28861 27191 2993 27758 27569 28136 27947 28514 27222 27411 and 17 more...
μsecs count distribution
------------ -------- ----------------------------------------
16 -> 31 10 | |
32 -> 63 206 |▇ |
64 -> 127 992 |▇▇▇▇▇▇▇ |
128 -> 255 3668 |▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ |
256 -> 511 5171 |▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇|
512 -> 1K 765 |▇▇▇▇▇ |
1K -> 2K 95 | |
2K -> 4K 32 | |
4K -> 8K 14 | |
8K -> 16K 2 | |
16K -> 32K 0 | |
32K -> 64K 1 | |
Print a pstree
style graph of PIDs and their children. Sibling threads are surrounded by curly brackets.
Note: For processes already running when the trace began, the parser will attempt to guess thread
relationships using the addresses of private futex
calls. Spurious relationships may occur rarely when
separate processes happen to use the same address for a futex
. This method is not used for any processes
forked/cloned during the trace. The relationships of these will always be accurate.
strace-parser <INPUT> tree [FLAGS]
Flags:
-t, --truncate
- Truncate commands to 50 characters to prevent line wrapping
$ strace-parser trace.txt tree --truncate
28897 - exit: 1, cmd: /usr/bin/gitlab-ctl replicate-geo-database --host=...
└─28898 - exit: 1, cmd: /opt/gitlab/embedded/bin/omnibus-ctl gitlab /opt/g...
├─{28899}
├─{28906}
├─{28913}
├─{28920}
├─{28927}
├─28905 - exit: 0, cmd: /usr/bin/locale -a
│ └─{28907}
├─28912 - exit: 0, cmd: /bin/sh -c "/opt/gitlab/bin/gitlab-psql -d templat...
│ ├─{28914}
│ ├─28915 - exit: 0, cmd: /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/...
│ │ └─28917 - exit: 0, cmd: /usr/bin/id -n -u
│ └─28916 - exit: 0, cmd: /bin/grep -x gitlabhq_production
├─28919 - exit: 0, cmd: /bin/sh -c "/opt/gitlab/bin/gitlab-psql -d gitlabh...
│ ├─{28921}
│ ├─28922 - exit: 0, cmd: /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/...
│ │ └─28924 - exit: 0, cmd: /usr/bin/id -n -u
│ └─28923 - exit: 0, cmd: /bin/grep -x projects
└─28926 - exit: 0, cmd: /bin/sh -c "/opt/gitlab/bin/gitlab-psql -d gitlabh...
├─{28928}
└─28929 - exit: 0, cmd: /opt/gitlab/embedded/bin/psql -p 5432 -h /var/opt/...
└─28930 - exit: 0, cmd: /usr/bin/id -n -u
strace
will significantly slow down syscalls execution, so do not consider the times listed
as accurate when comparing to how a program performs normally.
That said, it is very useful for understanding what calls are made and their relative cost.
$ strace-parser trace1.txt pid 97266
PID 97266
303 syscalls, active time: 112.503ms, total time: 112.503ms
syscall count total (ms) max (ms) avg (ms) min (ms) errors
----------------- -------- ---------- ---------- ---------- ---------- --------
read 30 43.257 37.403 1.442 0.011 ERESTARTSYS: 1
close 24 27.824 12.166 1.159 0.012
open 29 19.730 5.053 0.680 0.013 ENOENT: 7
access 21 6.892 0.988 0.328 0.013 ENOENT: 13
lstat 5 2.974 1.142 0.595 0.019 ENOENT: 4
stat 15 2.862 0.575 0.191 0.013 ENOENT: 3
openat 2 2.486 1.872 1.243 0.614
getdents 4 2.150 1.796 0.538 0.011
rt_sigaction 59 1.111 0.121 0.019 0.010
mmap 33 0.838 0.066 0.025 0.012
fstat 29 0.670 0.116 0.023 0.011
mprotect 14 0.432 0.100 0.031 0.014
munmap 8 0.212 0.089 0.026 0.014
write 6 0.181 0.036 0.030 0.023
execve 1 0.175 0.175 0.175 0.175
brk 4 0.157 0.085 0.039 0.015
rt_sigprocmask 3 0.130 0.084 0.043 0.018
lseek 2 0.089 0.074 0.045 0.015
fcntl 4 0.083 0.032 0.021 0.015
set_robust_list 1 0.065 0.065 0.065 0.065
dup2 3 0.061 0.035 0.020 0.011
getrlimit 2 0.044 0.027 0.022 0.017
arch_prctl 1 0.026 0.026 0.026 0.026
set_tid_address 1 0.022 0.022 0.022 0.022
getcwd 1 0.020 0.020 0.020 0.020
setpgid 1 0.012 0.012 0.012 0.012
---------------
Program Executed: /opt/gitlab/embedded/bin/git
Args: ["--git-dir" "/gitlab-data/git-data/repositories/group_name/project.git" "cat-file" "--batch"]
Parent PID: 118534
Slowest file open times for PID 97266:
dur (ms) timestamp error file name
---------- --------------- --------------- ---------
5.053 08:42:44.933863 - /gitlab-data/git-data/repositories/group_name/project.git/config
3.115 08:42:44.925575 - /gitlab-data/git-data/repositories/group_name/project.git/config
2.653 08:42:44.898632 - /gitlab-data/git-data/repositories/group_name/project.git/config
2.239 08:42:44.916090 - /gitlab-data/home/.gitconfig
1.972 08:42:44.983603 - /gitlab-data/git-data/repositories/group_name/project.git/packed-refs
1.921 08:42:44.891691 - /gitlab-data/git-data/repositories/group_name/project.git/HEAD
1.872 08:42:44.981307 - /gitlab-data/git-data/repositories/group_name/project.git/refs/
1.831 08:42:44.930141 - /gitlab-data/home/.gitconfig
0.614 08:42:44.987032 - /gitlab-data/git-data/repositories/group_name/project.git/objects/pack
0.342 08:42:44.997547 ENOENT /gitlab-data/git-data/repositories/group_name/project.git/objects/info/alternates
$ strace-parser trace2.txt pid 64205
PID 64205
243 syscalls, active time: 120.542ms, total time: 120.542ms
syscall count total (ms) max (ms) avg (ms) min (ms) errors
----------------- -------- ---------- ---------- ---------- ---------- --------
rt_sigaction 59 21.174 0.908 0.359 0.110
open 23 13.623 1.464 0.592 0.200 ENOENT: 6
mmap 27 12.203 1.116 0.452 0.163
read 23 11.897 0.998 0.517 0.221
fstat 24 11.276 0.770 0.470 0.155
close 19 11.176 1.024 0.588 0.234
access 11 8.197 1.492 0.745 0.247 ENOENT: 3
lstat 5 6.525 2.624 1.305 0.627 ENOENT: 4
mprotect 14 4.998 0.798 0.357 0.208
munmap 8 3.646 0.820 0.456 0.212
stat 7 2.997 0.629 0.428 0.155 ENOENT: 3
getdents 4 2.870 0.865 0.717 0.538
brk 4 1.784 0.597 0.446 0.351
openat 2 1.735 0.918 0.867 0.817
dup2 3 1.451 0.652 0.484 0.196
execve 1 1.301 1.301 1.301 1.301
rt_sigprocmask 3 1.223 0.488 0.408 0.266
getcwd 1 0.557 0.557 0.557 0.557
set_robust_list 1 0.466 0.466 0.466 0.466
set_tid_address 1 0.452 0.452 0.452 0.452
getrlimit 1 0.419 0.419 0.419 0.419
setpgid 1 0.382 0.382 0.382 0.382
arch_prctl 1 0.190 0.190 0.190 0.190
exit_group 1 n/a n/a n/a n/a
---------------
Program Executed: /opt/gitlab/embedded/bin/git
Args: ["--git-dir" "/var/opt/gitlab-data/git-data/repositories/group_name/project.git" "for-each-ref" "--format=%(refname)" "refs/tags"]
Parent PID: 30154
Slowest file open times for PID 64205:
dur (ms) timestamp error file name
---------- --------------- --------------- ---------
1.464 14:38:21.803101 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/packed-refs
1.237 14:38:21.749569 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/config
1.166 14:38:21.711738 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/config
0.979 14:38:21.736315 - /var/opt/gitlab-data/home/.gitconfig
0.938 14:38:21.702572 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/HEAD
0.918 14:38:21.788967 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/refs/
0.836 14:38:21.664860 - /lib64/librt.so.1
0.817 14:38:21.796232 - /var/opt/gitlab-data/git-data/repositories/group_name/project.git/refs/tags/
0.739 14:38:21.775371 - /var/opt/gitlab-data/home/.gitconfig
0.598 14:38:21.663682 ENOENT /opt/gitlab/embedded/lib/librt.so.1
Here's a comparison of two git processes on different infrastructure.
- Example 1 spends ~90 out of 112ms on I/O operations
read
,open
, andclose
. - Example 2 is not bound by a specific type of action, but syscalls in general are slower.
- 59 calls to
rt_sigaction
take 1ms for example 1, but 21ms for example 2. - 33 calls to
mmap
take < 1ms for example 1, but 27 calls to it take 12ms for example 2. - Most other syscalls are significantly slower, with the exceptions of calls that rely on the filesystem.
- 59 calls to
Based on this, we can guess that example 1 is bottlenecked by block I/O, while example 2 is bound by general system performance - perhaps high load.
- PID reuse is not handled; separate processes with the same PID will be tracked as a single entity.
- Existing thread relationships are not removed when from a process when it executes
execve
.
strace-parser <input file> files | grep -v ENOENT | awk '{print $5}' | uniq
- Find all file paths that were opened without an ENOENT
error.