Add option to profile child processes
jamespic opened this issue · 4 comments
I've hit a few use cases where it would be handy to profile a process and its children (multiprocessing, forking web servers, PySpark). Would there be interest in a patch implementing this?
When a Python process spawns another 10 processes, it's hard to profile them.
If there is a patch I will merge it. I think it should be optional via a flag, analogous to how strace -ff works.
What needs to be done:
- Call ptrace() with PTRACE_O_TRACEFORK to intercept fork() calls.
- The output filenames need to have a pid appended to them, like with strace -ff.
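A minimal sketch of the two items above, assuming Linux with glibc. The helper names EnableForkTracing and OutputPathForPid are hypothetical and not part of Pyflame, and the "<base>.<pid>" naming simply mirrors what strace -ff does with -o:

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <string>

// Hypothetical helper: ask the kernel to report fork() calls made by a
// tracee that is already attached and stopped.
// Returns 0 on success, -1 on error (errno is set by ptrace).
long EnableForkTracing(pid_t pid) {
  return ptrace(PTRACE_SETOPTIONS, pid, nullptr,
                reinterpret_cast<void *>(static_cast<long>(PTRACE_O_TRACEFORK)));
}

// Hypothetical helper: build "<base>.<pid>" so each traced process gets
// its own output file.
std::string OutputPathForPid(const std::string &base, pid_t pid) {
  return base + "." + std::to_string(pid);
}
```

With this naming, a base of profile.txt and a child PID of 1234 would produce profile.txt.1234.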
I looked at this a little bit, and I'm confused about how to use the POSIX APIs correctly.
Right now there's a loop in main() that basically attaches, gets the stack data, detaches, and sleeps for a bit. Once the sleep finishes, it reattaches, gets the stack data, detaches, and so on. If you look for the code in src/pyflame.cc that looks like this you'll see what I mean:
PtraceDetach(pid);
std::this_thread::sleep_for(interval);
PtraceAttach(pid);
The PTRACE_O_TRACEFORK flag is supposed to be used in conjunction with waitpid(). The idea is you'd call it, and then the wstatus integer you get back has some bits set that let you detect that a fork happened, and then you use PTRACE_GETEVENTMSG to actually figure out the new PID to trace.
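The waitpid() flow described here could look roughly like the sketch below, assuming Linux with glibc. IsForkEvent and ForkedChildPid are hypothetical names, not existing Pyflame functions; the status comparison is the one given in the ptrace(2) man page for PTRACE_O_TRACEFORK:

```cpp
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: true if a waitpid() status describes a ptrace stop
// caused by PTRACE_O_TRACEFORK.
bool IsForkEvent(int status) {
  return WIFSTOPPED(status) &&
         (status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8));
}

// Hypothetical helper: after a fork event on `parent`, ask the kernel for
// the PID of the newly created child. Returns -1 if ptrace() fails.
pid_t ForkedChildPid(pid_t parent) {
  unsigned long msg = 0;
  if (ptrace(PTRACE_GETEVENTMSG, parent, nullptr, &msg) == -1) {
    return -1;
  }
  return static_cast<pid_t>(msg);
}
```

A caller would pass the status from waitpid() to IsForkEvent() and, if it returns true, call ForkedChildPid() on the stopped parent before resuming it.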
On Linux there's a method called sigtimedwait() that can be used to wait for a signal with a timeout. It gives you back a siginfo_t, so it's more like waitid() than waitpid(). But that's fine, they should be equivalent. The idea is src/pyflame.cc should be updated to use sigtimedwait(), so that it sleeps for the equivalent amount of time as the existing thread sleep call, but also gets notified when there's a new process to trace.
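As a rough illustration of the timed wait itself (not of its interaction with ptrace), here is a sketch assuming Linux. BlockSigchld and WaitForChildSignal are hypothetical names; the signal has to be blocked first so that it stays pending until sigtimedwait() collects it:

```cpp
#include <signal.h>
#include <time.h>
#include <chrono>

// Hypothetical helper: block SIGCHLD in the calling thread so it can be
// collected synchronously. Returns 0 on success, -1 on error.
int BlockSigchld() {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);
  return sigprocmask(SIG_BLOCK, &mask, nullptr);
}

// Hypothetical helper: wait up to `interval` for SIGCHLD.
// Returns true if SIGCHLD arrived (details are written to *info),
// false on timeout (errno == EAGAIN) or any other error.
bool WaitForChildSignal(std::chrono::milliseconds interval, siginfo_t *info) {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);

  timespec ts;
  ts.tv_sec = static_cast<time_t>(interval.count() / 1000);
  ts.tv_nsec = static_cast<long>(interval.count() % 1000) * 1000000L;

  return sigtimedwait(&mask, info, &ts) == SIGCHLD;
}
```

In the sampling loop this would stand in for the sleep: a false return means the interval elapsed and it is time to sample again, while a true return means a child changed state.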
I did this, set up my mask to get SIGCHLD, and sure enough I get notified. However, I don't see how to actually get the information I want. The docs for PTRACE_O_TRACEFORK say that after the waitpid() call you should check that status satisfies the property:
status>>8 == (SIGTRAP | (PTRACE_EVENT_FORK<<8))
We have a siginfo_t, not a status integer, but it looks like siginfo_t is supposed to have a field called si_status that I'm guessing is equivalent. However, for some reason, GDB can't find this field:
ptype siginfo_t
type = struct siginfo_t {
int si_signo;
int si_errno;
int si_code;
union {
int _pad[28];
struct {...} _kill;
struct {...} _timer;
struct {...} _rt;
struct {...} _sigchld;
struct {...} _sigfault;
struct {...} _sigpoll;
struct {...} _sigsys;
} _sifields;
}
This does not match the definition from the sigaction(2) man page at all. I think GDB is just confused. Having GDB unavailable is really annoying, since GDB makes it easier to introspect objects. GCC knows about the field, but it doesn't seem to have what I want. I also tried some variations like immediately calling waitpid() with WNOHANG after the sigtimedwait() call, but that didn't work right. I also tried adding SIGTRAP to my sigtimedwait() mask but that doesn't seem to do anything.
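One possible explanation for the GDB output: on glibc, names such as si_status and si_pid are preprocessor macros that expand to members of the _sifields union (for example _sifields._sigchld.si_status), so the compiler accepts them while the debugger only shows the underlying layout. The sketch below, assuming Linux with glibc, reads those fields for an ordinary (non-traced) child via waitid(). ChildReport and ReapWithWaitid are hypothetical names used only for illustration:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical struct: the SIGCHLD-related siginfo_t fields of interest.
struct ChildReport {
  int code;    // CLD_EXITED, CLD_KILLED, CLD_TRAPPED, ...
  int status;  // exit code or signal number, depending on `code`
  pid_t pid;   // PID of the child that changed state
};

// Hypothetical helper: reap `child` with waitid() and copy the fields out.
// Returns false if waitid() fails.
bool ReapWithWaitid(pid_t child, ChildReport *out) {
  siginfo_t info{};
  if (waitid(P_PID, static_cast<id_t>(child), &info, WEXITED) != 0) {
    return false;
  }
  out->code = info.si_code;
  out->status = info.si_status;
  out->pid = info.si_pid;
  return true;
}
```

For a child that calls _exit(7), this should report a code of CLD_EXITED and a status of 7.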
I pushed a branch called trace_children that has a bunch of this code in it set up in a really hacky not-at-all-working way. However, I was trying various different permutations of these system calls so I'm not sure exactly what the state of the branch is. In the meantime, if anyone knows of another program that uses sigtimedwait() in conjunction with PTRACE_O_TRACEFORK that would be really helpful.
I fixed a bunch of bugs in this branch, and I got far enough to get a waitpid() version to actually do the right thing. However, it looks like sigtimedwait() does not work with PTRACE_O_TRACEFORK.
I looked at the Linux kernel code, and the ptrace event is normally put into the si_code field of the siginfo_t. However, when using sigtimedwait() with PTRACE_O_TRACEFORK, si_code is always set to CLD_TRAPPED. Furthermore, sigtimedwait() marks the signal as delivered, so it's not possible to immediately call waitpid() after the sigtimedwait() call. This is arguably a bug in the kernel, although fixing this correctly probably requires extending the siginfo_t type or something.
Fortunately there are a lot of alternatives: I can play around with signalfd(), and if worst comes to worst we can add multi-threading code to Pyflame (so one thread blocks on waitpid() and another thread does normal sleep calls).
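The two-thread idea could be sketched as below, using an ordinary child process rather than a ptrace tracee. SampleWhileWaiting is a hypothetical name, and the counter is a placeholder for real sampling work. One caveat worth checking before building on this: ptrace tracer state is per-thread on Linux, so with real tracees the thread that attached would also have to be the one calling waitpid() and ptrace().

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <cerrno>
#include <chrono>
#include <thread>

// Hypothetical sketch: the calling thread "samples" once per interval while
// a second thread blocks in waitpid() on `child`.
// Returns the child's raw wait status, or -1 if waitpid() failed.
int SampleWhileWaiting(pid_t child, std::chrono::milliseconds interval,
                       int *samples_taken) {
  std::atomic<bool> done{false};
  int status = -1;

  std::thread waiter([&] {
    int s = 0;
    pid_t r;
    do {
      r = waitpid(child, &s, 0);  // blocks until the child changes state
    } while (r == -1 && errno == EINTR);
    if (r == child) {
      status = s;
    }
    done.store(true);
  });

  *samples_taken = 0;
  do {
    ++*samples_taken;  // placeholder for taking one stack sample
    std::this_thread::sleep_for(interval);
  } while (!done.load());

  waiter.join();
  return status;
}
```

The returned value can be inspected with the usual WIFEXITED and WEXITSTATUS macros.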