data-apis/python-record-api

Add profiling for C calls

saulshanabrook opened this issue · 7 comments

Currently, if a library calls another library through its C API, we are unable to trace the call. This includes anything called from Cython. That is too bad, because a lot of calls to NumPy come from Cython or C libraries.
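To make the limitation concrete, here is a minimal sketch using `sys.setprofile` as a stand-in for the tracer (an assumption; the project's actual tracing mechanism may differ). The helper names `record_c_calls` and `workload` are made up for illustration. The profiler is notified when Python bytecode calls a C function, but not when C code calls another C function.

```python
import sys


def record_c_calls(func):
    """Run func() and return the names of C functions the profiler saw."""
    seen = []

    def profiler(frame, event, arg):
        # "c_call" is reported when Python bytecode calls a C function;
        # arg is the C function object being called.
        if event == "c_call":
            seen.append(getattr(arg, "__name__", repr(arg)))

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return seen


def workload():
    # sorted() is implemented in C and calls the key function, len(),
    # from C, so no Python bytecode is involved in the len() calls.
    return sorted(["ccc", "a", "bb"], key=len)


seen = record_c_calls(workload)
print(seen)
```

The call to `sorted` shows up because it is made from Python, while the `len` calls made inside `sorted` do not. Cython or C code calling into NumPy is in the same position as those `len` calls.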

One idea on how to achieve this, from talking to @scopatz, was to use lldb's Python API. It now builds on conda-forge for macOS, so I can get started exploring this.

FWIW, most calls to NumPy in sklearn are made from Python. I think our Cython code might call BLAS more directly, or we're implementing our own logic.

Looking through the skimage codebase, I saw a bunch of code that is basically just calling out to the normal NumPy API, but from Cython, which we totally miss, like this: https://github.com/scikit-image/scikit-image/blob/f71be82423e73cda4f3026a0eb656614db937bbc/skimage/feature/_cascade.pyx#L581-L598

PEP 578 provides C- and Python- level hooks for this kind of thing. Maybe there could be an opt-in Cython mode for this?
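As a rough sketch of what the Python-level side of PEP 578 looks like (Python 3.8+): `sys.addaudithook` registers a hook and `sys.audit` raises an event. The event name `example.c_call` and its argument are hypothetical; nothing in Cython or NumPy emits such an event today, which is what an opt-in mode would have to add (from C, via `PySys_Audit`).

```python
import sys

recorded = []


def hook(event, args):
    # Audit hooks see every event raised in the process, so filter by name.
    if event == "example.c_call":  # hypothetical event name
        recorded.append((event, args))


# Note: audit hooks cannot be removed once they have been added.
sys.addaudithook(hook)

# Stand-in for what compiled code could do through PySys_Audit().
sys.audit("example.c_call", "numpy.asarray")

print(recorded)
```

The hook receives the event name and a tuple of the arguments passed to `sys.audit`, so a tracer could record which function was called and with what.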

> Maybe there could be an opt-in Cython mode for this?

That would help definitely... Would require upstream change to Cython right?

Another thought would be to have Cython build in such a way that it doesn't actually unroll the Python interpreter... For debugging purposes? Not sure how hard this would be.

> Cython build in such a way that it doesn't actually unroll the Python interpreter.

I think that would have a severe performance hit, but it is worth exploring these ideas with them.

> I think that would have a severe performance hit, but it is worth exploring these ideas with them.

Cool, well that would be nice to explore down the road then. This whole thing is a super severe performance hit already! So I wouldn't worry about that for our use case, although of course you would only want to build in this mode for debugging or tracing like this.

What about gathering the data using something like bpftrace / bcc? The PEP 578 audit hook @mattip mentioned is included in the static probes / tracepoints compiled into CPython (search for PyDTrace_AUDIT), so you should be able to get at it with bpftrace's usdt probe on Linux (or DTrace, if you're on a Mac).

The ustack function can be used to get all the user-mode C calls within a process; I think you'd then filter down to look for stacks containing calls to the numpy C API. uprobe / uretprobe probes can instrument specific functions so you can e.g. print out arguments and return values to numpy C API functions.
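A possible starting point for the `usdt` approach, sketched as a small Python helper that only builds the bpftrace command and does not run it. It assumes a CPython built with DTrace support (`--with-dtrace`), that the probe is exposed as `python:audit` with the event name as its first argument, and that bpftrace is installed. The binary path and PID are placeholders, and the helper names are made up for illustration.

```python
import shlex


def build_audit_probe(python_binary):
    """Return bpftrace program text for CPython's `audit` USDT probe."""
    # str(arg0) reads the audit event name; ustack is the user-mode stack.
    return (
        f"usdt:{python_binary}:python:audit "
        '{ printf("%s\\n%s\\n", str(arg0), ustack); }'
    )


def build_command(python_binary, pid):
    """Return the argv list for attaching bpftrace to a running process."""
    return ["bpftrace", "-p", str(pid), "-e", build_audit_probe(python_binary)]


# Placeholder values: adjust the interpreter path and PID for your system.
program = build_audit_probe("/usr/bin/python3")
command = build_command("/usr/bin/python3", 1234)
print(shlex.join(command))
```

The printed stacks would still need to be filtered down to those that pass through the NumPy C API, as described above.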

Additional references: