python/cpython

Add a new interface for external profilers and debuggers to get the call stack efficiently

pablogsal opened this issue · 0 comments

Currently, external profilers and debuggers must issue one system call per frame object when retrieving the call stack from a remote process, because the frames form a linked list. Furthermore, for every frame object they need at least two or three more system calls to retrieve the code object and the code object's name and file name. This has several disadvantages:

  • For sampling profilers, the state of the runtime can change dramatically while the data is being copied, which can lead to inaccurate information or a higher chance of reading corrupted memory.
  • This leaves every tool exposed to any low-level optimisation in frame objects and the call mechanism, which further hurts accuracy and maintainability.
  • Some optimisations, such as true function inlining, may make the current approach insufficient, because inlined frames will be missing from the normal call stack.
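
To make the cost concrete, here is a toy model of the current approach. It is only a sketch: `RemoteMemory`, `build_stack` and `walk_frames` are hypothetical names, the dict-based "objects" do not reflect the real frame or code object layouts, and each `read` stands in for one system call against the target process. I assume four reads per frame (frame, code object, name, file name), in line with the estimate above.

```python
class RemoteMemory:
    """Toy stand-in for another process's address space; counts reads."""

    def __init__(self):
        self.objects = {}      # address -> stored object
        self.reads = 0         # number of simulated system calls
        self._next = 0x1000    # next free fake address

    def store(self, obj):
        addr = self._next
        self._next += 0x10
        self.objects[addr] = obj
        return addr

    def read(self, addr):
        # Models one system call (e.g. one process_vm_readv).
        self.reads += 1
        return self.objects[addr]


def build_stack(mem, calls):
    """Store a linked list of frames, outermost call first.

    Returns the address of the innermost (most recent) frame.
    """
    back = 0  # 0 plays the role of a NULL "previous frame" pointer
    for name, filename in calls:
        code = mem.store({"name": mem.store(name),
                          "filename": mem.store(filename)})
        back = mem.store({"back": back, "code": code})
    return back


def walk_frames(mem, top):
    """Walk the linked list the way an external tool has to today."""
    stack = []
    addr = top
    while addr:
        frame = mem.read(addr)                 # 1 read: the frame itself
        code = mem.read(frame["code"])         # 1 read: its code object
        name = mem.read(code["name"])          # 1 read: the function name
        filename = mem.read(code["filename"])  # 1 read: the file name
        stack.append((name, filename))
        addr = frame["back"]
    return stack


mem = RemoteMemory()
top = build_stack(mem, [("main", "app.py"),
                        ("handle", "app.py"),
                        ("parse", "parser.py")])
stack = walk_frames(mem, top)
print(stack)      # innermost frame first
print(mem.reads)  # 12 reads for 3 frames
```

In this model the number of reads grows linearly with stack depth, and the target keeps running between any two of them, which is where the inconsistent snapshots come from.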

For these reasons, I propose adding a new interface consisting of a contiguous array of pointers that always contains all the code objects necessary to retrieve the call stack. This has the following advantages:

  • The full stack can be efficiently copied in one system call (for instance, using process_vm_readv).
  • Having code objects and not frames allows for fewer system calls and less exposure to implementation details.
  • Optimisations can keep this interface working easily and very cheaply while making aggressive changes to the real frame management.
  • The ongoing cost is just one pointer write per function call, which is negligible.
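
A minimal sketch of the idea, again as a toy model rather than a proposed implementation. `CodeStack` and `snapshot` are hypothetical names. The 8-byte little-endian pointer size, the fixed capacity and the separate `depth` counter are my assumptions for illustration: how the reader learns the live length of the array, and what happens on overflow, are left open here.

```python
import struct

PTR = struct.Struct("<Q")  # assumption: 8-byte little-endian pointers


class CodeStack:
    """Hypothetical contiguous array of code-object pointers."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity * PTR.size)
        self.depth = 0  # assumption: the runtime also exposes the depth

    def push(self, code_addr):
        # On every function call: a single pointer write.
        PTR.pack_into(self.buf, self.depth * PTR.size, code_addr)
        self.depth += 1

    def pop(self):
        # On return: only the depth changes, nothing is cleared.
        self.depth -= 1


def snapshot(stack):
    """Model one bulk read of the live part of the array.

    In a real tool this copy would be a single system call such as
    process_vm_readv against the target process.
    """
    raw = bytes(stack.buf[: stack.depth * PTR.size])
    return [addr for (addr,) in PTR.iter_unpack(raw)]


codes = CodeStack(capacity=64)
for code_addr in (0x1000, 0x2000, 0x3000):  # fake code object addresses
    codes.push(code_addr)

full = snapshot(codes)
print([hex(a) for a in full])   # ['0x1000', '0x2000', '0x3000']

codes.pop()
after_return = snapshot(codes)
print([hex(a) for a in after_return])  # ['0x1000', '0x2000']
```

The whole stack comes back from one copy, outermost call first, regardless of depth. Resolving each code object's name and file name still needs further reads, but code objects are far more stable than frames, so a tool can cache those lookups by address.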
