osandov/drgn

Add thread attach/detach/event operations with ptrace

Before we can support live userspace debugging features like stack traces of live processes, breakpoints, or single-stepping, drgn needs basic support for ptrace(2). @Svetlitski-FB prototyped this in #142 with a very simple pause()/resume() interface, but as I commented in that PR, I think we want a more flexible event-based interface.

Namely, at minimum, we want interfaces to:

  • Explicitly attach to and detach from threads.
  • Get ptrace events for attached threads (for signals, exit, exec, clone/fork/vfork).
  • Send a PTRACE_INTERRUPT to an attached thread.

This is a rough sketch of the envisioned interface (mostly copied from the PR comment mentioned above):

class Program:
    def attach_all_threads(self) -> None:
        """
        Attach to all threads in this program as well as any new threads that
        are created.
        """
        ...

    def detach_all_threads(self) -> None:
        """Detach from all threads in this program."""
        ...

    def get_thread_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for any thread in this program. If *block* is
        ``True``, wait for the next event. If *block* is ``False``, raise an
        exception if no event is available (TODO: which one? Non-blocking
        sockets raise an OSError with errno set appropriately).
        """
        ...


class Thread:
    def attach(self) -> bool:
        """
        Attach to this thread.

        :return: ``True`` on success, ``False`` if the thread was already attached
        """
        ...

    def detach(self) -> bool:
        """
        Detach from this thread.

        :return: ``True`` on success, ``False`` if the thread was not attached
        """
        ...

    def interrupt(self) -> None:
        """
        Stop this thread. It is not actually stopped until you get a
        ThreadEventStop from Program.get_thread_event() or Thread.get_event().
        """
        ...

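    # NOTE: ``continue`` is a reserved word in Python, so the real method will
    # need a different spelling; the name here is a placeholder.
    def continue(self) -> bool: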
        """
        If this thread is currently stopped, resume it.

        :return: ``True`` on success, ``False`` if the thread was not stopped
        """
        ...

    def get_event(self, block: bool = True) -> ThreadEvent:
        """
        Get the next event for this thread.
        """
        ...
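
One possible answer to the TODO about non-blocking behavior, sketched under assumptions: mirror non-blocking sockets and raise `BlockingIOError` (a built-in `OSError` subclass) with `errno` set to `EAGAIN`. The `EventQueue` class and its `push()` method are hypothetical stand-ins for wherever a backend would buffer decoded events; none of this is existing drgn API.

```python
import errno
import queue
from typing import Any


class EventQueue:
    """Hypothetical in-memory buffer of already-decoded thread events."""

    def __init__(self) -> None:
        self._events: "queue.Queue[Any]" = queue.Queue()

    def push(self, event: Any) -> None:
        # A real backend would feed this from waitpid(); here it is manual.
        self._events.put(event)

    def get_thread_event(self, block: bool = True) -> Any:
        try:
            # With block=False, Queue.get() raises queue.Empty immediately
            # instead of waiting for an event.
            return self._events.get(block=block)
        except queue.Empty:
            # Same shape as reading from a non-blocking socket with no data.
            raise BlockingIOError(
                errno.EAGAIN, "no thread event available"
            ) from None
```

Callers that already handle `OSError` from non-blocking I/O could then treat "no event yet" the same way, at the cost of using an I/O exception for something that is not strictly I/O.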

Then, we can have a dumb ThreadEvent object for different types of events. Maybe something like:

ThreadEvent = Union[ThreadEventSignal, ThreadEventExit, etc...]

class ThreadEventSignal:
    signal: int

class ThreadEventExit:
    status: int

etc...
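
To make the event types a bit more concrete, here is a minimal sketch of how a ptrace backend might turn a raw `waitpid()` status into these objects. It assumes Linux status encoding and a thread attached with `PTRACE_SEIZE`. `decode_wait_status()` is a hypothetical helper, `ThreadEventStop` is assumed to carry no fields, and `ThreadEventExit.status` is assumed to hold the raw wait status. Other `PTRACE_EVENT_*` stops (exec, clone/fork/vfork) are deliberately left unmodeled.

```python
import os
from dataclasses import dataclass
from typing import Union

# Linux value from <linux/ptrace.h>. For a PTRACE_SEIZE'd thread, a stop
# caused by PTRACE_INTERRUPT reports this in bits 16 and up of the status.
PTRACE_EVENT_STOP = 128


@dataclass(frozen=True)
class ThreadEventSignal:
    signal: int


@dataclass(frozen=True)
class ThreadEventExit:
    status: int  # assumption: the raw wait status, not the decoded exit code


@dataclass(frozen=True)
class ThreadEventStop:
    pass


ThreadEvent = Union[ThreadEventSignal, ThreadEventExit, ThreadEventStop]


def decode_wait_status(status: int) -> ThreadEvent:
    """Map a raw waitpid() status to an event object (hypothetical helper)."""
    if os.WIFEXITED(status) or os.WIFSIGNALED(status):
        # Keep the raw status so callers can apply os.WEXITSTATUS() or
        # os.WTERMSIG() themselves.
        return ThreadEventExit(status)
    if os.WIFSTOPPED(status):
        ptrace_event = status >> 16
        if ptrace_event == PTRACE_EVENT_STOP:
            return ThreadEventStop()
        if ptrace_event != 0:
            # exec, clone/fork/vfork, etc. would get their own event classes.
            raise NotImplementedError(f"ptrace event {ptrace_event}")
        # Plain signal-delivery-stop: report the signal that was received.
        return ThreadEventSignal(os.WSTOPSIG(status))
    raise ValueError(f"unrecognized wait status {status:#x}")
```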

The above (along with the equivalent libdrgn interfaces) is more or less what I'd consider the MVP for this design. There are obviously lots of finer details to consider, as well as complications with ptrace that I may not have considered.

We'll also want to ensure that this interface is generic enough that it could be used with alternate "backends" (specifically, the gdbstub protocol).

There are some "extras" that we should implement eventually, possibly in follow-up PRs:

  • We might still want a shortcut Thread.pause(), which I think would basically be:
def pause(self):
    self.interrupt()
    while True:
        event = self.get_event()
        if isinstance(event, ThreadEventStop):
            return
        # TODO: should we have a distinction between "stopping" events that
        # require a continue and "non-stopping" events?
        self.continue()
  • ptrace allows suppressing signals that the program received. Should we have a suppress_signal: bool = False parameter to continue()?
  • How would we expose PTRACE_SYSCALL and PTRACE_SINGLESTEP? More parameters for continue()?
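
As one way to think about the last two questions, the sketch below folds both into a single resume call: a hypothetical `ResumeMode` enum selects the ptrace restart request, and `suppress_signal` decides whether the pending signal is passed back through ptrace's `data` argument (where 0 means "deliver nothing"). The names `ResumeMode` and `plan_resume()` are made up for illustration; only the request numbers come from Linux's `<sys/ptrace.h>`.

```python
import enum
from typing import Optional, Tuple

# Linux ptrace restart request numbers from <sys/ptrace.h>.
PTRACE_CONT = 7
PTRACE_SINGLESTEP = 9
PTRACE_SYSCALL = 24


class ResumeMode(enum.Enum):
    """Hypothetical 'how to resume' parameter for continue()."""

    CONTINUE = PTRACE_CONT
    SINGLE_STEP = PTRACE_SINGLESTEP
    SYSCALL = PTRACE_SYSCALL


def plan_resume(
    mode: ResumeMode = ResumeMode.CONTINUE,
    pending_signal: Optional[int] = None,
    suppress_signal: bool = False,
) -> Tuple[int, int]:
    """Return (request, data) for a ptrace(request, tid, 0, data) call."""
    if suppress_signal or pending_signal is None:
        deliver = 0  # the tracee never sees the signal
    else:
        deliver = pending_signal  # re-inject the signal on resume
    return mode.value, deliver
```

A single `mode` parameter keeps the three restart requests mutually exclusive, which separate boolean flags would not.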