osandov/drgn

Backtrace from wrong CPU in kernel core dump

Closed · 1 comment

For kernel core dumps, drgn starts a backtrace for a task that was running at the time of the crash by getting its registers from an NT_PRSTATUS note in the core dump. These notes are supposed to be indexed by CPU number. However, if the registers could not be saved for a given CPU, its note is omitted, which shifts the numbering for every CPU after it. This can happen in at least a couple of cases:

  1. If the CPU is offline (see #391).
  2. If the CPU was locked up and didn't respond to the crash NMI.
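
To make the failure mode concrete, here is a toy model of the index shift. It is not drgn code: `notes_in_dump` and `cpu_by_note_index` are hypothetical helpers, and each note is represented only by the number of the CPU whose registers it holds.

```python
def notes_in_dump(num_cpus, missing):
    """Return the CPUs whose notes are present, in the order they appear in the dump."""
    return [cpu for cpu in range(num_cpus) if cpu not in missing]


def cpu_by_note_index(notes, cpu):
    """Naive lookup: assume the note at position `cpu` belongs to CPU `cpu`."""
    return notes[cpu] if cpu < len(notes) else None


# CPU 1 was offline or locked up, so its note was never written.
notes = notes_in_dump(4, missing={1})

for cpu in range(4):
    print(f"asked for CPU {cpu}, got note for CPU {cpu_by_note_index(notes, cpu)}")
```

In this model, every CPU after the missing one gets the registers of its successor, and the last CPU gets nothing at all.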

The first case could be detected by looking at the online CPU mask, but the second case can't easily be corrected for. This means we can't rely on the order of the NT_PRSTATUS notes. Instead, we probably have to look at the crash_notes per-CPU variable, which is what ends up in NT_PRSTATUS anyway.
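
A rough sketch of why per-CPU buffers avoid the problem: a CPU that never saved its registers leaves an empty buffer rather than shifting everyone else. This is an illustration, not drgn's implementation. It assumes each per-CPU buffer has already been read into `bytes`, that the buffer is little-endian, that it starts with a standard ELF note header (three 32-bit words, with name and descriptor padded to 4 bytes), and that the note name is `CORE`. `parse_prstatus`, `make_note`, and `per_cpu_notes` are hypothetical names.

```python
import struct

NT_PRSTATUS = 1
NHDR_SIZE = 12  # n_namesz, n_descsz, n_type: three 32-bit words


def parse_prstatus(buf):
    """Return the NT_PRSTATUS descriptor from one per-CPU buffer, or None if none was saved."""
    if len(buf) < NHDR_SIZE:
        return None
    namesz, descsz, n_type = struct.unpack_from("<III", buf, 0)
    # An all-zero buffer means the CPU never saved its registers.
    if namesz == 0 or n_type != NT_PRSTATUS:
        return None
    name = buf[NHDR_SIZE:NHDR_SIZE + namesz].rstrip(b"\0")
    desc_start = NHDR_SIZE + ((namesz + 3) & ~3)  # name is padded to 4 bytes
    if name != b"CORE" or desc_start + descsz > len(buf):
        return None
    return buf[desc_start:desc_start + descsz]


def make_note(desc):
    """Build a fake note for the example below (test data only)."""
    name = b"CORE\0"
    pad = lambda b: b + b"\0" * (-len(b) % 4)
    return struct.pack("<III", len(name), len(desc), NT_PRSTATUS) + pad(name) + pad(desc)


# Keyed by CPU number, so a missing note cannot shift the others.
per_cpu_notes = {0: make_note(b"regs0"), 1: bytes(64), 2: make_note(b"regs2")}
saved = {cpu: parse_prstatus(buf) for cpu, buf in per_cpu_notes.items()}
```

Here `saved[1]` is `None` while CPUs 0 and 2 still map to their own registers, which is the property the positional NT_PRSTATUS lookup lacks.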

There's a complication here: for core dumps not from an actual kernel crash (like QEMU's dump-guest-memory), we still need to use NT_PRSTATUS.