janestreet/magic-trace

Assumptions violated for Rust binary

agourlay opened this issue · 3 comments

Thank you very much for this tool, it looks very promising!

I got the following error while profiling a multi-threaded Rust program compiled in release mode with debug symbols.

magic-trace attach -multi-thread -pid $(pidof my-program)
[ Attached. Press Ctrl-C to stop recording. ]
^C[ Got signal, detaching... ]
[ Snapshot taken. ]
[ perf record: Woken up 79 times to write data ]
[ perf record: Captured and wrote 52,670 MB /tmp/magic_trace.tmp.1db10b/perf.data ]
[ Finished recording. ]
[ Decoding, this takes a while... ]
("BUG: assumptions violated, saw an unexpected event for this trace mode"
 (trace_mode Userspace)
 (event
  (Ok
   ((thread ((pid ()) (tid ()))) (time 46m10.009811597s) (kind Syscall)
    (src
     ((instruction_pointer 0x7f40e37ac2d9) (symbol Unknown)
      (symbol_offset 0x0)))
    (dst
     ((instruction_pointer 0x7f40e37a61a6) (symbol Unknown)
      (symbol_offset 0x0)))))))
Warning:
512 instruction trace errors

The trace does show some profiling information when loaded in the UI but I can't tell if it is complete/valid.

I have not seen a clear indication in the documentation regarding support for Rust programs so it might be the problem :)

Environment:

  • cpu i7-10610U
  • perf version 5.15.30
Xyene commented

Thanks for reporting this!

Rust should be supported, but something went wrong here.

Intel PT told us that we had a control flow transfer from 0x7f40e37ac2d9 to 0x7f40e37a61a6, of type "syscall". But this isn't possible, because both of those addresses are in userspace (don't have the upper bits set). If this was really a syscall, we'd expect the destination to be a kernel address. Or, if running without -trace-include-kernel like you did, we'd expect to see a trace end marker.

512 instruction trace errors is a little concerning, it suggests Intel PT ran out of memory bandwidth to write down branches that many times. You can try running with -timer-resolution low, which will decrease the accuracy of the times, but similarly decrease the amount of memory bandwidth required.

If you're up for more debugging, it would be useful to:

  • export MAGIC_TRACE_DEBUG=1
  • pass -working-dir /tmp and -z-debug-print-perf-events
  • copy the perf script command printed when you run your original command
  • run the perf script command manually, > /tmp/script-output
  • open up /tmp/script-output, and find the region and surrounding context corresponding with the violated assumption, and comment with it here (feel free to anonymize symbol names)

It seems that I am not able to reproduce it anymore, this is a bit frustrating.
I will continue playing with magic-trace in the meantime!

Ok, closing this for now. Will re-open if you (or anyone else) manages to reproduce this.