janestreet/magic-trace

magic-trace run: exception raised while parsing perf output: Scanf.Scan_failure

opsound opened this issue ยท 11 comments

What happened?

I ran

./magic-trace run -working-directory /tmp/magic sleep -- 10
Warning: magic-trace will only be able to snapshot when magic-trace is Ctrl+C'd, not when the application it's running ends. If that application ends before magic-trace can snapshot it, the resulting trace will be empty. The ability to snapshot when an application teminates was added to perf's userspace tools in version 5.4. For more information, see:
https://github.com/janestreet/magic-trace/wiki/Supported-platforms,-programming-languages,-and-runtimes#supported-perf-versions
Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
^C[ Snapshot taken. ]
[ Finished recording. ]
[ Decoding, this takes a while... ]
Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
(monitor.ml.Error
 ("BUG: exception raised while parsing perf output. Please report this to https://github.com/janestreet/magic-trace/issues/"
  (exn
   (Scanf.Scan_failure "scanf: bad input at char number 16: int_of_string"))
  (perf_output
   "957410/957410 1154506.531823215:   jmp                           708540b [unknown] =>          7084ef0 tan@plt+0xfffffffffef9b000"))
 ("Raised at Base__Error.raise in file \"src/error.ml\" (inlined), line 9, characters 14-30"
  "Called from Base__Error.raise_s in file \"src/error.ml\", line 10, characters 19-40"
  "Called from Async_kernel__Pipe.map.(fun) in file \"src/pipe.ml\", line 1042, characters 65-70"
  "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 167, characters 6-47"
  "Caught by monitor Monitor.protect"))

What did you expect to happen?

program gets magically traced

How can we, magic-trace developers, reproduce the issue you're seeing?

Maybe try repro-ing w/ my version of perf

What version of perf userspace tools are you running? perf --version

]# perf --version
Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
perf version 5.2.8.gd36a8d2

What CPU do you have? cat /proc/cpuinfo

Intel(R) Xeon(R) Platinum 8321HC CPU @ 1.40GHz

Thanks for the report! I'm tagging in @Xyene to take a look at this one.

Xyene commented

Hmm... from the wrapper's warning message, I'm assuming this is a perf patched for hugetext/BOLT, which is not open-source for us to try reproducing with?

I think it's doing something wonky with resolving symbols that's wrong, but it's impossible to say what without staring at the source. The output from perf you provided,

957410/957410 1154506.531823215:   jmp                           708540b [unknown] =>          7084ef0 tan@plt+0xfffffffffef9b000

basically says "we jumped from an unknown symbol at 708540b to 7084ef0, the destination of which is tan@plt offset by 0xfffffffffef9b000". But this offset is so large, it lies in kernel space! That's not possible; I'm assuming something went wrong in the symbol resolution in the custom perf.

Closing for now. @opsound feel free to re-open this if/when you figure out what's going on with your custom perf. I fully admit that it might be trying to communicate something useful here; we just have no idea what.

Thanks for taking a look @Xyene & @cgaebel. Let me look into our perf.

That said, the scanf error that magic-trace is raising seems low level compared to logical inconsistencies @Xyene was describing. I'm not very familiar with perf. Is the issue that this line has a fundamentally different schema than what magic-trace is expecting?

The scanf is a bit of a red herring. It's trivial to fix, but when we fix it magic-trace will just get confused at a later point because of the problem that @Xyene pointed out.

Xyene commented

OCaml integers are 63 bits by default (one tag bit for the garbage collector), so trying to parse the offset in tan@plt+0xfffffffffef9b000 fails.

It's possible to get around this locally by changing the parsing code to use Int64.ts, but there's a cost to this (they're boxed values), so I'd like to understand why the offset would have the upper bit set before making changes to magic-trace.

It's possible that this might be a bug in 5.2 perf rather than your wrapper, in which case we'll have to carry a hack in magic-trace, and that's fine.

I got the same exception today using perf version 5.15.30.
(Scanf.Scan_failure "scanf: bad input at char number 16: int_of_string")) (perf_output "1261409/1261427 956710.813468091: jmp 7f681080a34b folly::IOBuf::appendToChain+0xb => 7f6810804000 folly::AtomicNotificationQueue<folly::MeteredExecutor::Task>::push<folly::MeteredExecutor::Task>@plt+0xffff8097ef7fc000"))
Only thing i can think of that i'm doing that might be a bit strange is using the mold linker.

@EspenG I've a short battery of questions for you. Sorry about this, but we don't really know what's going on here yet.

  1. Is that from an open source project? Do you have a way for us to reproduce it?

  2. Do you work at Meta? Is your binary compiled with hot-text support, as alluded to by @opsound's "Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues." message? If it is, does magic-trace work when you turn that off?

  3. Do you get the same errors if you switch from mold to another linker?

  1. no sorry.
  2. no, just using the folly library. not compiled with hot text support as far as i know.
  3. tried with out using mold and that seems to be working fine. So a work around for me is just to not use mold.

Thanks. I'll see if I can find some time to try out mold this weekend to reproduce this.

Xyene commented

I tried compiling a slightly modified demo.c with: { perf 5.2, perf 5.15, perf 5.17 } ร— { mold 1.2.1, ld 2.38 } ร— { gcc 12 } ร— { -flto, -fno-lto } ร— { -fuse-ld=mold, } ร— { -lm, }, but was unable to reproduce.

@EspenG, what version of GCC/mold were you using, and are you able to share any extra compilation flags that might have been passed? Are you able to reproduce the issue with demo.c if you compile it with mold?