profiler crash on 8-node S3D profile
Closed this issue · 4 comments
rohany commented
I'm seeing the following profiler crash on an 8-node run of S3D. For some reason, valid profiles are generated at all other node counts (1->16).
(nersc-python) rohany@perlmutter:login23:/pscratch/sd/r/rohany/s3d_perlmutter> RUST_BACKTRACE=full legion_prof ./sweeptest/128x64x64/auto-trace/pwave_x_8_hept/run/prof_0.gz -o test
Reading log file "./sweeptest/128x64x64/auto-trace/pwave_x_8_hept/run/prof_0.gz"...
thread 'main' panicked at /pscratch/sd/r/rohany/legion_s3d/legion/tools/legion_prof_rs/src/spy/serialize.rs:587:37:
called `Result::unwrap()` on an `Err` value: Error(Error { input: "", code: CrLf })
stack backtrace:
0: 0x55d2adfd8496 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9cca0343d66d16a8
1: 0x55d2ae000370 - core::fmt::write::h4311bce0ee536615
2: 0x55d2adfd5c6f - std::io::Write::write_fmt::h0685c51539d0a0cd
3: 0x55d2adfd8274 - std::sys_common::backtrace::print::h2fb8f70628a241ed
4: 0x55d2adfd9af7 - std::panicking::default_hook::{{closure}}::h05093fe2e3ef454d
5: 0x55d2adfd9859 - std::panicking::default_hook::h5ac38aa38e0086d2
6: 0x55d2adfd9f88 - std::panicking::rust_panic_with_hook::hed79743dc8b4b969
7: 0x55d2adfd9e62 - std::panicking::begin_panic_handler::{{closure}}::ha437b5d58f431abf
8: 0x55d2adfd8996 - std::sys_common::backtrace::__rust_end_short_backtrace::hd98e82d5b39ec859
9: 0x55d2adfd9bb4 - rust_begin_unwind
10: 0x55d2add87765 - core::panicking::panic_fmt::hc69c4d258fe11477
11: 0x55d2add87c53 - core::result::unwrap_failed::hff299ec748d62aab
12: 0x55d2addbb2ef - legion_prof::spy::serialize::deserialize::h56b3bab62280837c
13: 0x55d2addcfe5b - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once::hcd60ee582f9cd211
14: 0x55d2addb2597 - <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend::hadd5946032273160
15: 0x55d2addccbad - rayon::iter::plumbing::Producer::fold_with::h2df3b665c549b02f
16: 0x55d2add9e448 - rayon::iter::plumbing::bridge_producer_consumer::helper::h7229160a9354d8f6
17: 0x55d2addb197e - rayon::iter::extend::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend::h6237cfef6dae8bc0
18: 0x55d2addb1850 - rayon::iter::from_par_iter::<impl rayon::iter::FromParallelIterator<T> for alloc::vec::Vec<T>>::from_par_iter::h0f8c38eb4d0bd1b6
19: 0x55d2addcd7f0 - rayon::result::<impl rayon::iter::FromParallelIterator<core::result::Result<T,E>> for core::result::Result<C,E>>::from_par_iter::h1454c8c6073271ed
20: 0x55d2addd3d16 - legion_prof::main::h409a5eb431bab0b8
21: 0x55d2addc59b3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7836657c01cdd8be
22: 0x55d2addc59cd - std::rt::lang_start::{{closure}}::h8e84562a5ce98348
23: 0x55d2adfcec81 - std::rt::lang_start_internal::hdaf8b62dc8f7de54
24: 0x55d2adddddd5 - main
25: 0x7f2063cc724d - __libc_start_main
26: 0x55d2add87f2a - _start
27: 0x0 - <unknown>
The profile in question is at rohany@sapling2.stanford.edu:~/broken-s3d-profile.gz. I am on legion branch automatic-tracing-non-idempotent-traces, which is branched off of main from a few days ago.
elliottslaughter commented
$ ls -l ~rohany/broken-s3d-profile.gz
-rw-rw---- 1 rohany rohany 0 Apr 3 20:02 /home/rohany/broken-s3d-profile.gz
No permissions.
rohany commented
try now
elliottslaughter commented
The file seems to be empty?
$ wc -c ~rohany/broken-s3d-profile.gz
0 /home/rohany/broken-s3d-profile.gz
rohany commented
sorry for the false alarm, i can't read my logs correctly ...
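For anyone hitting the same `CrLf` panic: the crash here came from a zero-byte log, so a size check before invoking `legion_prof` would likely have caught it. A minimal sketch, assuming a POSIX shell; the helper name `find_empty_logs` is hypothetical and not part of Legion:

```shell
# Hypothetical helper (not part of legion_prof): report profiler logs
# that are missing or zero bytes long.
# Returns 0 if every file has content, 1 otherwise.
find_empty_logs() {
    status=0
    for f in "$@"; do
        # [ -s FILE ] is true only if FILE exists and its size is > 0
        if [ ! -s "$f" ]; then
            echo "empty or missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

Used as a guard, e.g. `find_empty_logs ./run/prof_*.gz && legion_prof ./run/prof_*.gz -o test`, the profiler only runs when every log has content. An empty log usually means the run itself did not finish writing its profile, so the run's own output is the next place to look.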