Unwrapping None while attempting a clean shutdown
Opened this issue · 9 comments
After ctrl-c, I get caught on this unwrap:
Line 620 in f89a9c4
Stack trace below. Any ideas on where to track down this bug? Or is it just an unwrap that should be an `if let Some(r) = ...`?
05/24/20 21:50:51 [synapse::init:128] I: Shutting down cleanly. Interrupt again to shut down immediately.
thread 'rpc' panicked at 'called `Option::unwrap()` on a `None` value', src/rpc/processor.rs:624:33
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:78
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1069
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1504
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:198
9: std::panicking::default_hook
at src/libstd/panicking.rs:218
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:511
11: rust_begin_unwind
at src/libstd/panicking.rs:419
12: core::panicking::panic_fmt
at src/libcore/panicking.rs:111
13: core::panicking::panic
at src/libcore/panicking.rs:54
14: core::option::Option<T>::unwrap
at /rustc/b2e36e6c2d229126b59e892c9147fbb68115d292/src/libcore/macros/mod.rs:34
15: synapse::rpc::processor::Processor::handle_ctl
at src/rpc/processor.rs:624
16: synapse::rpc::RPC::handle_ctl
at src/rpc/mod.rs:259
17: synapse::rpc::RPC::run
at src/rpc/mod.rs:239
18: synapse::rpc::RPC::start::{{closure}}
at src/rpc/mod.rs:207
19: synapse::handle::Handle<I,O>::run::{{closure}}
at src/handle.rs:39
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
05/24/20 21:50:52 [synapse::init:41] E: Unclean shutdown detected, terminating
I think it's useful to keep that unwrap there, or at least report an error if it's handled via `if let`.
Tracking this down isn't so simple, since it requires examining every codepath that adds a resource to the RPC handler and then thinking about how a shutdown could trigger an out-of-order delete before an add (or simply a failure to add). Is this a bug you see occurring frequently?
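For reference, a minimal sketch of the `if let`-style handling discussed above; the `Processor` struct and `resources` map here are simplified stand-ins, not the real types in src/rpc/processor.rs:

```rust
use std::collections::HashMap;

// Simplified stand-in for the RPC processor; the real resources map in
// src/rpc/processor.rs holds richer resource values than a plain u32.
struct Processor {
    resources: HashMap<String, u32>,
}

impl Processor {
    // Instead of `self.resources.remove(id).unwrap()`, report the missing
    // id and carry on rather than panicking during shutdown.
    fn remove_resource(&mut self, id: &str) -> Option<u32> {
        if let Some(r) = self.resources.remove(id) {
            Some(r)
        } else {
            eprintln!("DBG: id not found: {:?}", id);
            eprintln!("DBG: self.resources.len() {}", self.resources.len());
            None
        }
    }
}
```

This keeps the shutdown path alive while still surfacing the inconsistency in the logs.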
It happens every time currently. It must be serialized in a cache file, then? I'll see about printing more information in the `None` case.
That helps with tracking things down, then. Maybe try printing the ID, and then match over that ID in all the `rpc_id`-generating functions to see where it's coming from.
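A sketch of that reverse-lookup idea, under the (hypothetical) assumption that rpc ids are deterministic hashes of their inputs; the std hasher below is just a stand-in, not synapse's actual id derivation in `file_rpc_id`/`tracker_rpc_id`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical id derivation: a deterministic hash of (kind, torrent, item).
// Synapse computes its ids differently; this only demonstrates the lookup.
fn rpc_id(kind: &str, torrent: &str, item: &str) -> String {
    let mut h = DefaultHasher::new();
    (kind, torrent, item).hash(&mut h);
    format!("{:016X}", h.finish())
}

// Given a missing id, recompute candidate ids for known inputs and report
// which generating function would have produced it.
fn classify(missing: &str, torrent: &str, files: &[&str], trackers: &[&str]) -> Option<String> {
    for &f in files {
        if rpc_id("file", torrent, f) == missing {
            return Some(format!("file_rpc_id({})", f));
        }
    }
    for &t in trackers {
        if rpc_id("tracker", torrent, t) == missing {
            return Some(format!("tracker_rpc_id({})", t));
        }
    }
    None
}
```

In practice, matching inside the generating functions themselves (as suggested) avoids having to enumerate inputs after the fact.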
OK, I have more information now that I'm printing the missing IDs:
05/25/20 21:51:36 [synapse::init:128] I: Shutting down cleanly. Interrupt again to shut down immediately.
DBG: id not found: "D3EB14FAF8EADF95AB1E00C588F5A9E47A384CEC"
DBG: self.resources.len() 416
DBG: id not found: "7041EEC6BA6F687BDE3F2641D50097F1E2B7F0AE"
DBG: self.resources.len() 416
DBG: id not found: "2966614EDB1EF630E251664D52BDBA7EB6A522DD"
DBG: self.resources.len() 416
DBG: id not found: "26D3E02B33079FE14825D6BC5773B210FB589876"
DBG: self.resources.len() 216
DBG: id not found: "2DE4731620D5E44F55DA77849A069EEC21AEC575"
DBG: self.resources.len() 216
DBG: id not found: "3B2A920A1EF7664A08AEB8299AFE449832DCC527"
DBG: self.resources.len() 216
DBG: id not found: "5B15FFC79AED3C26B650747C6BFC6B1E99523AE0"
DBG: self.resources.len() 31
DBG: id not found: "DA8AFC017D7A1BDB0F128259991DCC78FB3073CF"
DBG: self.resources.len() 31
DBG: id not found: "07098728BD8EB60DAD3773D0EA2BBF4C6D5D9A5A"
DBG: self.resources.len() 31
DBG: id not found: "A006F7F78A5CAA5EBB13561454DF7C29692F14FD"
DBG: self.resources.len() 22
DBG: id not found: "06C3E555F55C295948E788DC6F55274EBCFA0045"
DBG: self.resources.len() 22
DBG: id not found: "D71C9FBB0BDEBAF020371C9BD48616E6F8E7C60F"
DBG: self.resources.len() 22
05/25/20 21:51:39 [synapse::init:45] I: Shutdown complete
It looks like there are always 3 failures in a row. The first failure in each set of 3 is a valid torrent that I have seeding; I don't know what the other two are.
Also, I get a different set of IDs on each run. This seems like a race condition of some sort, where perhaps another thread has already removed the same resource.
Here's the call that is sending the `Removed` request:
Line 1636 in edbd189
Looking into this, it seems like either a file or a tracker wasn't properly added. Can you add some tracing to the `file_rpc_id`/`tracker_rpc_id` functions and see if they're called with those hashes? My guess is that it's files, since there's a conditional clause around magnets for it, but looking through that logic I don't see what's wrong. Can you try to identify the torrent that causes this and give some basic information about its state?
I added a trace to `send_rpc_removal` to see what it was requesting to remove, and it is sending a removal request twice. See this trace: this set of 3 was removed successfully, then about 20 requests later it was sent again.
DBG: sending rpc msg: removing ["1B78D8A59F99F137E76CD57DFCCB3263D366FC1C", "78A81F7C24D41FFEF8553B57BA071C98523AE4A2", "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"]
<snip>
DBG: sending rpc msg: removing ["1B78D8A59F99F137E76CD57DFCCB3263D366FC1C", "78A81F7C24D41FFEF8553B57BA071C98523AE4A2", "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"]
DBG: id not found: "1B78D8A59F99F137E76CD57DFCCB3263D366FC1C"
DBG: self.resources.len() 138
DBG: id not found: "78A81F7C24D41FFEF8553B57BA071C98523AE4A2"
DBG: self.resources.len() 138
DBG: id not found: "1CD1D6C1B28D6F4A7E1BBDF6DB28074353EA89C0"
DBG: self.resources.len() 138
I wonder how it's possible that this is being dropped twice, since that's the only place this code is called. I'll have to do some more thinking, but I recommend you add some more tracing around the object to get more context on how this is happening.
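One cheap way to add that tracing is a small debug-only guard (not something in synapse itself) that records every id whose removal was requested and flags repeats:

```rust
use std::collections::HashSet;

// Debug helper: remembers every id whose removal has been requested and
// reports when the same id shows up a second time.
struct RemovalTracer {
    seen: HashSet<String>,
}

impl RemovalTracer {
    fn new() -> Self {
        RemovalTracer { seen: HashSet::new() }
    }

    // Returns true if this id was already requested for removal before.
    fn record(&mut self, id: &str) -> bool {
        if !self.seen.insert(id.to_string()) {
            eprintln!("DBG: duplicate removal request for {:?}", id);
            true
        } else {
            false
        }
    }
}
```

Logging a backtrace at the point where `record` returns true would show which codepath issues the second removal.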
So I was looking at this again yesterday and couldn't reproduce the issue. Now I'm really not sure how to proceed!
It could have been induced by a tracker not giving expected input, or a peer, or anything else.
At any rate, I'm not having unclean shutdown issues anymore.