TOB-API (django) pod crashes with `thread panicked while panicking` error while attempting to access the wallet when it is unavailable.
WadeBarnes opened this issue · 2 comments
TEST 1:
- TOB-API Up
- Request a proof - PASS
- Recycle Wallet-db. Wait for it to come up.
- Request a proof - PASS
TEST 2:
- TOB-API Up
- Request a proof - PASS
- Bring Wallet-db down.
- Request a proof - FAIL; Pod crashes with a thread panic error.
- Pod remains in a crash loop afterward, caused by Indy sync errors and the resulting health check failures, until the wallet database is available again.
- On startup, if the wallet is not available, the Indy sync process is not retried after the first failed attempt. The health check then fails until the pod is finally killed and a new instance is created.
INFO 2019-01-18 16:15:29,323 helpers 1 139808262432512 172.51.108.1 [18/Jan/2019:16:15:28 +0000] "GET /api/v2/credential/344469/formatted HTTP/1.0" 200 7434 "https://test.orgbook.gov.bc.ca/en/organization/A0079341/cred/344469/verify " "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
WARNING 2019-01-18 16:15:29,371 views 1 139808262432512 >>> Verify credential
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 src/commands/mod.rs:114 | AnoncredsCommand command received
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 src/commands/anoncreds/mod.rs:54 | Prover command received
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 src/commands/anoncreds/prover.rs:182 | GetCredential command received
INFO 2019-01-18 16:15:37,587 helpers 1 139808262432512 172.51.56.1 [18/Jan/2019:16:15:37 +0000] "GET /health HTTP/1.1" 200 201 "-" "kube-probe/1.10+"
ERROR 2019-01-18 16:15:59,372 views 1 139808262432512 Credential verification error:
Traceback (most recent call last):
File "/home/indy/tob_anchor/views.py", line 543, in verify_credential
proof = await proof_manager.construct_proof_async()
File "/home/indy/api_v2/indy/proof.py", line 46, in construct_proof_async
self.credential_ids,
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/vonx/indy/client.py", line 376, in construct_proof
messages.ConstructedProof)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/vonx/indy/client.py", line 58, in _fetch
result = await self._target.request(request)
concurrent.futures._base.CancelledError
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Some("IO error: No route to host (os error 113)"))', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
ERROR 2019-01-18 16:15:59,397 web_protocol 1 139808262432512 Unhandled exception
Traceback (most recent call last):
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 398, in start
await resp.prepare(request)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 300, in prepare
return await self._start(request)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 605, in _start
return await super()._start(request)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 367, in _start
await writer.write_headers(status_line, headers)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/http_writer.py", line 100, in write_headers
self._write(buf)
File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/http_writer.py", line 57, in _write
raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:1009:5
stack backtrace:
0: 0x7f277410538f - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h1fd4e34c3d03ef64
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: 0x7f277410e0e7 - std::sys_common::backtrace::print::h714a469856413294
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: 0x7f2774108eff - std::panicking::default_hook::{{closure}}::h46fe49f863fa9721
at libstd/panicking.rs:211
3: 0x7f2774108c64 - std::panicking::default_hook::h12f83bcd26b03624
at libstd/panicking.rs:227
4: 0x7f27741095be - std::panicking::rust_panic_with_hook::hde420d6fd4455550
at libstd/panicking.rs:476
5: 0x7f2774109161 - std::panicking::continue_panic_fmt::h8f394f3c578bcc76
at libstd/panicking.rs:390
6: 0x7f2774109045 - rust_begin_unwind
at libstd/panicking.rs:325
7: 0x7f277412321c - core::panicking::panic_fmt::hca5dc4e8b320bc56
at libcore/panicking.rs:77
8: 0x7f2774065c7f - core::result::unwrap_failed::hec4a451fd3384f9c
9: 0x7f2774094a09 - indystrgpostgres::PostgresWallet::close::ha2c86627d1ccaabc
10: 0x7f2774ff5e07 - core::ptr::drop_in_place::hb150138e31986635
11: 0x7f2774cfc18b - core::ptr::drop_in_place::hb00333ef3231ab9d
12: 0x7f2774f44c7d - <alloc::rc::Rc<T> as core::ops::drop::Drop>::drop::h3b965101cc516f15
13: 0x7f27751084b4 - indy::commands::CommandExecutor::new::{{closure}}::hfb1561e8b70d0124
14: 0x7f27753fd6b9 - __rust_maybe_catch_panic
at libpanic_unwind/lib.rs:102
15: 0x7f2774e070ec - <F as alloc::boxed::FnBox<A>>::call_box::hcaa4fc613b02c583
16: 0x7f27753eb59d - std::sys_common::thread::start_thread::h44127e03e78ca137
at liballoc/boxed.rs:682
at libstd/sys_common/thread.rs:24
17: 0x7f27753e01a5 - std::sys::unix::thread::Thread::new::thread_start::h8f17b97f2223146c
at libstd/sys/unix/thread.rs:90
18: 0x7f27a58e66b9 - start_thread
19: 0x7f27a4f0c41c - clone
20: 0x0 - <unknown>
thread panicked while panicking. aborting.
TEST 3:
- TOB-API Up
- Bring Wallet-db down.
- Request a proof, and start DB immediately afterward.
- FAIL; same error as TEST 2.
The application, when running on OpenShift, will eventually recover once the wallet-db comes back up. The tob-api pod will be killed and recycled until it is finally able to successfully perform its Indy sync.
The recovery process would be quicker if tob-api retried the Indy sync on failure during startup.
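As a rough illustration of what retrying on failure during startup could look like, here is a minimal sketch of an async retry helper with exponential backoff. The helper name `retry_async` and its parameters are hypothetical and are not part of tob-api or von-x. It also assumes the wrapped operation can safely be re-run from a clean state, which may not hold for the real initialization.

```python
import asyncio
import logging

LOGGER = logging.getLogger(__name__)


async def retry_async(operation, attempts=5, initial_delay=1.0, max_delay=30.0):
    """Await operation() until it succeeds or the attempts run out.

    `operation` is a zero-argument callable returning an awaitable. Here it
    stands in for whatever performs the Indy sync (hypothetical placeholder).
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except Exception as exc:  # deliberately broad: this is only a sketch
            if attempt == attempts:
                # Out of attempts: re-raise so the caller (or the pod's
                # health check) still sees the failure.
                raise
            LOGGER.warning(
                "attempt %d/%d failed: %s; retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            await asyncio.sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

A caller would wrap the startup step, for example `await retry_async(sync_indy)`, where `sync_indy` is a placeholder for the actual sync coroutine. Capping the delay keeps the total wait bounded, so the platform's health check can still recycle the pod if the wallet stays down.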
Won't fix.
The error is thrown within the Rust Postgres connector code. The published version of the library is over a year old. More recent code is available on GitHub, but I was not able to build it: I couldn't resolve all the dependencies, since many of the dependent libraries are also out of date, and I couldn't determine the proper mix of published libraries versus code pulled directly from GitHub to get the Postgres library to compile.
Investigated having the tob-api code auto-restart on startup error. This is not straightforward, as the error occurs partway through the von-x initialization; stopping the initialization partway through and restarting led to other errors. Didn't want to spend too much time digging into von-x, as the agent is being rewritten anyway.
So throwing this ticket into "won't fix" status.