bcgov/TheOrgBook

TOB-API (django) pod crashes with `thread panicked while panicking` error while attempting to access the wallet when it is unavailable.

WadeBarnes opened this issue · 2 comments

TEST 1:

  • TOB-API Up
  • Request a proof - PASS
  • Recycle Wallet-db. Wait for it to come up.
  • Request a proof - PASS

TEST 2:

  • TOB-API Up
  • Request a proof - PASS
  • Bring Wallet-db down.
  • Request a proof - FAIL; Pod crashes with a thread panic error.
    • The pod then remains in a crash loop until the wallet database is available again, because of Indy sync errors and the resulting health check failures.
    • On startup, if the wallet is not available, the Indy sync is not retried after the first failed attempt. The health check then fails until the pod is killed and a new instance is created.
INFO 2019-01-18 16:15:29,323 helpers 1 139808262432512 172.51.108.1 [18/Jan/2019:16:15:28 +0000] "GET /api/v2/credential/344469/formatted HTTP/1.0" 200 7434 "https://test.orgbook.gov.bc.ca/en/organization/A0079341/cred/344469/verify " "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
WARNING 2019-01-18 16:15:29,371 views 1 139808262432512 >>> Verify credential
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 	src/commands/mod.rs:114 | AnoncredsCommand command received
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 	src/commands/anoncreds/mod.rs:54 | Prover command received
INFO 2019-01-18 16:15:29,415 libindy 1 139807434639104 	src/commands/anoncreds/prover.rs:182 | GetCredential command received
INFO 2019-01-18 16:15:37,587 helpers 1 139808262432512 172.51.56.1 [18/Jan/2019:16:15:37 +0000] "GET /health HTTP/1.1" 200 201 "-" "kube-probe/1.10+"
ERROR 2019-01-18 16:15:59,372 views 1 139808262432512 Credential verification error:
Traceback (most recent call last):
  File "/home/indy/tob_anchor/views.py", line 543, in verify_credential
    proof = await proof_manager.construct_proof_async()
  File "/home/indy/api_v2/indy/proof.py", line 46, in construct_proof_async
    self.credential_ids,
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/vonx/indy/client.py", line 376, in construct_proof
    messages.ConstructedProof)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/vonx/indy/client.py", line 58, in _fetch
    result = await self._target.request(request)
concurrent.futures._base.CancelledError
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Some("IO error: No route to host (os error 113)"))', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
ERROR 2019-01-18 16:15:59,397 web_protocol 1 139808262432512 Unhandled exception
Traceback (most recent call last):
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 398, in start
    await resp.prepare(request)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 300, in prepare
    return await self._start(request)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 605, in _start
    return await super()._start(request)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/web_response.py", line 367, in _start
    await writer.write_headers(status_line, headers)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/http_writer.py", line 100, in write_headers
    self._write(buf)
  File "/home/indy/.pyenv/versions/3.6.7/lib/python3.6/site-packages/aiohttp/http_writer.py", line 57, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:1009:5
stack backtrace:
   0:     0x7f277410538f - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h1fd4e34c3d03ef64
                               at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x7f277410e0e7 - std::sys_common::backtrace::print::h714a469856413294
                               at libstd/sys_common/backtrace.rs:71
                               at libstd/sys_common/backtrace.rs:59
   2:     0x7f2774108eff - std::panicking::default_hook::{{closure}}::h46fe49f863fa9721
                               at libstd/panicking.rs:211
   3:     0x7f2774108c64 - std::panicking::default_hook::h12f83bcd26b03624
                               at libstd/panicking.rs:227
   4:     0x7f27741095be - std::panicking::rust_panic_with_hook::hde420d6fd4455550
                               at libstd/panicking.rs:476
   5:     0x7f2774109161 - std::panicking::continue_panic_fmt::h8f394f3c578bcc76
                               at libstd/panicking.rs:390
   6:     0x7f2774109045 - rust_begin_unwind
                               at libstd/panicking.rs:325
   7:     0x7f277412321c - core::panicking::panic_fmt::hca5dc4e8b320bc56
                               at libcore/panicking.rs:77
   8:     0x7f2774065c7f - core::result::unwrap_failed::hec4a451fd3384f9c
   9:     0x7f2774094a09 - indystrgpostgres::PostgresWallet::close::ha2c86627d1ccaabc
  10:     0x7f2774ff5e07 - core::ptr::drop_in_place::hb150138e31986635
  11:     0x7f2774cfc18b - core::ptr::drop_in_place::hb00333ef3231ab9d
  12:     0x7f2774f44c7d - <alloc::rc::Rc<T> as core::ops::drop::Drop>::drop::h3b965101cc516f15
  13:     0x7f27751084b4 - indy::commands::CommandExecutor::new::{{closure}}::hfb1561e8b70d0124
  14:     0x7f27753fd6b9 - __rust_maybe_catch_panic
                               at libpanic_unwind/lib.rs:102
  15:     0x7f2774e070ec - <F as alloc::boxed::FnBox<A>>::call_box::hcaa4fc613b02c583
  16:     0x7f27753eb59d - std::sys_common::thread::start_thread::h44127e03e78ca137
                               at liballoc/boxed.rs:682
                               at libstd/sys_common/thread.rs:24
  17:     0x7f27753e01a5 - std::sys::unix::thread::Thread::new::thread_start::h8f17b97f2223146c
                               at libstd/sys/unix/thread.rs:90
  18:     0x7f27a58e66b9 - start_thread
  19:     0x7f27a4f0c41c - clone
  20:                0x0 - <unknown>
thread panicked while panicking. aborting.

TEST 3:

  • TOB-API Up
  • Bring Wallet-db down.
  • Request a proof, and start DB immediately afterward.
  • FAIL; same error as TEST 2.

The application, when running on OpenShift, will eventually recover once the wallet-db comes back up. The tob-api pod will be killed and recycled until it is finally able to perform its Indy sync successfully.

The recovery process would be quicker if tob-api retried the Indy sync on failure during startup.
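As a rough illustration of the retry idea, here is a minimal sketch of a generic async retry helper with exponential backoff. The names `retry_async`, `attempts`, `base_delay` and `max_delay` are hypothetical and are not part of tob-api or von-x. It assumes the sync step is exposed as a coroutine that can safely be called again after a failure, which may not hold for the current initialization code. It also only helps when the failure surfaces as a Python exception; it cannot catch the libindy abort shown in the log above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def retry_async(operation, attempts=5, base_delay=1.0, max_delay=30.0):
    """Await operation() until it succeeds or the attempts are exhausted.

    `operation` is a zero-argument callable returning an awaitable
    (hypothetically, the Indy sync step). The delay doubles after each
    failure, capped at `max_delay` seconds.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except Exception as exc:  # a real version would catch narrower errors
            if attempt == attempts:
                # Out of attempts: re-raise so the health check still fails.
                raise
            logger.warning(
                "Attempt %d/%d failed: %s; retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
```

On startup this would be called as something like `await retry_async(run_indy_sync)`, where `run_indy_sync` is a placeholder for whatever coroutine performs the sync. Re-raising after the last attempt keeps the existing behaviour, where OpenShift recycles the pod, as the fallback.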

ianco commented

Won't fix.

The error is thrown within the Rust Postgres connector code. The published version of the library is over a year old. There is more recent code available on GitHub; however, I was not able to build it. I couldn't resolve all of the dependencies: many of the dependent libraries are also out of date, and I couldn't determine the proper mix of published libraries versus code pulled directly from GitHub to get the Postgres library to compile.

I investigated having the tob-api code auto-restart on a startup error. This is not straightforward, as the error occurs partway through the von-x initialization, and stopping the initialization partway through and restarting it led to other errors. I didn't want to spend too much time digging into von-x, as the agent is being rewritten anyway.

So throwing this ticket into "won't fix" status.