plenum/test/node_catchup/test_node_catchup_with_connection_problem hanging on Ubuntu 20.04
WadeBarnes opened this issue · 13 comments
When running on Ubuntu 20.04 the following tests hang and never complete:
plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_ledger_status
- On the fourth iteration when all four iterations are run the test hangs.
- When the fourth iteration (lost_count=4) is run on its own the test passes.
- Details of the investigation below.
plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_first_consistency_proofs
- On the first iteration.
- Cause has not been investigated.
plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_cancel_request_cp_and_ls_after_catchup
- On the first iteration.
- Cause has not been investigated.
Investigation into hang issue with plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_ledger_status
The tests are hanging on this line:
- https://github.com/WadeBarnes/indy-plenum/blob/20.04-test-debugging/storage/kv_store_rocksdb.py#L32
The code does not seem to go into the open call, although it could just be a timing issue with the logs.
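One way to confirm where the process is actually stuck, independent of log timing, is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. The following is a minimal sketch, not code from the plenum repository; run_with_hang_dump and its parameters are hypothetical names introduced for illustration.

```python
import faulthandler
import sys


def run_with_hang_dump(func, timeout_s=30.0, stream=None):
    """Run func(); if it is still running after timeout_s seconds,
    write the traceback of every thread to `stream` (default: stderr).

    Hypothetical helper for illustration; not part of plenum.
    """
    if stream is None:
        stream = sys.stderr
    # Arm a watchdog. When it fires, the tracebacks are written directly
    # to the stream's file descriptor and the process keeps running.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=stream)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect call (for example, the storage initialisation) in such a helper would show whether the stack at the time of the hang is really inside the open call or somewhere later.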
Thinking that it could be an issue with RocksDB or the Python wrapper, I tried building the wrapper straight from source (git+https://github.com/twmht/python-rocksdb.git#egg=python-rocksdb) to get the new close method that is not included in the released PyPI version. The latest code causes a seg fault on close, so I also tried git+https://github.com/alexreg/python-rocksdb.git@fix_close_segfault#egg=python-rocksdb, which fixes the seg fault issue. My thought was that the RocksDB instances were not getting closed/disposed properly. None of this made any difference; the tests still hung.
If you modify the code to only run 3 iterations, rather than the 4, you avoid the hang and the tests pass. If you modify the code to run just the 4th iteration the tests pass.
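As an alternative to editing the test, a single parametrized iteration can be selected by its pytest node ID, which is the form that appears in the log output (test_catchup_with_lost_ledger_status[4]). The sketch below only assembles the command line, reusing the flags from the reproduction steps; pytest_command is a hypothetical helper name.

```python
import shlex

TEST_FILE = "plenum/test/node_catchup/test_node_catchup_with_connection_problem.py"


def pytest_command(test_name, param_id=None):
    """Build the pytest invocation for one test, optionally narrowed
    to a single parametrized iteration such as [4].

    Hypothetical helper for illustration; not part of plenum.
    """
    node_id = f"{TEST_FILE}::{test_name}"
    if param_id is not None:
        # pytest appends the parameter ID in square brackets.
        node_id += f"[{param_id}]"
    return [
        "python3", "-m", "pytest", "-l", "-v",
        "--log-cli-level=WARNING", "--disable-warnings",
        node_id,
    ]


cmd = pytest_command("test_catchup_with_lost_ledger_status", 4)
# shlex.join quotes the node ID so the shell does not treat [4] as a glob.
print(shlex.join(cmd))
```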
Steps to reproduce:
Using https://github.com/WadeBarnes/indy-plenum/blob/20.04-test-debugging
MINGW64 /c/indy-plenum (20.04-test-debugging)
$ docker build -t plenum-build:2004 -f .github/workflows/build/Dockerfile.ubuntu-2004 .
MINGW64 /c/indy-plenum (20.04-test-debugging)
$ docker build -t indy-plenum-test:2004 -f .github/workflows/build/Dockerfile.test-2004 .
MINGW64 /c/indy-plenum (20.04-test-debugging)
$ docker run --rm -it --name plenum-testing --volume='//c/indy-plenum:/home/indy/indy-plenum:Z' indy-plenum-test:2004 bash
root@cd83db811641:/home/indy/indy-plenum# python3 -m pytest -l -v --log-cli-level=WARNING --disable-warnings plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_ledger_status
Result:
On the fourth iteration of plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_ledger_status it will hang on root:kv_store_rocksdb.py:30 Init KeyValueStorageRocksdb -> open
WARNING root:compact_merkle_tree.py:57 <- _update
PASSED [ 75%]
plenum/test/node_catchup/test_node_catchup_with_connection_problem.py::test_catchup_with_lost_ledger_status[4]
------------------------------------------------ live log call ------------------------------------------------
WARNING root:test_node_catchup_with_connection_problem.py:44 lost_count: 4
WARNING root:test_node_catchup_with_connection_problem.py:45 txnPoolNodeSet: [Alpha, Beta, Gamma, Delta]
WARNING root:test_node_catchup_with_connection_problem.py:46 looper: <stp_core.loop.looper.Looper object at 0x7f6629910220>
WARNING root:test_node_catchup_with_connection_problem.py:47 sdk_pool_handle: 2
WARNING root:test_node_catchup_with_connection_problem.py:48 sdk_wallet_steward: (5, 'MSjKTWkPLtYoPEaTF1TUDb')
WARNING root:test_node_catchup_with_connection_problem.py:49 tconf: <module 'indy_config.py' from '/tmp/pytest-of-root/pytest-1/tmp0/etc/indy/indy_config.py'>
WARNING root:test_node_catchup_with_connection_problem.py:50 tdir: /tmp/pytest-of-root/pytest-1/tmp0
WARNING root:test_node_catchup_with_connection_problem.py:51 allPluginsPath: ['/home/indy/indy-plenum/plenum/test/plugin/stats_consumer']
WARNING root:test_node_catchup_with_connection_problem.py:52 monkeypatch: <_pytest.monkeypatch.MonkeyPatch object at 0x7f662817fca0>
...
WARNING root:kv_store_rocksdb_int_keys.py:23 -> Init KeyValueStorageRocksdbIntKeys
WARNING root:kv_store_rocksdb.py:20 -> Init KeyValueStorageRocksdb
WARNING root:kv_store_rocksdb.py:30 Init KeyValueStorageRocksdb -> open
About this issue: we have to copy the deb package from the xenial repo to bionic and install rocksdb=5.8.8 instead of librocksdb5.17.
Alternatively, we could of course start migrating from RocksDB 5.8 to 5.17, but that could take a lot of effort.
I set up a VS Code remote container environment for indy-plenum (https://github.com/WadeBarnes/indy-plenum/tree/ubuntu-20.04-dev-container) to debug this issue further. So far it appears the issue is not with RocksDB at all. When setting breakpoints and stepping through the code, it gets well past the point indicated above and ends up hanging here: https://github.com/WadeBarnes/indy-plenum/blob/ubuntu-20.04-dev-container/ledger/ledger.py#L65. I'm hoping to be able to dig into this more today.
@anikitinDSR, are you saying you've tested it with RocksDB 5.8.8 and you don't experience the hanging issue with the tests?
Exactly. You can remove the 5.17 version of rocksdb inside the container and install rocksdb 5.8 from the repo.sovrin.org xenial repo instead of bionic. Also, please revert the self._db.close() call. I mean this one:
https://github.com/WadeBarnes/indy-plenum/blob/45b163d056c3d6b9411771a693bd3bfbb45f3569/storage/kv_store_rocksdb.py#L157
@anikitinDSR, Which of the RocksDB 5.8.8 packages did you use? I'm getting errors trying to install the one from deb https://repo.sovrin.org/lib/apt xenial stable
I think this one can be useful:
deb https://repo.sovrin.org/deb xenial master
You can try this:
WadeBarnes#3
But please note that it is only meant to show that it works with rocksdb 5.8; it cannot be the fix.
From my point of view, we just need to copy the rocksdb 5.8 .deb package from the xenial repo to the bionic repo and set it up as in the PR.
bionic isn't really the right place for it either, since we're targeting focal.
It's OK if you want to use another repo. The main goal here is that you have to use rocksdb version 5.8, because our source code expects exactly the API from this version.
Using another version of rocksdb would require changes in the source code.
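The version constraint described above could be guarded with a small check before running the tests. This is only a sketch under the assumption that the installed version is available as a dotted string such as 5.8.8 (how that string is obtained is left open); is_supported_rocksdb is a hypothetical name, not an existing plenum function.

```python
def is_supported_rocksdb(version, required=(5, 8)):
    """Return True if `version` (e.g. "5.8.8") is in the required
    major.minor series, which the thread states is 5.8.

    Hypothetical helper for illustration; not part of plenum.
    """
    parts = version.split(".")
    try:
        # Compare only major and minor; the patch level is ignored.
        major_minor = tuple(int(p) for p in parts[:2])
    except ValueError:
        # Non-numeric or empty components cannot be a supported version.
        return False
    return major_minor == required
```

With this check, 5.8.8 is accepted while 5.17 is rejected, matching the behaviour reported in this thread.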
I updated the https://github.com/WadeBarnes/indy-plenum/tree/20.04-test-debugging code following your recommendations to create a PoC that ran the test via GHA, https://github.com/WadeBarnes/indy-plenum/actions/runs/1031135484, to prove all the tests pass. The one test that is failing in that run is an unrelated issue.
rocksdb_5.8.8_amd64.deb has been published into the Hyperledger Indy repository and registered as supporting focal, bionic, and xenial; rocksdb_5.8.8_amd64.deb
@udosson, @anikitinDSR, I've updated the test branch with the new repository information; https://github.com/WadeBarnes/indy-plenum/blob/20.04-test-debugging/.github/workflows/build/Dockerfile.ubuntu-2004#L11-L15
Successful test run here; https://github.com/WadeBarnes/indy-plenum/actions/runs/1049822050
I've confirmed RocksDB gets picked up from the Hyperledger repository. @udosson, you can go ahead with integrating these changes into your PR, and then we can close this ticket.
This fix has been integrated into PR #1545