range end index 8 out of range for slice of length 0
extSunset opened this issue · 12 comments
OS: Ubuntu 20.04
The screenshot is not mine; unfortunately, I cannot provide more detailed logs at the moment. I will try to get them from someone and provide them as soon as possible.
Issue similar to #73
We have seen quite a few database corruptions; it is hard to say that they are all hardware-related, and with our fleet we only expect to see more of them.
If there is any instrumentation we can add to the code that'll help with debugging, we can do that too.
Looks like this happens on start? A copy of the database would be nice to have
I suspect something happened during the last start and caused the app to crash, after which the database got corrupted. I saw some recent fixes here, so we are now shipping 0.4.4. We are currently going through one of many testnet phases, so we should have more reports of this kind.
We have quite a few bug reports with database corruptions; it is hard to tell whether the fundamental root cause is the same or not. Many of them start printing "Chain lookup failed: Failed to get header for hash 0x...".
Here is a collection of logs and databases from 4 users, I hope they are somehow useful: https://nextcloud.mokrynskyi.com/s/FtGTFJDTjm2Ds5G
P.S. #189 looks concerning and might contribute to the issues we see as well.
I'd recommend updating to 0.4.6. An issue that could have caused all of this was fixed in 0.4.5, and some additional diagnostics have been added. Please reopen or file a new issue if it happens again with the new version.
Unfortunately, we still see database corruptions on new installations with 0.4.6. The latest one looked like this with the Substrate node:
2023-04-01T21:17:15.271814Z WARN sync: 💔 Verification failed for block 0x80f2373a9111fedba10cc68d241e0c35461f9a32213e9f5c0eb1b4c1716b735c received from peer: 12D3KooWPc1TfWtyBr2S4qdbTxRuh33GUey7hrFDYL3gzDzjKsGw, "Runtime code error: `:code` hash not found"
Hope to get database and more logs soon.
Added database and logs to User 5 under https://nextcloud.mokrynskyi.com/s/FtGTFJDTjm2Ds5G
Unlike the previous cases, that last database is logically consistent. It's just that the block state root is missing, as if it was never added, or was removed at some point.
In the logs I found 2 cases where this happened. The node managed to recover from the first one since it was on a fork. In both cases it looked like this:
{"v":0,"name":"subspace-cli","msg":"✨ Imported #49562 (0x2b9e…cfc9)","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:02.189878651Z","target":"log","line":null,"file":null,"log.target":"substrate","log.line":139,"log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/lib.rs","log.module_path":"sc_informant"}
{"v":0,"name":"subspace-cli","msg":"💤 Idle (25 peers), best: #49562 (0x2b9e…cfc9), finalized #49462 (0xeecf…15eb), ⬇ 686.6kiB/s ⬆ 948.8kiB/s","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:07.128893663Z","target":"log","line":null,"file":null,"log.target":"substrate","log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/display.rs","log.line":135,"log.module_path":"sc_informant::display"}
{"v":0,"name":"subspace-cli","msg":"💤 Idle (25 peers), best: #49562 (0x2b9e…cfc9), finalized #49462 (0xeecf…15eb), ⬇ 439.8kiB/s ⬆ 771.3kiB/s","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:12.129268497Z","target":"log","line":null,"file":null,"log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/display.rs","log.module_path":"sc_informant::display","log.line":135,"log.target":"substrate"}
{"v":0,"name":"subspace-cli","msg":"✨ Imported #49563 (0x505d…3200)","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:13.397542068Z","target":"log","line":null,"file":null,"log.line":139,"log.target":"substrate","log.module_path":"sc_informant","log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/lib.rs"}
{"v":0,"name":"subspace-cli","msg":"💔 Error importing block 0x5f284bcfc01fa75bdbc37127b4284912ae27bfef7cf1518a4d8f1c3e96379f0a: block has an unknown parent","level":40,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:13.93055837Z","target":"log","line":null,"file":null,"log.target":"sync","log.line":2866,"log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/network/sync/src/lib.rs","log.module_path":"sc_network_sync"}
{"v":0,"name":"subspace-cli","msg":"Unsupported Offchain Worker API version: (Err(Application(RuntimeCode(\"`:code` hash not found\"))), Err(Application(RuntimeCode(\"`:code` hash not found\")))). Consider turning off offchain workers if they are not part of your runtime..","level":50,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:14.087504123Z","target":"offchain-worker","line":132,"file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/offchain/src/lib.rs"}
{"v":0,"name":"subspace-cli","msg":"♻️ Reorg on #49563,0x505d…3200 to #49563,0x6e29…4dc8, common ancestor #49562,0x2b9e…cfc9","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:14.087519796Z","target":"log","line":null,"file":null,"log.target":"sc_informant","log.line":111,"log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/lib.rs","log.module_path":"sc_informant"}
{"v":0,"name":"subspace-cli","msg":"Failed to update recent stored bundles for bundle-validator","level":50,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:14.087521428Z","target":"subspace_transaction_pool::bundle_validator","line":278,"file":"/home/runner/.cargo/git/checkouts/subspace-5c1447fb849a5554/c011de1/crates/subspace-transaction-pool/src/bundle_validator.rs","err":"Error at calling runtime api: Runtime code error: `:code` hash not found"}
{"v":0,"name":"subspace-cli","msg":"✨ Imported #49563 (0x6e29…4dc8)","level":30,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:14.087571029Z","target":"log","line":null,"file":null,"log.target":"substrate","log.module_path":"sc_informant","log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/informant/src/lib.rs","log.line":139}
{"v":0,"name":"subspace-cli","msg":"💔 Verification failed for block 0x5f284bcfc01fa75bdbc37127b4284912ae27bfef7cf1518a4d8f1c3e96379f0a received from peer: 12D3KooWKCedH2o9SKu3BvWTB98zPp7EQxg2NmMCrHyoGCHnsJga, \"Runtime code error: `:code` hash not found\"","level":40,"hostname":"nirneth-arnoediad","pid":51184,"time":"2023-04-01T19:33:14.111763893Z","target":"log","line":null,"file":null,"log.line":2839,"log.module_path":"sc_network_sync","log.target":"sync","log.file":"/home/runner/.cargo/git/checkouts/substrate-7bc20b373ca3e834/456cfad/client/network/sync/src/lib.rs"}
So immediately after import another block comes in that fails with "block has an unknown parent", and later the same block fails with a different error, "`:code` hash not found", which means that the block header is there, but the state root is not.
I suspect there's some kind of race condition here, although I'm not entirely sure it is in the database.
I could not reproduce it, but I'm fairly certain this is a Substrate issue. There's probably a bug in sync: when a block is downloaded and the parent block state is missing, it gets imported without enacting the state. Normally this may happen for stale blocks requested by GRANDPA, but here it looks like it is sometimes happening for new blocks as well. For an archive node it manifests as "`:code` hash not found"; for a regular node there's a different check, so the same bug would result in a "State already discarded" error.
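To make the suspected mechanism concrete, here is a minimal, hypothetical sketch of that decision. The type and function names are mine and do not exist in Substrate; this only illustrates the reasoning, not the actual sync code:

```rust
/// Hypothetical import decision, sketched from the description above.
#[derive(Debug)]
enum ImportAction {
    /// Execute the block and commit the resulting state.
    WithState,
    /// Store header/body only, with no state enacted (meant for stale fork blocks).
    HeaderOnly,
}

fn decide_import(parent_state_available: bool) -> ImportAction {
    if parent_state_available {
        ImportAction::WithState
    } else {
        // Legitimate for stale blocks (e.g. old forks requested by GRANDPA).
        // The suspicion is that fresh best-chain blocks sometimes take this path too:
        // the header is committed with no state, and later runtime calls against it
        // fail with "`:code` hash not found" (archive) or "State already discarded".
        ImportAction::HeaderOnly
    }
}

fn main() {
    // A new block arrives while its parent's state is missing from the database.
    println!("{:?}", decide_import(false)); // prints "HeaderOnly", the bug symptom
}
```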
It needs reproducing with `-l db=debug`; this will show if the block was indeed committed with no state. `-l sync=trace` would also be useful.
I'll try to reproduce it and will create a Substrate issue when I have more data.
Thanks for the hints, appreciate it!