kaspanet/rusty-kaspa

testnet11 DbError Corruption:block checksum mismatch

Closed this issue · 3 comments

Describe the bug
testnet11 DbError Corruption:block checksum mismatch

To Reproduce
Steps to reproduce the behavior:
cargo run --bin kaspad --release -- --testnet --netsuffix=11 --utxoindex

Desktop

  • OS: Ubuntu 20.04
  • Kaspad version: 0.14.1

Additional context
I tried deleting ~/.rusty-kaspa/kaspa-testnet-11 and resyncing, but the same error occurred.
No errors occur when running mainnet.

It sounds like you have a hardware problem that manifests under load. tn11 is very demanding on IO, while mainnet currently isn’t. I have previously experienced this on physical hardware (a glitchy IO chipset) and on virtualization platforms like VirtualBox. Hardware problems such as RAM corruption and IO faults typically show up under high utilization.

Actual DB corruption can occur, but extremely rarely, as a consequence of an ungraceful application termination or an ungraceful system restart. No other users are reporting this, so the issue most likely lies with IO on your system (something you otherwise wouldn’t see unless you heavily stress random writes).

This system uses an M.2 SSD, it is not a virtual machine, and it is a freshly installed system. After installation, the only program running is the Kaspa node, and it is the only program in which I encounter errors. I am quite puzzled by it.

You need to run some sort of IO stress test, especially since this is a new system. I am 99% certain this is due to high IO throughput.
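
As a rough illustration of what such a test could look like, here is a minimal sketch of a random-write-then-verify check using only the Rust standard library. It is not part of kaspad; the file name, block size, slot count, and function names are all assumptions chosen for the example. Note that with a small file the read-back is probably served from the OS page cache, so this exercises RAM and the filesystem path more than the SSD itself. For a meaningful disk test the file would need to be considerably larger than RAM, or a dedicated stress tool should be used.

```rust
use std::fs::{self, OpenOptions};
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;

// Hypothetical block size for this sketch; not a kaspad or RocksDB setting.
const BLOCK_SIZE: usize = 4096;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fill a block deterministically from (slot, generation) so the same
/// contents can be regenerated later for verification.
fn fill_block(buf: &mut [u8], slot: u64, generation: u64) {
    // `| 1` keeps the xorshift state non-zero.
    let mut rng = XorShift64((slot.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ generation) | 1);
    for chunk in buf.chunks_mut(8) {
        let bytes = rng.next().to_le_bytes();
        chunk.copy_from_slice(&bytes[..chunk.len()]);
    }
}

/// Write `writes` blocks at pseudo-random slots, sync, then read every
/// written slot back. Returns the number of blocks that do not match.
fn stress_random_writes(path: &Path, slots: u64, writes: u64, seed: u64) -> std::io::Result<u64> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.set_len(slots * BLOCK_SIZE as u64)?;

    let mut rng = XorShift64(seed | 1);
    // Last generation written to each slot; 0 means "never written".
    let mut generations = vec![0u64; slots as usize];
    let mut buf = vec![0u8; BLOCK_SIZE];

    for generation in 1..=writes {
        let slot = rng.next() % slots;
        fill_block(&mut buf, slot, generation);
        file.seek(SeekFrom::Start(slot * BLOCK_SIZE as u64))?;
        file.write_all(&buf)?;
        generations[slot as usize] = generation;
    }
    file.sync_all()?;

    let mut expected = vec![0u8; BLOCK_SIZE];
    let mut mismatches = 0u64;
    for (slot, &generation) in generations.iter().enumerate() {
        if generation == 0 {
            continue;
        }
        fill_block(&mut expected, slot as u64, generation);
        file.seek(SeekFrom::Start(slot as u64 * BLOCK_SIZE as u64))?;
        file.read_exact(&mut buf)?;
        if buf != expected {
            mismatches += 1;
        }
    }
    Ok(mismatches)
}

fn main() -> std::io::Result<()> {
    // Hypothetical scratch file in the system temp directory.
    let path = std::env::temp_dir().join("io_stress_sketch.bin");
    let mismatches = stress_random_writes(&path, 1024, 10_000, 0xDEAD_BEEF)?;
    println!("mismatching blocks: {mismatches}");
    fs::remove_file(&path)?;
    Ok(())
}
```

On healthy hardware this should report zero mismatching blocks. Any non-zero count would point at the storage or memory path rather than at the node software.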

Going to close this, if you don’t mind, since as I mentioned you are the only person experiencing this. Feel free to hop on Discord in #development to get further feedback from other devs.