iqlusioninc/tmkms

Privval protocol incompatibility with namada node

Fraccaman opened this issue · 5 comments

I'm using tmkms v0.13.0 with a Namada node (which uses CometBFT 0.37.0 under the hood).
Running tmkms I get the following error:

2023-12-05T11:18:56.874145Z  INFO tmkms::commands::start: tmkms 0.13.0 starting up...
2023-12-05T11:18:56.876290Z  INFO tmkms::keyring: [keyring:softsign] added consensus Ed25519 key: tnam1zcjduepqe09qfp0stw7ryce4ch8r83frt9qmhseg4t658xrj74mlmx77mspqh20rn7
2023-12-05T11:19:00.890926Z  INFO tmkms::connection::tcp: KMS node ID: fce99ce38f090a9982199c33c88052f154f8a2d3
2023-12-05T11:19:00.894721Z ERROR tmkms::client: [local.e8700a77c10bbc3e43a00-0@tcp://bf00f74e5a29a89322412aa2cd44af1073ad0759@127.0.0.1:27658] protocol error:
   0: io error
   1: failed to fill whole buffer

And then the Namada full node crashes, probably while trying to deserialize a protobuf-encoded message, as I see these kinds of messages on the full node:

The application panicked (crashed).
Message:  called `Result::unwrap()` on an `Err` value: DecodeError { description: "invalid wire type value: 6", stack: [("Request", "value")] }

or

The application panicked (crashed).
Message:  called `Result::unwrap()` on an `Err` value: DecodeError { description: "unexpected end group tag", stack: [("Request", "value")] }

Do you know if tmkms is compatible with CometBFT 0.37.0? If not, is this planned?

No idea what's happening here.

I would recommend avoiding leaping to conclusions like some outright incompatibility between CometBFT and TMKMS (or rather, the tendermint-p2p crate TMKMS uses), until such time as you can find an actual change to the CometBFT code which would cause this, or demonstrate the incompatibility with more than one CometBFT v0.37 chain. Many of these issues wind up being somewhat chain-specific. See also #729.

These issues are unfortunately painful to reproduce without us setting up an entire testnet node for every chain which experiences them, so I'd appreciate your patience in trying to narrow it down without us having to do that. Ideally, if you could provide access to a preconfigured node that we could debug on, with a full Rust development environment and the TMKMS source code, that would be helpful.

"invalid wire type value: 6"

FWIW, this isn't necessarily due to a schema mismatch. 6 is not a valid wire type at all (the allowed values are 0-5): https://protobuf.dev/programming-guides/encoding/ (Edit: I guess it could be caused by a mismatch in which a prior wire type is misinterpreted, misaligning the parse of a later field)
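
To illustrate the point about wire types, here is a minimal Python sketch (not taken from TMKMS, CometBFT, or Namada) of how a protobuf field tag splits into a field number and a wire type. The helper names `read_varint` and `decode_tag` and the sample bytes are made up for illustration. It shows that a decoder which starts reading at the wrong offset, for example in the middle of a length-delimited payload, can see an "end group" tag or a wire type of 6 even though the sender encoded a perfectly valid message.

```python
from typing import Tuple

# Wire types listed in the protobuf encoding guide; 6 and 7 are unassigned.
WIRE_TYPES = {0: "VARINT", 1: "I64", 2: "LEN", 3: "SGROUP", 4: "EGROUP", 5: "I32"}


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def decode_tag(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Split a field tag into (field_number, wire_type, next_pos)."""
    tag, pos = read_varint(buf, pos)
    wire_type = tag & 0x07  # low three bits
    if wire_type not in WIRE_TYPES:
        raise ValueError(f"invalid wire type value: {wire_type}")
    return tag >> 3, wire_type, pos


# Hypothetical message: field 1, wire type LEN, 3-byte payload b"def".
msg = bytes([0x0A, 0x03]) + b"def"

# Correctly aligned: the first byte is the tag.
print(decode_tag(msg, 0))  # (1, 2, 1): field 1, LEN

# Misaligned: payload byte b"d" (0x64) read as a tag looks like an end-group tag.
print(decode_tag(msg, 2))  # (12, 4, 3): field 12, EGROUP

# Misaligned: payload byte b"f" (0x66) read as a tag has wire type 6.
try:
    decode_tag(msg, 4)
except ValueError as err:
    print(err)  # invalid wire type value: 6
```

So both panic messages above are consistent with the node parsing from the wrong position in the byte stream (or parsing bytes that are not a protobuf message at all), not only with a changed schema.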

I also notice the error looks like a Rust panic, so it's coming from the Namada node, not CometBFT. Any idea where that's happening or what Protobuf it's trying to parse? Perhaps use RUST_BACKTRACE=1?

Also, any idea when Namada will upgrade to CometBFT v0.38?

Please try TMKMS v0.14.0-pre.1