stakwork/sphinx-key

`SENDING TO` unending loop

Closed this issue · 5 comments

If the broker sends a message and the signer crashes before returning a response, the signer reconnects to the broker under a new client ID, but the broker currently gets stuck in an unending loop sending that same message to the previous client ID.

OK, so we need to choose how many times the broker retries before giving up on a client and trying a new one. 3 times? 10 times?
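Here is a minimal sketch of a bounded retry. It assumes a hypothetical `send_with_retries` helper, where `try_send` stands in for one publish-and-wait-for-reply round trip and returns `None` on timeout. `MAX_RETRIES = 3` is only a placeholder for the number being debated. None of these names come from the sphinx-key broker.

```rust
/// Hypothetical retry cap; the thread weighs 3 vs 10 and does not settle on a number.
const MAX_RETRIES: u32 = 3;

#[derive(Debug, PartialEq)]
enum SendError {
    /// The client never answered within MAX_RETRIES attempts.
    GaveUp { client_id: String, attempts: u32 },
}

/// Send `msg` to `client_id`, trying at most `MAX_RETRIES` times.
/// `try_send` is a hypothetical stand-in for one publish + wait-for-reply
/// round trip: `Some(reply)` on success, `None` on timeout.
fn send_with_retries<F>(client_id: &str, msg: &[u8], mut try_send: F) -> Result<Vec<u8>, SendError>
where
    F: FnMut(&str, &[u8]) -> Option<Vec<u8>>,
{
    for _attempt in 1..=MAX_RETRIES {
        if let Some(reply) = try_send(client_id, msg) {
            return Ok(reply);
        }
    }
    // Give up instead of looping forever on a client that may be gone.
    Err(SendError::GaveUp {
        client_id: client_id.to_string(),
        attempts: MAX_RETRIES,
    })
}

fn main() {
    // A crashed signer never replies, so every attempt times out.
    let mut calls = 0;
    let result = send_with_retries("ILZL7wvL", b"ping", |_, _| {
        calls += 1;
        None
    });
    println!("{:?} after {} calls", result, calls);
}
```

Returning an error once the cap is reached, rather than resending indefinitely, lets the caller move on to a newer client or surface the failure.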

New clients are added to the beginning of the clients array here, so on the next attempt the broker should be sending to the newest client: https://github.com/stakwork/sphinx-key/blob/master/broker/src/conn.rs#L25
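To illustrate why newest-first ordering helps, here is a sketch using a hypothetical `Connections` type. This is not the actual `conn.rs` code. The ordering only helps if each retry looks up `current()` again instead of reusing the client ID from the first attempt. In the logs below, the `sphinx-control` resends keep targeting `ILZL7wvL` even after `aKkJ7eZC` connects.

```rust
/// Hypothetical stand-in for the broker's connection list, newest client first.
#[derive(Default)]
struct Connections {
    clients: Vec<String>,
}

impl Connections {
    /// New clients go to the front, so `current()` returns the newest one.
    fn client_connected(&mut self, id: &str) {
        self.clients.retain(|c| c != id); // avoid duplicates if an id reconnects
        self.clients.insert(0, id.to_string());
    }

    fn client_disconnected(&mut self, id: &str) {
        self.clients.retain(|c| c != id);
    }

    /// The client a retry should target right now, if any.
    fn current(&self) -> Option<&str> {
        self.clients.first().map(|s| s.as_str())
    }
}

fn main() {
    let mut conns = Connections::default();
    conns.client_connected("ILZL7wvL");
    // The signer crashes and comes back under a new client id.
    conns.client_disconnected("ILZL7wvL");
    conns.client_connected("aKkJ7eZC");
    // A retry that re-resolves its target on every attempt now picks the new client.
    println!("{:?}", conns.current());
}
```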

Are you sure you saw this on the latest broker, and that the new client connected fine?

@Evanfeenstra I just reproduced this on d61bf9f.

Here are the logs:

```text
2023-07-03T23:04:19.204Z INFO    lightningd: --------------------------------------------------
2023-07-03T23:04:19.204Z INFO    lightningd: Server started with public key 020af1db437944964b6ea1b3786e2ed175d18efc8c484deea1c988ef3c73262eaa, alias ORANGERAGE-v23.05.1-55-gb199905 (color #020af1) and lightningd v23.05.1-55-gb199905
2023-07-03T23:04:19.602Z UNUSUAL plugin-bookkeeper: Snapshot balance does not equal ondisk reported 0msat, off by (+0msat/-0msat) (account wallet) Logging journal entry.
[2023-07-03T23:04:25.572 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:04:30.488 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:04:39.488 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:04:39.598 hsmd  /rumqttd::server::broker ERROR] Disconnected!! error=Network(Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }))
[2023-07-03T23:04:39.599 hsmd  /sphinx_key_broker::mqtt INFO] Alert: (0, Event("ILZL7wvL", Disconnect))
[2023-07-03T23:04:39.599 hsmd  /sphinx_key_broker INFO] => reconnected: ILZL7wvL: false
[2023-07-03T23:04:41.366 hsmd  /rumqttd::server::broker ERROR] remote_link; tenant_id=None
[2023-07-03T23:04:41.367 hsmd  /sphinx_key_broker::mqtt INFO] Alert: (0, Event("aKkJ7eZC", Connect))
[2023-07-03T23:04:41.367 hsmd  /sphinx_key_broker INFO] => reconnected: aKkJ7eZC: true
[2023-07-03T23:04:41.367 hsmd  /sphinx_key_broker::lss INFO] CLIENT aKkJ7eZC reconnected!
[2023-07-03T23:04:44.371 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO aKkJ7eZC on topic sphinx-init-msg
[2023-07-03T23:04:45.464 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO aKkJ7eZC on topic sphinx-init-msg
[2023-07-03T23:04:48.489 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:04:49.948 hsmd  /sphinx_key_broker::looper INFO] SEND ON sphinx
[2023-07-03T23:04:57.489 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:06.489 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:15.489 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:24.489 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:33.490 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:42.490 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:05:51.490 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:06:00.490 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
[2023-07-03T23:06:09.491 hsmd  /sphinx_key_broker::mqtt INFO] SENDING TO ILZL7wvL on topic sphinx-control
```

Fixed in master (control messages now fail right away).
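Here is a sketch of what "fail right away" could look like. It assumes a hypothetical per-topic attempt limit: `sphinx-control` gets a single attempt, and other topics keep a small retry cap (3 here is an assumption). This is an illustration, not the code from the fix commit.

```rust
/// Hypothetical per-topic policy: how many attempts a message gets before giving up.
fn max_attempts(topic: &str) -> u32 {
    match topic {
        // Control messages fail right away: one attempt, no resend loop.
        "sphinx-control" => 1,
        // Assumed cap for everything else; the thread does not fix a number.
        _ => 3,
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Replied(Vec<u8>),
    Failed { topic: String, attempts: u32 },
}

/// `try_send` is a hypothetical stand-in for one publish + wait-for-reply
/// round trip: `Some(reply)` on success, `None` on timeout.
fn dispatch<F>(topic: &str, msg: &[u8], mut try_send: F) -> Outcome
where
    F: FnMut(&[u8]) -> Option<Vec<u8>>,
{
    let limit = max_attempts(topic);
    for _ in 0..limit {
        if let Some(reply) = try_send(msg) {
            return Outcome::Replied(reply);
        }
    }
    Outcome::Failed { topic: topic.to_string(), attempts: limit }
}

fn main() {
    // The old client is gone, so the control message is sent once and then reported as failed.
    let mut sends = 0;
    let out = dispatch("sphinx-control", b"nonce", |_| {
        sends += 1;
        None
    });
    println!("{:?} after {} send(s)", out, sends);
}
```

Failing a control message immediately hands the error back to its caller, which can reissue it to whichever client is currently connected instead of the broker resending to a stale ID every few seconds.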

Fixed with commit 18e6b7a