stakwork/sphinx-key

E (381850) TRANSPORT_BASE: poll_read select error 104, errno = Connection reset by peer, fd = 54

Closed this issue · 4 comments

This is an intermittent error, but it should be reproducible in under 5 minutes.

To reproduce:
Set up a channel between alice (a normal node) and bob (a remote signer node).
Send ~10 keysends in a loop from bob to alice, and you will eventually get these logs on the signer side:

I (16510) sphinx_key::core::events: => starting the main signing loop...                                                                     
I (23870) lightning_signer::node: 02e7 adding payment 4c403adc53b8a7f1ef7a7d82c0db4c79a90a13d1895b0ec7aa455db1e230cb14 -> 1000               
I (150710) lightning_signer::node: 02e7 adding payment 11f8be10d081a434410385ee1fe11797f6678b9ce108ef5e97bf3480b22598f0 -> 1000              
I (155970) lightning_signer::node: 02e7 adding payment 9fd2fa7307dde80a792fa9ce19765edae0877c6ee0fdaa2492d78a6fffa84fcf -> 1000              
I (161090) lightning_signer::node: 02e7 adding payment 58d30b2de3b73636a5b9f887054d970072e5786339bb9d4df4fd1f1853f91f32 -> 1000              
I (185870) lightning_signer::node: 02e7 adding payment 611de4aa05b070ce2c086e9056e13b031f9e90e2844e2002ae69a90fbec42d30 -> 1000              
I (194260) lightning_signer::node: 02e7 adding payment 52b59ac6af9e54c33fbc6d03cfdeab2b8ed45a111f029e55d70c37a1185a4b07 -> 1000              
I (200610) lightning_signer::node: 02e7 adding payment 1a70363b588a1663ea030de8ab04ebb1e31257301e6da797d9d7a2117f247424 -> 1000              
I (224370) lightning_signer::node: 02e7 adding payment 7b9229746eae58ee85ea25897cfc88688b3e7cbf48150983b42860bc6ab40f92 -> 1000              
I (229290) lightning_signer::node: 02e7 adding payment 85bd0ad72d1acf2ad12fd9d3e60c63d6aa935b255ed46d1ca6aea2053d6744c8 -> 1000              
I (235640) lightning_signer::node: 02e7 adding payment 10a87920743b52ee54052ded7670374d5ad5e4396610c66cc3c597226acc119e -> 1000              
I (377150) lightning_signer::node: 02e7 adding payment fa544eb8313195048b104b07bdb4cbf2a2e9f8c6fbf15470e1300ba596f092b1 -> 1000              
I (381450) lightning_signer::node: 02e7 adding payment 6bafccba49567a3d4144d959ec03d9b7865c68407b007ecb392ec1e80eb77e33 -> 1000              
E (381850) TRANSPORT_BASE: poll_read select error 104, errno = Connection reset by peer, fd = 54                                             
E (381850) MQTT_CLIENT: Poll read error: 119, aborting connection                                                                            
E (381860) TRANSPORT_BASE: poll_write select error 0, errno = Success, fd = 54                                                               
W (381870) TRANSPORT_BASE: Poll timeout or error, errno=Success, fd=54, timeout_ms=10000                                                     
E (381880) MQTT_CLIENT: Writing failed: errno=0                                                                                              
E (381880) sphinx_key::conn::mqtt: ESP_FAIL msg!                                                                                             
W (381890) sphinx_key::conn::mqtt: RECEIVED Disconnected MESSAGE                                                                             
W (381890) MQTT_CLIENT: Publish: Losing qos0 data when client not connected                                                                  
W (381900) sphinx_key::conn::mqtt: RECEIVED Disconnected MESSAGE                                                                             
Guru Meditation Error: Core  0 panic'ed (Illegal instruction). Exception was unhandled.

@Evanfeenstra pretty sure this error is caused by the following complaint from rumqttd. The broker seems to be the part that complains first, and then we get the problem on the signer side described above.

[2023-06-14T16:15:37.436 hsmd  /sphinx_key_broker::looper DEBUG] SEND ON sphinx                                                                                                                                     
[2023-06-14T16:15:37.436 hsmd  /sphinx_key_broker::mqtt DEBUG] SENDING TO F96yex3J on topic sphinx                                                                                                                  
[2023-06-14T16:15:37.885 hsmd  /sphinx_key_broker::looper DEBUG] GOT ON sphinx-return                     
[2023-06-14T16:15:37.889 hsmd  /sphinx_key_broker::looper DEBUG] SEND ON sphinx                                                                                                                                     
[2023-06-14T16:15:37.890 hsmd  /sphinx_key_broker::mqtt DEBUG] SENDING TO F96yex3J on topic sphinx                                                                                                                  
[2023-06-14T16:15:38.286 hsmd  /rumqttd::server::broker ERROR] Disconnected!! error=Network(Protocol(PayloadSizeLimitExceeded(5261)))
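To illustrate what `PayloadSizeLimitExceeded(5261)` suggests: the broker appears to have dropped the connection because a packet of roughly 5261 bytes exceeded its configured limit. Below is a minimal sketch of a pre-publish size guard. It is not the actual fix and not code from sphinx-key: `ASSUMED_MAX_PAYLOAD`, `PayloadTooLarge`, and `check_payload` are hypothetical names, and the 4096 value is only a placeholder, since the real limit comes from the broker configuration.

```rust
/// Hypothetical limit for illustration only; the real value is whatever
/// the broker (rumqttd) is configured with.
const ASSUMED_MAX_PAYLOAD: usize = 4096;

/// Hypothetical error type describing an oversized message.
#[derive(Debug, PartialEq)]
struct PayloadTooLarge {
    len: usize,
    max: usize,
}

/// Reject a message before publishing if it would exceed the given limit,
/// instead of letting the broker drop the whole connection.
fn check_payload(payload: &[u8], max: usize) -> Result<(), PayloadTooLarge> {
    if payload.len() > max {
        Err(PayloadTooLarge { len: payload.len(), max })
    } else {
        Ok(())
    }
}

fn main() {
    let small = vec![0u8; 1000];
    // 5261 is the size reported in the broker log above.
    let large = vec![0u8; 5261];

    assert!(check_payload(&small, ASSUMED_MAX_PAYLOAD).is_ok());
    assert!(check_payload(&large, ASSUMED_MAX_PAYLOAD).is_err());
    println!("{:?}", check_payload(&large, ASSUMED_MAX_PAYLOAD));
}
```

A guard like this only turns a silent disconnect into an explicit error on the publishing side. The limit still has to be raised on the broker, or the message made smaller, for the keysend to go through.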

Check out this issue in esp-idf espressif/esp-idf#10000

Looks like it's fixed in esp-idf > v5, so we would need to update our ESP-IDF-SYS to 0.33.1.

@Evanfeenstra I don't really see how that issue relates to this one. Happy to try it and see if it solves the problem though :)

For now this is fixed by:
78f3661