Core panic'ed on wifi disconnection or reconnection (with changed status)
tmedicci opened this issue · 3 comments
If a client is connected to the websocket server and the Wi-Fi connection drops for some reason (for example, the SSID has changed), or if the connection to a connected client is lost (for example, the cable is unplugged on the client side), then when the link comes back the system tries to retrieve the pending messages and send them to the client. However, if the client has changed its status in the meantime (it has closed the connection), the core panics.
Considering the 2nd case:
- netconn_write in ws_send blocks when the client connection is lost (cable unplugged).
- the message is not sent to the client. The client, still disconnected, closes the connection.
- when the client is plugged back in, the system unblocks netconn_write. However, the client will not respond.
- the problem happens in ws_is_connected:
int ws_server_send_text_all_from_callback(char* msg,uint64_t len) {
  int ret = 0;
  for(int i=0;i<WEBSOCKET_SERVER_MAX_CLIENTS;i++) {
    if(ws_is_connected(clients[i])) {
      ws_send(&clients[i],WEBSOCKET_OPCODE_TEXT,msg,len,0);
      if(ws_is_connected(clients[i])) ret += 1;
      else {
        clients[i].scallback(i,WEBSOCKET_DISCONNECT_ERROR,NULL,0);
        ws_disconnect_client(&clients[i], 0);
      }
    }
  }
  return ret;
}
The core panic happens in the second ws_is_connected call in the code above. It seems that client.conn->pcb.tcp->state is not available to be read. In fact, the state is not updated when the client is unplugged.
- Here is the dump:
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x40150426 PS : 0x00060630 A0 : 0x8015033a A1 : 0x3ffd86e0
0x40150426: ws_is_connected at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket.c:51 (discriminator 1)
A2 : 0x3ffbcae4 A3 : 0x3ffb8a60 A4 : 0x00000038 A5 : 0x3ffd8738
A6 : 0x400e49c4 A7 : 0x3ffd6300 A8 : 0x00000000 A9 : 0x3ffd8690
0x400e49c4: websocket_callback at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/NETWORK/WEBSOCKET.c:72
A10 : 0x400dd5ac A11 : 0x3ffb4260 A12 : 0x3f42001c A13 : 0x0001bd05
0x400dd5ac: vprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vprintf.c:35
A14 : 0x00000001 A15 : 0x00000005 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000038 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd
Backtrace: 0x40150426:0x3ffd86e0 0x40150337:0x3ffd8700 0x401503a9:0x3ffd8770 0x400e4c72:0x3ffd8790
0x40150426: ws_is_connected at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket.c:51 (discriminator 1)
0x40150337: ws_server_send_text_all_from_callback at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket_server.c:339 (discriminator 2)
0x401503a9: ws_server_send_text_all at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket_server.c:295
0x400e4c72: websocket_meter_readings_handler_task at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/NETWORK/WEBSOCKET.c:337 (discriminator 9)
Is there any way to check the client state before writing to it?
You cannot check the client state before writing to it; this is a limitation of lwIP. I have pushed a potential fix. Could you test it for me? I do not have an ESP32 easily available at the moment.
Hello @Molorius, thank you for your answer. I was checking netconn_evt (the event) for a connection close. However, the application still tried to check the internal state. Checking the netconn return value solves most of the problem. However, it does not solve the case where the STA interface is disconnected: it is necessary to remove all clients in wifi_event_handler, but ws_is_connected is called before anything is sent to conn, and the device panics.
The solution in both situations was not to check the internal state: since errors are detected when the netconn API is called, ws_is_connected only needs to identify whether a client is allocated. I have submitted PR #7 with this simple correction. It worked well for me.
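To illustrate the idea (this is a minimal sketch, not the actual code from PR #7): ws_is_connected only tests whether the slot holds a connection, and the broadcast loop relies on the return value of the send call to detect a dead client. The client_t type, MAX_CLIENTS, fake_send, disconnect_client and send_text_all are hypothetical stand-ins so that the example compiles without lwIP; in the real component the send step would go through netconn_write, which reports failure through its return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 4  /* hypothetical stand-in for WEBSOCKET_SERVER_MAX_CLIENTS */

/* Hypothetical, simplified client record; the real one holds a struct netconn*. */
typedef struct {
    void *conn;     /* NULL means the slot is free */
    bool  link_ok;  /* simulation only: whether the fake write succeeds */
} client_t;

static client_t clients[MAX_CLIENTS];

/* Only checks whether the slot is allocated; never reads conn->pcb. */
static bool ws_is_connected(const client_t *c) {
    return c != NULL && c->conn != NULL;
}

/* Stand-in for ws_send(): returns 0 on success and nonzero on error,
 * mirroring how a netconn call reports failure through its return value. */
static int fake_send(client_t *c, const char *msg, size_t len) {
    assert(c != NULL);
    (void)msg;
    (void)len;
    return c->link_ok ? 0 : -1;
}

/* Frees the slot; real code would also close and delete the netconn. */
static void disconnect_client(client_t *c) {
    c->conn = NULL;
}

/* Sends to every allocated client and returns how many sends succeeded. */
static int send_text_all(const char *msg, size_t len) {
    int sent = 0;
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (!ws_is_connected(&clients[i])) continue;
        if (fake_send(&clients[i], msg, len) == 0) {
            sent++;
        } else {
            /* Drop the client based on the error, without peeking at TCP state. */
            disconnect_client(&clients[i]);
        }
    }
    return sent;
}
```

With this shape, a client whose write fails is removed from the table on that pass, and the next broadcast simply skips its slot.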
It is important not to forget to remove all clients when the device is disconnected in STA mode.
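A rough sketch of that cleanup, under the assumption that the client table can be cleared from the Wi-Fi event handler. The wifi_evt_t enum, on_wifi_event and purge_all_clients are hypothetical names used only for illustration; a real handler would react to the ESP-IDF station-disconnected event and would also close each netconn before freeing the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified client record; NULL conn means the slot is free. */
typedef struct {
    void *conn;
} client_t;

/* Hypothetical event type standing in for the ESP-IDF Wi-Fi events. */
typedef enum {
    EVT_STA_CONNECTED,
    EVT_STA_DISCONNECTED
} wifi_evt_t;

/* Frees every allocated slot and returns how many clients were removed. */
static size_t purge_all_clients(client_t *table, size_t n) {
    assert(table != NULL);
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].conn != NULL) {
            table[i].conn = NULL;  /* real code: close and delete the netconn first */
            removed++;
        }
    }
    return removed;
}

/* Called from the Wi-Fi event handler; only a station disconnect purges clients. */
static size_t on_wifi_event(wifi_evt_t evt, client_t *table, size_t n) {
    return (evt == EVT_STA_DISCONNECTED) ? purge_all_clients(table, n) : 0;
}
```

Once the table is empty, an allocation-only ws_is_connected returns false for every slot, so no send is attempted on a connection that went away with the interface.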