Molorius/esp32-websocket

Core panic'ed on wifi disconnection or reconnection (with changed status)

tmedicci opened this issue · 3 comments

If a client has connected to the websocket server and the Wi-Fi connection drops for some reason (the SSID has changed, for example), or if a client has connected and the connection with the client is lost (cable unplugged from the client, for example), then when the link comes back the system tries to retrieve the pending messages and send them to the client. However, if the client has changed its status in the meantime (has closed the connection), the core panics.

Considering the 2nd case:

  • netconn_write in ws_send blocks when the client connection is lost (cable unplugged).
  • The message is not sent to the client. The client, still disconnected, closes the connection.
  • When the client is plugged back in, the system unblocks netconn_write. However, the client will not respond.
  • The problem happens in ws_is_connected, as called from this function:
int ws_server_send_text_all_from_callback(char* msg,uint64_t len) {
  int ret = 0;
  for(int i=0;i<WEBSOCKET_SERVER_MAX_CLIENTS;i++) {
    if(ws_is_connected(clients[i])) {
      ws_send(&clients[i],WEBSOCKET_OPCODE_TEXT,msg,len,0);
      if(ws_is_connected(clients[i])) ret += 1;
      else {
        clients[i].scallback(i,WEBSOCKET_DISCONNECT_ERROR,NULL,0);
        ws_disconnect_client(&clients[i], 0);
      }
    }
  }
  return ret;
}

The core panic happens in the second ws_is_connected call in the code above. It seems that client.conn->pcb.tcp->state is not available to be read. In fact, state isn't changed when the client is unplugged.

  • Here is the dump:
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x40150426  PS      : 0x00060630  A0      : 0x8015033a  A1      : 0x3ffd86e0  
0x40150426: ws_is_connected at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket.c:51 (discriminator 1)

A2      : 0x3ffbcae4  A3      : 0x3ffb8a60  A4      : 0x00000038  A5      : 0x3ffd8738  
A6      : 0x400e49c4  A7      : 0x3ffd6300  A8      : 0x00000000  A9      : 0x3ffd8690  
0x400e49c4: websocket_callback at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/NETWORK/WEBSOCKET.c:72

A10     : 0x400dd5ac  A11     : 0x3ffb4260  A12     : 0x3f42001c  A13     : 0x0001bd05  
0x400dd5ac: vprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vprintf.c:35

A14     : 0x00000001  A15     : 0x00000005  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000038  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd  

Backtrace: 0x40150426:0x3ffd86e0 0x40150337:0x3ffd8700 0x401503a9:0x3ffd8770 0x400e4c72:0x3ffd8790
0x40150426: ws_is_connected at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket.c:51 (discriminator 1)

0x40150337: ws_server_send_text_all_from_callback at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket_server.c:339 (discriminator 2)

0x401503a9: ws_server_send_text_all at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/websocket/websocket_server.c:295

0x400e4c72: websocket_meter_readings_handler_task at /media/development/5EC2199AC219780B/Documentos/Projetos/SmartPlug/Software/ESP-IDF/esp-idf_aplic_comm/components/NETWORK/WEBSOCKET.c:337 (discriminator 9)

Is there any way to check the client state before writing to it?

You cannot check the client state before writing to it; this is a limitation of lwIP. I have pushed a potential fix, could you test it for me? I do not have an esp32 easily available at the moment.

Hello @Molorius, thank you for your answer. I was checking netconn_evt (event) for connection close. However, the application still tried to check the internal state. Checking the netconn return value solves most of the problem. However, it does not solve the case where the STA interface is disconnected: all clients must be removed in the wifi_event_handler, but ws_is_connected is called before anything is sent to conn, and the device panics.

The solution in both situations was to stop checking the internal state: since errors are detected when the netconn API is called, ws_is_connected only needs to identify whether a client is allocated. I have submitted PR #7 with this simple correction. It worked well for me.
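For anyone reading later, here is a rough sketch of the idea, not the actual code from PR #7. It uses mock types instead of lwIP so it can be compiled on a host machine; every name prefixed with mock_ is hypothetical, and mock_write only stands in for a netconn call that returns an error code. The connected check never dereferences the connection, and a slot is released as soon as a write reports an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netconn; `broken` simulates a dead link. */
typedef struct { bool broken; } mock_conn_t;

/* Hypothetical, reduced client record: only the connection pointer. */
typedef struct { mock_conn_t *conn; } mock_client_t;

/* Allocation-only check: tests the pointer, never reads through it. */
bool mock_is_connected(const mock_client_t *c) {
  return c->conn != NULL;
}

/* Stand-in for a netconn write: 0 on success, -1 on error. */
int mock_write(mock_conn_t *conn) {
  return conn->broken ? -1 : 0;
}

/* Send to one client; on a write error, release the slot.
 * Real code would also close and delete the netconn here. */
int mock_send(mock_client_t *c) {
  if (!mock_is_connected(c)) return -1;
  int err = mock_write(c->conn);
  if (err != 0) c->conn = NULL;
  return err;
}

/* Broadcast; returns how many clients accepted the message. */
int mock_send_all(mock_client_t *clients, int n) {
  int sent = 0;
  for (int i = 0; i < n; i++) {
    if (!mock_is_connected(&clients[i])) continue;
    if (mock_send(&clients[i]) == 0) sent++;
  }
  return sent;
}
```

The point of this shape is that the decision to drop a client comes from the write's return value, so nothing ever has to inspect the TCP state after the fact.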

It is important to remember to remove all clients when the device is disconnected in STA mode.
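A minimal sketch of that cleanup, again with mock types so it compiles without ESP-IDF. The demo_ names, the event enum, and the handler signature are all assumptions for illustration; in a real project the handler would be the application's own Wi-Fi event handler, and each slot would be released through the library's disconnect routine rather than by clearing a pointer.

```c
#include <stddef.h>

#define DEMO_MAX_CLIENTS 4  /* stand-in for WEBSOCKET_SERVER_MAX_CLIENTS */

/* Hypothetical, reduced client record. */
typedef struct { void *conn; } demo_client_t;

demo_client_t demo_clients[DEMO_MAX_CLIENTS];

/* Hypothetical event codes standing in for the Wi-Fi station events. */
typedef enum {
  DEMO_EVT_STA_CONNECTED,
  DEMO_EVT_STA_DISCONNECTED
} demo_wifi_event_t;

/* Release every allocated slot; returns how many were released. */
int demo_drop_all_clients(void) {
  int dropped = 0;
  for (int i = 0; i < DEMO_MAX_CLIENTS; i++) {
    if (demo_clients[i].conn != NULL) {
      demo_clients[i].conn = NULL;  /* slot is free from now on */
      dropped++;
    }
  }
  return dropped;
}

/* On station disconnect, forget every client so that no later
 * broadcast touches a connection that no longer exists. */
void demo_wifi_event_handler(demo_wifi_event_t evt) {
  if (evt == DEMO_EVT_STA_DISCONNECTED) {
    demo_drop_all_clients();
  }
}
```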

Pull Request #7 has been merged, I'm closing this for now.