arduino/Arduino

Nano 33 IoT server hangs OR stops making connection with client

Closed this issue · 3 comments

Hi
I'm posting this bug report to Arduino/Arduino because I'm not too sure where to post it. Thanks for moving it to where it belongs.

For a little less than a month I have been experimenting with Arduino Nano 33 IoT boards.
I've written a pair of test sketches, one for a Nano 33 IoT acting as TCP server and one for a Nano 33 IoT acting as TCP client.
After three weeks of extensive testing I have to conclude that there are issues (perceived bugs) on the server side.
I have documented these issues in an Excel workbook (attached), one issue (perceived bug) per sheet.
The source code is attached as two TXT files (as INO files cannot be uploaded). I did not include the source code in the text of this issue report because of its length. I could not shorten the sketches, because they were written specifically to run these tests and that is all they do.

The issues are:

  1. Server-side board hangs indefinitely: it does not return after a call to method "client.connected()".
    This can occur minutes or hours after starting up. Resetting the server-side board solves the issue.
    Client behaves correctly.
    See evidence in excel workbook, sheet "hanging".

  2. The server can no longer make a connection with the client (it keeps trying), while the client reports that it is connected. The software keeps running, but no connection is possible until the server-side board is reset.
    This can occur minutes or hours after starting up. Client behaves correctly.
    See excel workbook, sheet "no connection".

  3. The client-side test program sends "client requests" (7-digit numbers followed by CR and LF characters) to the server, which simply sends them back to the client as "server responses". The server receives the digits correctly, but sometimes the CR and LF (carriage return, line feed) characters are missing. This happens about once every 100 to 300 connection cycles.
    See excel workbook, sheet "data error".
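For illustration, a check of the kind that detects issue 3 could look like the sketch below. It is not taken from the attached sketches: the names `RequestCheck` and `checkRequest` are hypothetical, and plain C++ strings are used so the logic can be tried off-board.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the attached sketches.
enum class RequestCheck { Ok, BadTerminator, BadDigits };

// Checks that a received request is exactly `digits` decimal digits
// followed by CR LF, as described for the "client requests" above.
RequestCheck checkRequest(const std::string& received, std::size_t digits = 7) {
    if (received.size() < digits) return RequestCheck::BadDigits;
    for (std::size_t i = 0; i < digits; ++i) {
        if (received[i] < '0' || received[i] > '9') return RequestCheck::BadDigits;
    }
    // Digits are fine; anything other than exactly "\r\n" after them is flagged.
    if (received.size() != digits + 2 ||
        received[digits] != '\r' || received[digits + 1] != '\n') {
        return RequestCheck::BadTerminator;
    }
    return RequestCheck::Ok;
}
```

A request that arrives with intact digits but without CR and LF would be reported as `BadTerminator`, which corresponds to the symptom on the "data error" sheet.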

About the sketches ('TCP server.ino' and 'TCP client.ino', both attached to this issue report but with a TXT extension):
Both programs function as a state machine, with a variable 'connectionState' controlling execution.
The WiFi connection state and the client connection state are checked constantly. There are no while loops, no calls to 'delay()' etc. All timeouts are controlled by a timer.

Server-side states:

enum ConnectionState_type {
    conn_0_wifiConnectNow,                          // attempt to connect wifi
    conn_1_wifiDelayConnection,                     // waiting for next attempt to connect wifi (after timeout)
    conn_2_wifiConnected,                           // attempt to connect to client
    conn_3_clientDelayConnection,                   // wait for next attempt to connect to client (after timeout)
    conn_4_clientConnected,                         // server connected to client (waiting for client request)
    conn_5_requestReceived,                         // client request received: send server response
    conn_6_stopClientNow,                           // stop client connection
    conn_7_report                                   // send info about last connection cycle to serial monitor
};

Client-side states are similar:

enum ConnectionState_type {
    conn_0_wifiConnectNow,                          // attempt to connect wifi
    conn_1_wifiDelayConnection,                     // waiting for next attempt to connect wifi (after timeout)
    conn_2_wifiConnected,                           // attempt to connect to server
    conn_3_clientDelayConnection,                   // waiting for next attempt to connect to server (after timeout)
    conn_4_clientConnected,                         // client connected to server (may send request to server)
    conn_5_requestSent,                             // client request sent to server: wait for server response
    conn_6_stopClientNow,                           // stop connection to server
    conn_7_report                                   // send info about last connection cycle to serial monitor
};

Server-side main loop (client side is similar):

void loop() {
    // code contains no calls to 'delay()', all delays and timeouts controlled by timer

    // variable 'connectionState' controls proper sequencing of tasks in these procedures:
    connectToWiFi();                 // if currently not connected to wifi (or connection lost): connect
    connectToClient();               // if a client is available but not connected: connect
    assembleClientRequest();         // read one incoming character of client request, if available
    sendResponseToClient();          // when client request is complete, send response to client
    stopTCPconnection();             // stop connection to client (when client side disconnected or after time out)
    lastConnectionReport();          // send info about last connection cycle to serial monitor

    // not controlled by state but by time
    heartbeat();                     // 1 second heartbeat: print current connection state
}

Both server and client print status information to the serial port (see excel workbook, attached).
In addition, the server-side code outputs signals on pins 10 to 12 that are displayed on an oscilloscope, which makes it possible to see what is actually going wrong and when (screen dumps are included in the Excel workbook).

Normal flow:

  • client connects
  • server notices client and connects
  • client sends a 'client request' (a 7-digit number) followed by CRLF
  • server reads request and echoes it as a 'server response' to the client
  • client stops
  • server detects client stopped and stops as well
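The normal flow can be sketched as a pure transition function over the server-side states. The state names are those of the server-side enum in this report; the `Inputs` struct, the function `nextState` and the exact transition rules are assumptions made for illustration and may differ from the attached 'TCP server' sketch.

```cpp
// State names as in the server-side enum of this report.
enum ConnectionState_type {
    conn_0_wifiConnectNow, conn_1_wifiDelayConnection, conn_2_wifiConnected,
    conn_3_clientDelayConnection, conn_4_clientConnected, conn_5_requestReceived,
    conn_6_stopClientNow, conn_7_report
};

// Hypothetical snapshot of the conditions loop() would sample on each pass.
struct Inputs {
    bool wifiUp;                // WiFi is currently connected
    bool clientAvailable;       // a client is waiting to be served
    bool requestComplete;       // a full request (digits + CRLF) was read
    bool clientStillConnected;  // the client has not stopped yet
    bool timedOut;              // the timer for the current state expired
};

// Assumed transition rules; returns the state for the next pass of loop().
ConnectionState_type nextState(ConnectionState_type s, const Inputs& in) {
    switch (s) {
        case conn_0_wifiConnectNow:
            return in.wifiUp ? conn_2_wifiConnected : conn_1_wifiDelayConnection;
        case conn_1_wifiDelayConnection:
            return in.timedOut ? conn_0_wifiConnectNow : conn_1_wifiDelayConnection;
        default:
            break;
    }
    if (!in.wifiUp) return conn_0_wifiConnectNow;  // WiFi lost: start over
    switch (s) {
        case conn_2_wifiConnected:
            return in.clientAvailable ? conn_4_clientConnected : conn_2_wifiConnected;
        case conn_3_clientDelayConnection:
            return in.timedOut ? conn_2_wifiConnected : conn_3_clientDelayConnection;
        case conn_4_clientConnected:
            if (in.requestComplete) return conn_5_requestReceived;
            if (!in.clientStillConnected || in.timedOut) return conn_6_stopClientNow;
            return conn_4_clientConnected;
        case conn_5_requestReceived:  // response sent, wait for the client to stop
            return conn_4_clientConnected;
        case conn_6_stopClientNow:
            return conn_7_report;
        case conn_7_report:
            return conn_3_clientDelayConnection;
        default:
            return s;
    }
}
```

Because the function never blocks, a hang such as the one in issue 1 would have to originate inside a library call made while sampling the inputs, not in the sequencing itself.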

Time outs:

  • server and client time out if data is not received within 10 seconds
  • these time outs can be modified (defined as constants)
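A timeout of this kind can be kept non-blocking with a small helper like the one below. This is a minimal sketch: the `Timeout` struct is hypothetical and not copied from the attached sketches, and the current time is passed in as a parameter so that it can be tried off-board (on the Nano 33 IoT it would come from `millis()`).

```cpp
#include <cstdint>

// Hypothetical helper; `now` is a millisecond counter such as millis().
struct Timeout {
    std::uint32_t start = 0;
    std::uint32_t duration = 0;

    // Start (or restart) the timeout at time `now`.
    void arm(std::uint32_t now, std::uint32_t durationMs) {
        start = now;
        duration = durationMs;
    }

    // Unsigned subtraction keeps the comparison correct even when the
    // millisecond counter wraps around.
    bool expired(std::uint32_t now) const {
        return static_cast<std::uint32_t>(now - start) >= duration;
    }
};
```

With a 10-second constant, the state machine would call `arm(millis(), 10000)` when it starts waiting for data and poll `expired(millis())` on every pass of `loop()`, without ever calling `delay()`.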

Delays:

  • if a WiFi connection attempt fails, next attempt will happen after a delay (using the state machine logic)
  • server side: after each client.stop(), a small delay is applied before next connection attempt (using the state machine logic)
  • client side: a small delay is applied between two connection attempts OR between a client.stop() and a next connection attempt
  • delay timings can be modified (defined as constants)

Client connection delay is quite short, but increasing it does not solve the issues.

A few other observations, as side info:

  • server and client recover well from WIFI signal loss
  • in the scenario used in this test, it's almost always the server that experiences WiFi signal loss, even if the boards are swapped
  • I tested with two WiFi routers - other equipment (e.g. smartphone, PC) connected does not suffer from WiFi signal loss
  • the client-side did not experience the 'system hanging' nor the 'not able to connect' situation. It seems to be the server-side having these problems

Although I have invested many (many!) hours in fine-tuning these sketches and conducting these tests, I cannot (of course) totally exclude that I made errors myself.
But at this time I am quite convinced that there are issues with the Nano 33 IoT, especially when working as a TCP server.

Feel free to contact me for more info, or if I have forgotten to mention anything.
Best regards and thanks

nano 33 iot-TCP client server-server issues.xlsx
TCP client.txt
TCP server.txt

Was there any further investigation or input on this? I also experienced stability issues with Nano 33 IoTs functioning as simple web servers late last year, but I assumed those issues were due to some poor code I had written that was causing problems I couldn't chase down.

Hello
May I also politely ask that this issue be tackled? Many thanks

After installing the latest WiFiNINA firmware (version 1.4.8), the issues seem to have disappeared.
So, 6 months after having spent a lot of time documenting these issues, and without any reaction received during this period, I'm happy to be able to close this issue now.
I will create a new issue if the issues pop up again. Regards
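For anyone who wants to check whether a board is still running firmware older than 1.4.8, a dotted version string can be compared numerically as sketched below. The function `firmwareOlderThan` is a hypothetical helper; on the board, the first argument would be the string returned by `WiFi.firmwareVersion()` from the WiFiNINA library.

```cpp
#include <cstdio>

// Hypothetical helper: returns true when version string `have` (e.g. "1.4.5")
// is older than `want` (e.g. "1.4.8"). Missing fields are treated as 0.
bool firmwareOlderThan(const char* have, const char* want) {
    int h[3] = {0, 0, 0};
    int w[3] = {0, 0, 0};
    std::sscanf(have, "%d.%d.%d", &h[0], &h[1], &h[2]);
    std::sscanf(want, "%d.%d.%d", &w[0], &w[1], &w[2]);
    for (int i = 0; i < 3; ++i) {
        if (h[i] != w[i]) return h[i] < w[i];  // first differing field decides
    }
    return false;  // versions are equal
}
```

Comparing the fields as numbers avoids the pitfall of a plain string comparison, which would rank "1.10.0" below "1.4.8".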