vjmuzik/NativeEthernet

Connection timeout and reset problems with the WebServer example

lasselukkari opened this issue · 5 comments

First of all thanks for your hard work!

When I run the basic WebServer example, I start to get connection timeouts and resets if I push the server even a bit.

I'm using Apache Benchmark to generate the load: ab -k -c 1 -n 20 http://192.168.1.177/

If I keep -n small, things work as expected. With values above 100 I will, most of the time, get either a connection reset from the server (apr_socket_recv: Connection reset by peer (54)) or a client timeout after a while (apr_pollset_poll: The timeout specified has expired (70007)). While Apache Benchmark is still waiting for a response that never comes, but has not yet timed out, the website loads just fine if I create a new connection, for example from a browser.

I have tested pretty much the same code with an Arduino Uno and with the ESP8266 and ESP32 chips and saw no problems. This is why I expect the problem not to be in my test setup. I also just updated the lib to the latest master, and I have the latest version of your FNET fork. My Teensyduino version is 1.5.3. I can reproduce the same behaviour with other load testing tools. I have not tried another computer, network, Ethernet chip or Teensy.

Adding client.close() before client.stop() did not improve the situation.

Have a nice weekend!

Here is a Wireshark capture including the last few successful requests before the reset:

No. Time Source Destination Protocol Length Info
3537 20.282542 192.168.1.101 192.168.1.177 TCP 78 51221 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108197721 TSecr=0 SACK_PERM=1
3538 20.284055 192.168.1.177 192.168.1.101 TCP 60 80 > 51220 [ACK] Seq=277 Ack=107 Win=1943 Len=0
3539 20.284320 192.168.1.177 192.168.1.101 TCP 62 80 > 51221 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1
3540 20.284399 192.168.1.101 192.168.1.177 TCP 54 51221 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0
3541 20.284462 192.168.1.101 192.168.1.177 HTTP 159 GET / HTTP/1.0
3542 20.286025 192.168.1.177 192.168.1.101 TCP 60 80 > 51221 [ACK] Seq=1 Ack=106 Win=1943 Len=0
3543 20.684843 192.168.1.177 192.168.1.101 TCP 69 [TCP segment of a reassembled PDU]
3544 20.684931 192.168.1.101 192.168.1.177 TCP 54 51221 > 80 [ACK] Seq=106 Ack=16 Win=262080 Len=0
3545 20.686272 192.168.1.177 192.168.1.101 HTTP 314 HTTP/1.1 200 OK (text/html)
3546 20.686367 192.168.1.101 192.168.1.177 TCP 54 51221 > 80 [ACK] Seq=106 Ack=277 Win=261824 Len=0
3547 20.686431 192.168.1.101 192.168.1.177 TCP 54 51221 > 80 [FIN, ACK] Seq=106 Ack=277 Win=262144 Len=0
3548 20.686532 192.168.1.101 192.168.1.177 TCP 78 51222 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108198121 TSecr=0 SACK_PERM=1
3549 20.688096 192.168.1.177 192.168.1.101 TCP 60 80 > 51221 [ACK] Seq=277 Ack=107 Win=1943 Len=0
3550 20.688570 192.168.1.177 192.168.1.101 TCP 62 80 > 51222 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1
3551 20.688693 192.168.1.101 192.168.1.177 TCP 54 51222 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0
3552 20.688762 192.168.1.101 192.168.1.177 HTTP 159 GET / HTTP/1.0
3553 20.690482 192.168.1.177 192.168.1.101 TCP 60 80 > 51222 [ACK] Seq=1 Ack=106 Win=1943 Len=0
3554 20.690487 192.168.1.177 192.168.1.101 TCP 69 [TCP segment of a reassembled PDU]
3555 20.690628 192.168.1.101 192.168.1.177 TCP 54 51222 > 80 [ACK] Seq=106 Ack=16 Win=262080 Len=0
3556 20.691693 192.168.1.177 192.168.1.101 TCP 314 [TCP segment of a reassembled PDU]
3557 20.691698 192.168.1.177 192.168.1.101 HTTP 60 HTTP/1.1 200 OK (text/html)
3558 20.691824 192.168.1.101 192.168.1.177 TCP 54 51222 > 80 [ACK] Seq=106 Ack=276 Win=261824 Len=0
3559 20.691825 192.168.1.101 192.168.1.177 TCP 54 51222 > 80 [ACK] Seq=106 Ack=277 Win=261824 Len=0
3560 20.691894 192.168.1.101 192.168.1.177 TCP 54 51222 > 80 [FIN, ACK] Seq=106 Ack=277 Win=262144 Len=0
3561 20.691988 192.168.1.101 192.168.1.177 TCP 78 51223 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108198125 TSecr=0 SACK_PERM=1
3562 20.693481 192.168.1.177 192.168.1.101 TCP 60 80 > 51222 [ACK] Seq=277 Ack=107 Win=1943 Len=0
3563 20.693741 192.168.1.177 192.168.1.101 TCP 62 80 > 51223 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1
3564 20.693820 192.168.1.101 192.168.1.177 TCP 54 51223 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0
3565 20.693884 192.168.1.101 192.168.1.177 HTTP 159 GET / HTTP/1.0
3566 20.695470 192.168.1.177 192.168.1.101 TCP 60 80 > 51223 [RST] Seq=1 Win=0 Len=0

Hello again! This problem still persists with the latest versions of the libraries. I noticed that if I make the server send bigger payloads, the problem does not occur. Also, if I use the Connection: keep-alive header and do not close the socket, there are no problems. So it seems that the problem is related to opening new connections repeatedly. Do you need some more information? For example, if you want, I can try to reproduce the problem with a programming language of your choice.
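For anyone who wants to reproduce this without Apache Benchmark, here is a minimal sketch of a client that opens a fresh TCP connection for every request, which is the pattern that seems to trigger the problem. It is only an illustration: the function names `fetch_once` and `run`, the `Host` header value and the outcome labels are made up for this example, and it uses nothing beyond Python's standard `socket` module.

```python
import socket

# HTTP/1.0 request without keep-alive, so every request needs a new connection.
# The Host value is a placeholder chosen for this sketch.
REQUEST = b"GET / HTTP/1.0\r\nHost: teensy\r\n\r\n"


def fetch_once(host, port, timeout=5.0):
    """Open a new TCP connection, send one request, read until the server closes.

    Returns one of: "ok", "bad_response", "timeout", "reset", "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(REQUEST)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        body = b"".join(chunks)
        return "ok" if body.startswith(b"HTTP/") else "bad_response"
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "reset"
    except OSError:
        return "error"


def run(host, port, count, timeout=5.0):
    """Issue `count` sequential requests, each on its own connection, and tally outcomes."""
    results = {}
    for _ in range(count):
        outcome = fetch_once(host, port, timeout)
        results[outcome] = results.get(outcome, 0) + 1
    return results
```

Calling `run("192.168.1.177", 80, 200)` against the address used in this issue should return a dictionary of outcome counts. Any "reset" or "timeout" entries would correspond to the two Apache Benchmark errors described above.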

Yes, it does. I haven't specifically pushed any updates for this, but I have worked on it a bit. While I've been able to stop it from locking up completely, I haven't been able to fully patch it. The problem stems from some client sockets showing that they have no data, so the server never finishes handling the connection. When this happens the socket never closes, and after enough occurrences the server runs out of sockets and effectively locks up with the connection reset message.

I've been able to stop it from completely locking up by giving it a timeout before closing a socket that didn't receive anything, but this is not a solution, as it drops the packet. I've been able to determine that FNET has the data locked up somewhere and it never makes it to the socket. What I haven't figured out yet is why that's happening and where. When closing the socket I can see that the data disappears with it, so the data is there somehow, but the socket isn't reporting it as available to the server.

I took the timeout part from the FNET HTTP service just to see if it would fix the issue here, but as mentioned already, I wouldn't call it a solution. As far as I can tell, this is an issue somewhere in FNET that I need to find and fix there. Once I get more time, hopefully I can fix this completely.
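The timeout workaround described above can be sketched roughly as follows. This is not the NativeEthernet or FNET code. It is a generic illustration using Python's standard `socket` module, and `handle_client`, `RESPONSE` and the return labels are hypothetical names. The idea is only to show the behaviour: a connection that never reports any request data is closed after a timeout so it cannot hold a socket forever, at the cost of dropping that request.

```python
import socket

# Fixed response used only for this sketch.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html>ok</html>"
)


def handle_client(conn, idle_timeout=1.0):
    """Serve one accepted connection, dropping it if no request arrives in time.

    The timeout applies to each socket operation, not to the whole exchange.
    Returns "served", "dropped_idle" or "closed_by_peer".
    """
    conn.settimeout(idle_timeout)
    try:
        request = b""
        # Read until the blank line that ends the HTTP request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                return "closed_by_peer"
            request += chunk
        conn.sendall(RESPONSE)
        return "served"
    except socket.timeout:
        # No data showed up on this socket: give up so the slot is freed.
        # The request is lost, which is why this is a workaround and not a fix.
        return "dropped_idle"
    finally:
        conn.close()
```

A server loop would call `handle_client(conn)` for each socket returned by `accept()`. The "dropped_idle" case corresponds to the sockets that show no data, so the client still sees a failed request even though the server no longer runs out of sockets.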

Thanks for the reply.

See #9 for the possible fix for this.