Connection timeout and reset problems with the WebServer example
lasselukkari opened this issue · 5 comments
First of all thanks for your hard work!
When I run the basic WebServer example, I start to get connection timeouts and resets if I push the server even a bit.
I'm using Apache Benchmark to generate the load: `ab -k -c 1 -n 20 http://192.168.1.177/`
If I keep `-n` small, things work as expected. With values bigger than 100, most of the time I get either a connection reset from the server (`apr_socket_recv: Connection reset by peer (54)`) or a client timeout after a while (`apr_pollset_poll: The timeout specified has expired (70007)`). While Apache Benchmark is waiting for a response that never comes but has not yet timed out, the website still loads just fine if I create a new connection, for example from a browser.
I have tested pretty much the same code with an Arduino Uno and with the ESP8266 and ESP32 chips, and I see no problems there. This is why I expect the problem not to be in my test setup. I also just updated the lib to the latest master, and I have the latest version of your FNET fork. My Teensyduino version is 1.5.3. I can reproduce the same behaviour with other load testing tools. I have not tried another computer, network, Ethernet chip or Teensy.
Adding `client.close()` before `client.stop()` did not improve the situation.
Have a nice weekend!
Here is a Wireshark capture including the last few successful requests before the reset:
No. | Time | Source | Destination | Protocol | Length | Info |
---|---|---|---|---|---|---|
3537 | 20.282542 | 192.168.1.101 | 192.168.1.177 | TCP | 78 | 51221 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108197721 TSecr=0 SACK_PERM=1 |
3538 | 20.284055 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51220 [ACK] Seq=277 Ack=107 Win=1943 Len=0 |
3539 | 20.284320 | 192.168.1.177 | 192.168.1.101 | TCP | 62 | 80 > 51221 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1 |
3540 | 20.284399 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51221 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0 |
3541 | 20.284462 | 192.168.1.101 | 192.168.1.177 | HTTP | 159 | GET / HTTP/1.0 |
3542 | 20.286025 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51221 [ACK] Seq=1 Ack=106 Win=1943 Len=0 |
3543 | 20.684843 | 192.168.1.177 | 192.168.1.101 | TCP | 69 | [TCP segment of a reassembled PDU] |
3544 | 20.684931 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51221 > 80 [ACK] Seq=106 Ack=16 Win=262080 Len=0 |
3545 | 20.686272 | 192.168.1.177 | 192.168.1.101 | HTTP | 314 | HTTP/1.1 200 OK (text/html) |
3546 | 20.686367 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51221 > 80 [ACK] Seq=106 Ack=277 Win=261824 Len=0 |
3547 | 20.686431 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51221 > 80 [FIN, ACK] Seq=106 Ack=277 Win=262144 Len=0 |
3548 | 20.686532 | 192.168.1.101 | 192.168.1.177 | TCP | 78 | 51222 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108198121 TSecr=0 SACK_PERM=1 |
3549 | 20.688096 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51221 [ACK] Seq=277 Ack=107 Win=1943 Len=0 |
3550 | 20.688570 | 192.168.1.177 | 192.168.1.101 | TCP | 62 | 80 > 51222 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1 |
3551 | 20.688693 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51222 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0 |
3552 | 20.688762 | 192.168.1.101 | 192.168.1.177 | HTTP | 159 | GET / HTTP/1.0 |
3553 | 20.690482 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51222 [ACK] Seq=1 Ack=106 Win=1943 Len=0 |
3554 | 20.690487 | 192.168.1.177 | 192.168.1.101 | TCP | 69 | [TCP segment of a reassembled PDU] |
3555 | 20.690628 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51222 > 80 [ACK] Seq=106 Ack=16 Win=262080 Len=0 |
3556 | 20.691693 | 192.168.1.177 | 192.168.1.101 | TCP | 314 | [TCP segment of a reassembled PDU] |
3557 | 20.691698 | 192.168.1.177 | 192.168.1.101 | HTTP | 60 | HTTP/1.1 200 OK (text/html) |
3558 | 20.691824 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51222 > 80 [ACK] Seq=106 Ack=276 Win=261824 Len=0 |
3559 | 20.691825 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51222 > 80 [ACK] Seq=106 Ack=277 Win=261824 Len=0 |
3560 | 20.691894 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51222 > 80 [FIN, ACK] Seq=106 Ack=277 Win=262144 Len=0 |
3561 | 20.691988 | 192.168.1.101 | 192.168.1.177 | TCP | 78 | 51223 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=1108198125 TSecr=0 SACK_PERM=1 |
3562 | 20.693481 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51222 [ACK] Seq=277 Ack=107 Win=1943 Len=0 |
3563 | 20.693741 | 192.168.1.177 | 192.168.1.101 | TCP | 62 | 80 > 51223 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=1460 WS=1 |
3564 | 20.693820 | 192.168.1.101 | 192.168.1.177 | TCP | 54 | 51223 > 80 [ACK] Seq=1 Ack=1 Win=262144 Len=0 |
3565 | 20.693884 | 192.168.1.101 | 192.168.1.177 | HTTP | 159 | GET / HTTP/1.0 |
3566 | 20.695470 | 192.168.1.177 | 192.168.1.101 | TCP | 60 | 80 > 51223 [RST] Seq=1 Win=0 Len=0 |
Hello again! This problem still persists with the latest versions of the libraries. I noticed that if I make the server send bigger payloads, the problem does not occur. Also, if I use the connection keep-alive header and do not close the socket, there are no problems. So it seems that the problem is related to opening new connections repeatedly. Do you need any more information? For example, if you want, I can try to reproduce the problem in a programming language of your choice.
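A reproduction along those lines could look like the following Python sketch. It assumes only what is described above: one new TCP connection per request, a plain `GET / HTTP/1.0`, and a server that closes the connection after responding. The names `fetch_once` and `run_load` are illustrative and do not come from any library mentioned in this thread.

```python
import socket
from collections import Counter

# Minimal HTTP/1.0 request; the server is expected to close after responding.
REQUEST = b"GET / HTTP/1.0\r\n\r\n"


def fetch_once(host, port, timeout=5.0):
    """Open a fresh TCP connection, send one request and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(REQUEST)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # empty read: the server closed the connection
                    break
                chunks.append(data)
    except ConnectionResetError:
        return "reset"      # the peer answered with RST
    except socket.timeout:
        return "timeout"    # no response within `timeout` seconds
    except OSError:
        return "error"      # anything else, e.g. connection refused
    body = b"".join(chunks)
    return "ok" if body.startswith(b"HTTP/1.") else "bad-response"


def run_load(host, port, count, timeout=5.0):
    """Issue `count` sequential requests, each on a new connection."""
    return Counter(fetch_once(host, port, timeout) for _ in range(count))
```

Calling `run_load("192.168.1.177", 80, 200)` against the board would return a `Counter` of outcomes. Any `reset` or `timeout` entries would correspond to the failures Apache Benchmark reports.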
Yes, it does. I haven't specifically pushed any updates for this, but I have worked on it a bit. While I've been able to stop it from locking up completely, I haven't been able to fully fix it. The problem stems from some client sockets reporting that they have no data, so the server never finishes handling the connection. When this happens the socket never closes, and after enough occurrences the server runs out of sockets and effectively locks up with the connection reset message.
I've been able to stop it from completely locking up by adding a timeout, after which a socket that hasn't received anything is closed, but this is not a solution, as it drops the packet. I've been able to determine that FNET has the data held up somewhere and it never makes it to the socket. What I haven't figured out yet is why and where that happens. When closing the socket, I can see that the data disappears with it, so the data is there somehow, but the socket isn't reporting it as available to the server.
I took the timeout part from the FNET HTTP service just to see if it would fix the issue here, but as mentioned already, I wouldn't call it a solution. As far as I can tell, this is an issue somewhere in FNET that I need to find and fix there. Once I get more time, I hope I can fix it completely.
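The socket exhaustion and the timeout workaround described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not FNET or NativeEthernet code, and `SlotPool`, `simulate`, the pool size and the tick-based clock are all hypothetical names and values chosen for illustration.

```python
class SlotPool:
    """Toy model of a fixed-size socket table with an idle timeout."""

    def __init__(self, size, idle_timeout):
        self.size = size
        self.idle_timeout = idle_timeout
        self.opened_at = {}  # slot id -> time the connection was accepted

    def accept(self, now):
        """Take a free slot for a new connection, or return None if full."""
        for slot in range(self.size):
            if slot not in self.opened_at:
                self.opened_at[slot] = now
                return slot
        return None  # table exhausted: new connections cannot be served

    def close(self, slot):
        self.opened_at.pop(slot, None)

    def reap_idle(self, now):
        """Close every slot that has been open for at least idle_timeout."""
        stale = [s for s, t in self.opened_at.items()
                 if now - t >= self.idle_timeout]
        for slot in stale:
            self.close(slot)
        return stale


def simulate(requests, stuck_every, pool, use_timeout):
    """Count connections refused because no slot was free.

    Every `stuck_every`-th connection never reports data, so the server
    loop never closes it; all others are served and closed immediately.
    """
    refused = 0
    for i in range(requests):
        now = float(i)  # hypothetical clock: one connection per tick
        if use_timeout:
            pool.reap_idle(now)
        slot = pool.accept(now)
        if slot is None:
            refused += 1
            continue
        if i % stuck_every != stuck_every - 1:
            pool.close(slot)  # normal request: respond and close
    return refused
```

In this model, `simulate(100, 10, SlotPool(4, 5.0), use_timeout=False)` refuses 60 connections once all four slots are stuck, while the same run with `use_timeout=True` refuses none. The stuck requests are still dropped rather than answered, which matches the point above that the timeout is a mitigation and not a fix.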
Thanks for the reply.