microsoft/WSL

Incomplete socket send with fast writer and slow reader

aseering opened this issue · 30 comments

This might be related to #493 ; I'm not sure. It certainly feels very similar. Reproduced under build 14376.

Open up two WSL terminals. In the first, run the following two commands:

dd if=/dev/zero of=big.dat bs=3000000 count=1
python -m SimpleHTTPServer

Then, in the second, run the following:

curl http://localhost:8000/big.dat

Expected:

  • curl will successfully transfer the file

Actual:

  • curl emits the following error:
curl: (18) transfer closed with 509705 bytes remaining to read

The exact number of bytes remaining varies, though for me it's consistently in the vicinity of 500 KB.

This issue does not reproduce for me if I run curl.exe in Windows.

It also doesn't reproduce for me if I run python -m SimpleHTTPServer in Windows, and curl in WSL.

It also doesn't reproduce for me if I redirect curl's stdout; for example, curl http://localhost:8000/big.dat | wc -c.

strace on the Python server:

...
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0, NULL, 0) = 8192
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0, NULL, 0) = 8192
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0, NULL, 0) = 8192
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0, NULL, 0) = 8192
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 1728
read(5, "", 14336)                      = 0
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1728, 0, NULL, 0) = 1728
read(5, "", 16384)                      = 0
close(5)                                = 0
munmap(0x7fd0e7c30000, 4096)            = 0
shutdown(4, SHUT_WR)                    = 0
close(4)                                = 0
select(4, [3], [], [], {0, 500000})     = 0 (Timeout)
select(4, [3], [], [], {0, 500000})     = 0 (Timeout)
select(4, [3], [], [], {0, 500000})     = 0 (Timeout)
select(4, [3], [], [], {0, 500000})     = 0 (Timeout)
...

strace on the curl client:

clock_gettime(CLOCK_MONOTONIC, {94087, 668388000}) = 0
poll([{fd=3, events=POLLIN}], 1, 1000)  = 1 ([{fd=3, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {94087, 668724000}) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 0, NULL, NULL) = 16384
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096                                    ) = 4096
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 12288                                   ) = 12288
clock_gettime(CLOCK_MONOTONIC, {94087, 699670000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 700584000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 701067000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 701318000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 701434000}) = 0
poll([{fd=3, events=POLLIN}], 1, 1000)  = 1 ([{fd=3, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {94087, 701848000}) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 0, NULL, NULL) = 16384
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096                                    ) = 4096
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 12288                                   ) = 12288
clock_gettime(CLOCK_MONOTONIC, {94087, 732999000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 733974000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 734506000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 734785000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 734913000}) = 0
poll([{fd=3, events=POLLIN}], 1, 1000)  = 1 ([{fd=3, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {94087, 735264000}) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "", 16384, 0, NULL, NULL)   = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 735804000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 735991000}) = 0
clock_gettime(CLOCK_MONOTONIC, {94087, 736374000}) = 0
time([1467350132])                      = 1467350132
close(3)                                = 0
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7fc270560330}, NULL, 8) = 0
write(2, "c", 1c)                        = 1
write(2, "u", 1u)                        = 1
write(2, "r", 1r)                        = 1
write(2, "l", 1l)                        = 1
write(2, ":", 1:)                        = 1

I'm mildly perturbed that curl is attempting to push its error message through the kernel one byte at a time. But that's presumably after the actual failure has occurred.

@aseering - Thanks for reporting the issue and the repro steps. I was able to reproduce this issue and I have an understanding of the bug.

I'm not sure if the issue I am experiencing is related, but I think it is.

I get constant "net::ERR_CONTENT_LENGTH_MISMATCH" errors from Chrome when I try to visit my uWSGI HTTP server. When using Flask as the web server, I get frequent "net::ERR_CONNECTION_RESET" errors (the Flask stack trace shows some Broken Pipe errors, which I'll attach soon; I don't always get them). This same app works fine on a virtual machine with both uWSGI and Flask.

If you need any more information, please do ask.

Edit: Client Chrome is not running from WSL.

Edit 2: I'm being silly. Let me include some information.
Windows build: Version 1607 (OS Build 14946)
Ubuntu version: 16.04 Xenial (apt-get dist-upgrade from 14.04)
uWSGI version: 2.0.3
uWSGI cmdline args: "--http 127.0.0.1:5000"
Flask version: 0.10.1

Edit 3: (Running the Flask web server here) When trying to access my app from Firefox running on WSL itself, the file just doesn't fully transfer (JS in my case), causing the script to fail. So it doesn't matter whether the client is on Windows or WSL.

When trying to fetch the file with curl (CURL v7.47.0 on WSL) I get

(18) transfer closed with 524925 bytes remaining to read
xbytez@i5-2500K:~$

When trying to fetch the same on Windows (CURL v7.49.1 on Windows) I get

curl: (56) Recv failure: Connection was reset
C:\Users\xBytez>

Edit 4:
The promised stack trace:

Exception happened during processing of request from ('127.0.0.1', 19431)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/usr/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

@sunilmut -- you mentioned a little while back that you have a repro for this; just wondering where it is on your priority list?

There's this ticket and there's #616; if you keep tracing, there are about a dozen tickets that refer to those two, and so on. It would be great to hear something from y'all about your plans for this collection of issues.

FYI -- this still reproduces on build 14955 and Ubuntu 16.04.

Interestingly, during this download, CPU usage on the machine goes up a lot. But that usage doesn't show up in top running in WSL; it's not assigned to either Python or cURL. If I open up Task Manager, I see Console Window Host pegging a CPU core. Neither application is writing anything to stdout when this increased CPU utilization occurs.

@aseering - Thanks for reviving this thread again. I was out for a couple of weeks, hence the delay in responding. I will take this up immediately, since I agree that it could be manifesting in many different forms. I will post back once I have something.

Thank you so much for the update. Once this is fixed I can actually move my dev environment fully to WSL! Thanks to you too, @aseering, for your continuous support of WSL; you're everywhere ;).

@aseering - So, I have root-caused the problem. The way the repro is set up, the server (Python) side, after sending the last byte, calls shutdown and closes the socket. Watching the data transfer under the debugger, it appears that whenever the socket is closed by Python, there is invariably pending data to be sent. When there is pending data in the socket (with linger disabled), WSL closes the socket in the background. The background logic waits for the last byte to be sent or for a timeout, whichever comes first, and then force-closes the socket. In this case, we are not hitting the timeout, so WSL force-closes the socket after sending the last byte. Force-closing the socket causes an abortive disconnect, which discards any data that has not yet been received by the peer (i.e., curl in this case).
Now, when I look at the socket man page, the line that grabs my attention is "When the socket is closed as part of exit(2), it always lingers in the background." And it makes me wonder: how long does the socket linger in the background (perhaps attempting a graceful disconnect) after sending the last byte, before forcefully closing the connection?
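
To make that sequence concrete, here is a minimal sketch (my illustration, not the actual SimpleHTTPServer code; the port and the ~3 MB payload are placeholders mirroring the repro) of the write-then-shutdown-then-close pattern the server side performs:

import socket

PAYLOAD = b"\0" * 3000000  # roughly the size of big.dat in the repro

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 8000))
listener.listen(1)

conn, _ = listener.accept()
conn.sendall(PAYLOAD)          # returns once the data is queued in the kernel, not once the peer has read it
conn.shutdown(socket.SHUT_WR)  # send FIN; buffered data should still drain to the peer
conn.close()                   # linger is not set, so the close "lingers in the background"
listener.close()

If the stack instead force-closes the connection as soon as the last buffered byte has been handed to the network, a reader that is still catching up sees an abortive disconnect, which matches curl's "transfer closed with N bytes remaining" error.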

@sunilmut -- hm, good question... As a programmer, my instinctive expectation is that there would be no simple timeout. I can't immediately find anything online that contradicts that expectation.

I would expect applications that care about such things to manage them explicitly using SO_LINGER in order to have more-granular control over the socket's exact behavior. In that case, the kernel is given an explicit timeout and should follow it.
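
For illustration, here's roughly what that explicit management looks like in Python (the 10-second value is arbitrary, just for the example):

import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# struct linger { int l_onoff; int l_linger; }, packed as two native ints.
# l_onoff=1, l_linger=10: close() waits up to 10 seconds for unsent data to
# drain; if it still hasn't been delivered by then, the close gives up.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 10))

# l_onoff=1, l_linger=0 would instead request an immediate abortive close (RST).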

If the client dies or otherwise goes unresponsive while the server is trying to send data, then the connection will naturally hit TCP's built-in retry-count limit and get torn down. That's true regardless of whether the server is still trying to send, but, given that the kernel buffer is of bounded size, it does guarantee that the socket will close in bounded (though very large in the worst case) time.

Basically, TCP itself guarantees that the send must still be making progress. If it's making progress, why kill it? If the answer is "because I'm worried about running out of kernel memory or ephemeral ports", then your app is dealing with high load and should be using features like SO_LINGER to carefully manage its resource usage. If it's not -- c'mon, that's like two lines of code; just go fix it :-) (Ah, the advantages of a mostly-open-source ecosystem on the design of the underlying platform...)

@aseering

Basically, TCP itself guarantees that the send must still be making progress. If it's making progress, why kill it?

The primary motivation for our design decisions is compatibility.

As a programmer, my instinctive expectation is that there would be no simple timeout.

Do you have anything (a link) that supports this? My searches are coming up empty-handed. Maybe that is also what you mean by "I can't immediately find anything online that contradicts that expectation." :)

Yeah, I can't find any documentation regarding a timeout either (unless SO_LINGER is used). I'm basically just trying to justify an educated guess.

@sunilmut This blog post has some related information regarding SO_LINGER behavior. You're probably already all over this, but I thought it worth the heads-up anyway. Cross-platform linger has been a bane for ages.

Regularly-scheduled ping: This still reproduces for me on build 14971. Though, interestingly, curl hangs for several seconds before reporting that the transfer has closed -- I thought I remembered it erroring out immediately before.

Also, a few more people appear to have hit this issue and have filed new tickets. (Relevant tickets are linked above. Mentioning them in case anyone is subscribed to just this ticket, since GitHub wouldn't otherwise send you a notification e-mail.)

Today I have been investigating the possibility of switching to Windows from Ubuntu (since I pretty much only use the command line anyway, apart from gvim); unfortunately, I think I'm seeing this too:

^C^C----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 57104)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 705, in finish
    self.wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
KeyboardInterrupt
----------------------------------------

Python 2.7.6. I'm not sure how to find the version of the Microsoft software, but I installed it today.

@davidmaxwaterman -- you can find the version by opening "About your PC" in System Settings. (Cortana finds it directly for me, if I just type in "About your PC".) You specifically want the "OS Build" field.

Unless you're running a Windows Insider build, if you're running WSL, I would expect your OS Build to be 14393. You can get newer builds of Windows (with new features but potentially more bugs) through the Insider program. Unfortunately, the latest Insider build doesn't yet have a fix for this issue either.

Just an update that this issue has been fixed in the dev branch and should soon hit the release branch. With the above repro steps, I was successfully able to transfer up to 3 GB of data without any issues (though it took a long time). Many thanks @aseering for being persistent and helping us stay on track. I will also ping this thread again once the fix makes it to the release branch.

So what was the answer to your pretty astute question: "how long does the socket linger in the background (maybe by trying to do a graceful disconnect) after sending the last byte, before forcefully closing the connection?"

@therealkenc - To get the maximum reliability guarantees from the TCP stack, a TCP socket should not be closed until both ends have fully disconnected (i.e., not before the full termination handshake is complete). To prevent server-side sockets from hanging (due to misbehaving clients that never close their end), some systems build in a timeout. When the user-mode socket is closed, a disconnect request (a FIN in TCP terms) is issued. When the FIN is ACK'ed, the TCP socket is in the FIN_WAIT_2 state. After waiting tcp_fin_timeout (http://man7.org/linux/man-pages/man7/tcp.7.html) seconds in the FIN_WAIT_2 state, the socket is forcibly closed. In WSL, we were not completely following these TCP guidelines, thereby leading to some data loss. The Windows networking team helped us figure it out.
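
For reference, a rough user-space sketch (my own illustration, not WSL's internal fix) of what following those guidelines looks like: the sender half-closes its side, waits for the peer's FIN, and only then closes, so nothing in flight gets abandoned.

import socket

def close_gracefully(conn):
    conn.shutdown(socket.SHUT_WR)   # our FIN: "no more data from me"
    while conn.recv(4096):          # drain (and discard) until the peer's FIN arrives as a zero-length read
        pass
    conn.close()                    # both directions are done; safe to release the socket

Between our FIN being ACK'ed and the peer's FIN arriving, the socket sits in FIN_WAIT_2; tcp_fin_timeout bounds how long an already-closed (orphaned) socket may remain in that state.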

Is this fix included in insider build 14986? Just wondering so I know I can upgrade and test it

@xBytez Unfortunately, no. The fix got checked into the dev branch 4-5 days ago. The release branch runs behind the dev branch by 2-3 weeks on average (not a hard and fast rule, just my experience), so that should give you an idea of the timeline. I am also eager to get this validated.

@sunilmut Thanks for the information. Was just curious. I also found the WSL Release notes on the MSDN blog so I'll just have a look at that ;). Thanks again!

It seems that 14986 causes this problem to occur more often. I'm using gulp's HTTP server, so the issue is most visible in the browser when loading assets (larger CSS or JS files). Before 14986 I was able to refresh the browser a few times until the asset downloads stopped being interrupted, but now it seems I can't do that anymore; the issue is more persistent. Any idea on the new build's release date (+/-)?

Any ideas on new build release date (+/-)?

The fix has made it to the release branch. It depends on which build they snap for the Insider release (which is not in our (WSL team's) control). If not in the upcoming Insider release build, it should definitely be in the next one after that. I don't have any more insight into the timelines/dates etc. Thanks for your patience.

This should be fixed in 15002. Please validate and let us know if the issue persists. Thanks for your patience, @ALL, and for your persistence, @aseering.

As of now (after an hour of playing around with things following this morning's upgrade), it works perfectly ;-) Thank you!

Same here! I was able to run my Flask server just fine. Thank you!

@krzysztofantczak & @xBytez - Really glad to know that things are working so far. Will keep the issue open for some time before closing it out.

This is also working great for me. Thanks!

Glad to hear it. Thanks, @aseering.