vi/websocat

websocat sometimes doesn't flush pending data, hangs

the-sun-will-rise-tomorrow opened this issue · 6 comments

I'm running:

$ websocat \
        --exit-on-eof \
        --binary \
        tcp-l:127.0.0.1:1234 \
        wss://1.2.3.4:1234/url

When I send a large packet to 127.0.0.1:1234, sometimes websocat doesn't send it to the destination immediately (and just sits there until something causes it to send the pending data).

If the destination sends a PING, websocat does flush pending data.

If I add --ping-interval to websocat's command line, that also causes websocat to send pending data when it sends PINGs.

I can reproduce it with v1.12.0 and current master (34ddb4d).

I can only reproduce it when the target is a remote host and not localhost (probably because the buffer needs to actually pile up locally for the problem to happen).

vi commented

What OS do you use? Is it reproducible on Linux?

Is wss:// (i.e. TLS) necessary for the hang? Or does it also hang with plain ws://?

when the target is a remote host and not localhost

Is that the only difference, and not, e.g., ws:// on localhost vs wss:// on a remote host?

What OS do you use? Is it reproducible on Linux?

Yes, it happens on Linux 6.6.7 for me.

Is that the only difference, and not, e.g., ws:// on localhost vs wss:// on a remote host?

Yes; if I point websocat at a locally running TCP proxy (like socat) which always accepts & buffers input then redirects to the remote host, the problem doesn't occur.

Is wss:// (i.e. TLS) necessary for the hang? Or does it also hang with plain ws://?

I can test this but it will take some time.

vi commented

If you are open to testing, you can try an early Websocat4 build. Does it flush properly?

websocat4 --binary tcp-l:127.0.0.1:1234 wss://1.2.3.4:1234/url

--exit-on-eof is not yet supported though.

websocat4early.zip is a Linux x86_64 executable; you can also build it yourself from the websocat4 branch.

Or does it also hang with plain ws://?

It hangs with plain ws:// as well.

If you are open to testing, you can try an early Websocat4 build. Does it flush properly?

I am, but, sorry, I can't use that binary.

e6a57f3 does not hang.

vi commented

Is a significantly older pre-built Websocat version (e.g. https://github.com/vi/websocat/releases/tag/v1.8.0) also buggy?

Is the problem also reproducible locally if one uses network namespaces, veth and netem to emulate a non-perfect network?

What does "send a large packet to 127.0.0.1:1234" mean from a user perspective? Is something like cat /dev/zero | nc 127.0.0.1 1234 enough, or does one need something more specific?
For me, running websocat --binary tcp-l:127.0.0.1:1234 wss://ws.vi-server.org/mirror and testing performance with cat /dev/zero | nc 127.0.0.1 1234 | pv > /dev/null does not show hangs.

Fixing it for Websocat1 may be nontrivial (especially without a good repro), and Websocat1 may be nearing sunset.
Is your use case already covered by a workaround, so that a proper fix can wait and consist of abandoning the legacy version and finishing and releasing (an alpha version of) Websocat4?

Is a significantly older pre-built Websocat version (e.g. https://github.com/vi/websocat/releases/tag/v1.8.0) also buggy?

Yes. I tested 1.8.0 and 1.3.0 and they hang.

Is the problem also reproducible locally if one uses network namespaces, veth and netem to emulate a non-perfect network?

I would need to set that up 👀
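
One possible way to build that setup, as a rough sketch rather than a verified repro: a network namespace stands in for the remote host, a veth pair connects it to the host, and netem slows the host-to-namespace direction so that outgoing data can pile up locally. The namespace name, interface names, 10.200.0.0/24 addresses, the netem delay and rate values, and the function names are all assumptions chosen for illustration. The commands need root.

```shell
#!/bin/sh
# Hypothetical names and addresses; adjust freely. Needs root.
NS=wsremote

setup_slow_link() {
    ip netns add "$NS"
    # veth pair: one end stays on the host, the other moves into the namespace.
    ip link add veth-host type veth peer name veth-ns
    ip link set veth-ns netns "$NS"
    ip addr add 10.200.0.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec "$NS" ip addr add 10.200.0.2/24 dev veth-ns
    ip netns exec "$NS" ip link set veth-ns up
    ip netns exec "$NS" ip link set lo up
    # Emulate a slow, high-latency path for traffic leaving the host
    # (assumed values, not taken from the thread).
    tc qdisc add dev veth-host root netem delay 100ms rate 1mbit
}

teardown_slow_link() {
    # Deleting the namespace also destroys the veth pair.
    ip netns del "$NS"
}
```

After `setup_slow_link`, a WebSocket echo endpoint could be started inside the namespace (for example with `ip netns exec wsremote websocat -b ws-l:10.200.0.2:1234 mirror:`) and the websocat under test pointed at ws://10.200.0.2:1234/ instead of the real remote host.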

What does "send a large packet to 127.0.0.1:1234" mean from a user perspective? Is something like cat /dev/zero | nc 127.0.0.1 1234 enough, or does one need something more specific?

On the other side of the WebSocket is an SQL server. The hang happens when, after a handshake and authentication, I send a large query (120 KiB). I have not tried piping /dev/zero.

For me, running websocat --binary tcp-l:127.0.0.1:1234 wss://ws.vi-server.org/mirror and testing performance with cat /dev/zero | nc 127.0.0.1 1234 | pv > /dev/null does not show hangs.

I think the difference is that there isn't a finite amount of data on input (there's always more data to push out any stuck previous data). A better chance to reproduce this would be to connect two echo servers, and then send an initial large packet; it should bounce infinitely.
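
The point above (a finite input has nothing behind it to push stuck data out) could be checked with a probe that sends exactly one payload and then stays quiet. This is a minimal sketch, assuming websocat is already listening on 127.0.0.1:1234 and that the far end echoes data back (for instance the wss://ws.vi-server.org/mirror endpoint mentioned earlier). The helper names `finite_payload` and `probe_tunnel` and the 10-second default wait are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit exactly N KiB of zero bytes, then nothing more.
finite_payload() {
    head -c "$(( $1 * 1024 ))" /dev/zero
}

# Hypothetical helper: send one finite payload through the local listener
# and print how many bytes came back before the wait expired.
# $1 = payload size in KiB, $2 = seconds to wait (assumed default: 10).
probe_tunnel() {
    # The trailing sleep keeps nc's stdin open after the payload, so no EOF
    # reaches websocat; timeout ends the attempt once the wait is over.
    { finite_payload "$1"; sleep "${2:-10}"; } |
        timeout "${2:-10}" nc 127.0.0.1 1234 | wc -c
}
```

With an echoing far end, `probe_tunnel 120` sends 122880 bytes; a returned count below that within the wait would suggest some data is still sitting in websocat.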

Is your use case already covered by a workaround, so that a proper fix can wait and consist of abandoning the legacy version and finishing and releasing (an alpha version of) Websocat4?

websocat4 and enabling --ping-interval both seem to work...
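
For reference, the --ping-interval workaround applied to the command from the original report might look like the sketch below. The 5-second interval is an arbitrary assumption (the thread does not say which value was used), and the wrapper name `tunnel_with_pings` is hypothetical. If PINGs flush pending data as described above, stuck data should then wait no longer than roughly one interval.

```shell
#!/bin/sh
# Assumed value, in seconds; not taken from the thread.
PING_INTERVAL=5

# Same invocation as in the report, plus periodic WebSocket pings.
tunnel_with_pings() {
    websocat \
        --exit-on-eof \
        --binary \
        --ping-interval "$PING_INTERVAL" \
        tcp-l:127.0.0.1:1234 \
        wss://1.2.3.4:1234/url
}
```

A shorter interval bounds the stall more tightly at the cost of more control frames on the connection.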