Set TCP_NODELAY and TCP_QUICKACK
cjrh opened this issue · 6 comments
See this comment.
That still irks me. The real problem is not tinygram prevention. It's ACK delays, and that stupid fixed timer. They both went into TCP around the same time, but independently. I did tinygram prevention (the Nagle algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The combination of the two is awful. Unfortunately by the time I found out about delayed ACKs, I had changed jobs, was out of networking, and doing a product for Autodesk on non-networked PCs.
Delayed ACKs are a win only in certain circumstances - mostly character echo for Telnet. (When Berkeley installed delayed ACKs, they were doing a lot of Telnet from terminal concentrators in student terminal rooms to host VAX machines doing the work. For that particular situation, it made sense.) The delayed ACK timer is scaled to expected human response time. A delayed ACK is a bet that the other end will reply to what you just sent almost immediately. Except for some RPC protocols, this is unlikely. So the ACK delay mechanism loses the bet, over and over, delaying the ACK, waiting for a packet on which the ACK can be piggybacked, not getting it, and then sending the ACK, delayed. There's nothing in TCP to automatically turn this off. However, Linux (and I think Windows) now have a TCP_QUICKACK socket option. Turn that on unless you have a very unusual application.
Turning on TCP_NODELAY has similar effects, but can make throughput worse for small writes. If you write a loop which sends just a few bytes (worst case, one byte) to a socket with "write()", and the Nagle algorithm is disabled with TCP_NODELAY, each write becomes one IP packet. This increases traffic by a factor of 40, with IP and TCP headers for each payload. Tinygram prevention won't let you send a second packet if you have one in flight, unless you have enough data to fill the maximum sized packet. It accumulates bytes for one round trip time, then sends everything in the queue. That's almost always what you want. If you have TCP_NODELAY set, you need to be much more aware of buffering and flushing issues.
None of this matters for bulk one-way transfers, which is most HTTP today. (I've never looked at the impact of this on the SSL handshake, where it might matter.)
Short version: set TCP_QUICKACK. If you find a case where that makes things worse, let me know.
John Nagle
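The "factor of 40" above is header arithmetic. A minimal sketch, assuming 20-byte IPv4 and 20-byte TCP headers with no options and ignoring link-layer framing: a 1-byte write sent as its own segment costs 41 bytes on the wire. The helper names `wire_bytes` and `send_coalesced` are hypothetical; the second shows the usual remedy when TCP_NODELAY is set, which is to buffer small pieces in the application and hand them to the kernel in one call.

```python
import socket

# Assumption: IPv4 (20 bytes) + TCP (20 bytes) headers, no options, no link layer.
HEADER_BYTES = 20 + 20


def wire_bytes(payload_sizes, header_bytes=HEADER_BYTES):
    """Total bytes on the wire if each payload goes out as its own segment."""
    return sum(size + header_bytes for size in payload_sizes)


def send_coalesced(sock: socket.socket, parts) -> None:
    """Join many small writes into one buffer and hand it to the kernel once.

    With TCP_NODELAY set, one send() per part could put each part in its own
    segment; a single sendall() lets the kernel fill full-sized segments.
    """
    sock.sendall(b"".join(parts))


if __name__ == "__main__":
    print(wire_bytes([1] * 100))  # 100 * (1 + 40) = 4100
    print(wire_bytes([100]))      # 100 + 40 = 140
```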
More commentary here
We should do both.
The thing that John Nagle doesn't adequately explain is that TCP_NODELAY is a send-side option, and TCP_QUICKACK is a receive-side option. That means the thing that TCP_NODELAY fixes for us is not fixed by TCP_QUICKACK: we'd need the server to set TCP_QUICKACK for us to safely turn off TCP_NODELAY.
In the case of urllib3, TCP_NODELAY is not a particular risk: we don't have a socket write profile that sends lots of tinygrams (one of the few things httplib got right). Amusingly, requests has an edge case where it gets this wrong (uploading chunk-encoded bodies), but that's fixable using TCP_CORK (and I may well do that at some point).
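A minimal, Linux-only sketch of the TCP_CORK fix for chunk-encoded uploads, assuming the socket already has TCP_NODELAY set. While corked, Linux holds back partial segments (tcp(7) notes a 200 ms ceiling), so each chunk's size line, data, and trailing CRLF can leave together instead of as separate tinygrams. The function name `send_chunked_corked` is hypothetical and this is not how requests or urllib3 actually send bodies; where `socket.TCP_CORK` does not exist it falls back to plain writes.

```python
import socket


def send_chunked_corked(sock: socket.socket, chunks) -> None:
    """Send an HTTP/1.1 chunked body, corking the socket where supported."""
    cork = getattr(socket, "TCP_CORK", None)  # Linux-only option
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        for chunk in chunks:
            if not chunk:
                continue  # a zero-length chunk would end the body early
            # Three small writes per chunk: size line, data, CRLF.
            sock.sendall(b"%X\r\n" % len(chunk))
            sock.sendall(chunk)
            sock.sendall(b"\r\n")
        sock.sendall(b"0\r\n\r\n")  # last-chunk marker with empty trailer
    finally:
        if cork is not None:
            # Clearing the cork flushes any partially filled segment at once.
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)
```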
We should set TCP_QUICKACK if we can, because it's clearly a net social good. But we should keep TCP_NODELAY as well. =)
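In Python terms, "do both" might look like the sketch below. It assumes a connected TCP socket, sets TCP_NODELAY unconditionally, and sets TCP_QUICKACK only where the socket module exposes it (Linux). The helper name `tune_low_latency` is hypothetical. Because Linux does not keep TCP_QUICKACK permanently, a single call at connect time only affects the current ACK mode.

```python
import socket


def tune_low_latency(sock: socket.socket) -> dict:
    """Disable Nagle (send side) and request quick ACKs (receive side)."""
    applied = {}
    # Send side: don't hold small writes back while earlier data is unACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    applied["TCP_NODELAY"] = True
    # Receive side: ACK immediately instead of waiting for the delayed-ACK timer.
    # Only Linux exposes this, and the kernel may leave quickack mode later.
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is not None:
        sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
    applied["TCP_QUICKACK"] = quickack is not None
    return applied
```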
TCP_QUICKACK (since Linux 2.4.4)
Enable quickack mode if set or disable quickack mode if cleared. In quickack mode, acks are sent immediately, rather than delayed if needed in accordance to normal TCP operation. This flag is not permanent, it only enables a switch to or from quickack mode. Subsequent operation of the TCP protocol will once again enter/leave quickack mode depending on internal protocol processing and factors such as delayed ack timeouts occurring and data transfer. This option should not be used in code intended to be portable.
For reference, these are the TCP options available in the socket module on Windows:
>>> print('\n'.join(x for x in dir(socket) if x.startswith('TCP_')))
TCP_FASTOPEN
TCP_KEEPCNT
TCP_KEEPIDLE
TCP_KEEPINTVL
TCP_MAXSEG
TCP_NODELAY
and here on Linux:
>>> print('\n'.join(x for x in dir(socket) if x.startswith('TCP_')))
TCP_CONGESTION
TCP_CORK
TCP_DEFER_ACCEPT
TCP_FASTOPEN
TCP_INFO
TCP_KEEPCNT
TCP_KEEPIDLE
TCP_KEEPINTVL
TCP_LINGER2
TCP_MAXSEG
TCP_NODELAY
TCP_NOTSENT_LOWAT
TCP_QUICKACK
TCP_SYNCNT
TCP_USER_TIMEOUT
TCP_WINDOW_CLAMP
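Rather than comparing these listings by eye, code can feature-test at runtime. A small sketch; the function name is hypothetical and the names checked are just the three discussed in this thread. A constant being present only means the platform headers defined it when Python was built; the running kernel can still reject it with OSError.

```python
import socket

# The options discussed in this thread; any other name works the same way.
DISCUSSED = ("TCP_NODELAY", "TCP_QUICKACK", "TCP_CORK")


def tcp_option_support(names=DISCUSSED) -> dict:
    """Map each option name to whether this platform's socket module defines it."""
    return {name: hasattr(socket, name) for name in names}


if __name__ == "__main__":
    for name, present in tcp_option_support().items():
        print(f"{name}: {'yes' if present else 'no'}")
```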
Upstream Python already sets TCP_NODELAY: https://asyncio.readthedocs.io/en/latest/performance.html#tcp-nodelay
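One way to confirm this on a given interpreter: open a loopback connection with asyncio and read TCP_NODELAY back with getsockopt. A sketch, assuming a recent Python 3 release and that binding to 127.0.0.1 is allowed; the function name is hypothetical. A nonzero value means Nagle is disabled on the client socket.

```python
import asyncio
import socket


async def client_has_nodelay() -> bool:
    """Return True if asyncio disabled Nagle on a freshly opened client socket."""

    async def handle(reader, writer):
        writer.close()  # the server side is not under test

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    try:
        reader, writer = await asyncio.open_connection(host, port)
        sock = writer.get_extra_info("socket")
        value = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
        writer.close()
        await writer.wait_closed()
        return value != 0
    finally:
        server.close()
        await server.wait_closed()


if __name__ == "__main__":
    print(asyncio.run(client_has_nodelay()))
```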
Closing this issue. Here's why:
- TCP_NODELAY is already being set in Python by default (see previous comment).
- TCP_QUICKACK is not yet cross-platform (it doesn't appear in Windows headers), and secondly, it needs to be re-applied with setsockopt after every recv(); however, aiomsg doesn't use recv() directly; we use readexactly():
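```python
# msgproto.py
from asyncio import StreamReader

# _PREFIX_SIZE and logger are defined elsewhere in msgproto.py.


async def read_msg(reader: StreamReader) -> bytes:
    """ Returns b'' if the connection is lost."""
    try:
        # Read the fixed-size length prefix, then exactly that many payload bytes.
        size_bytes = await reader.readexactly(_PREFIX_SIZE)
        size = int.from_bytes(size_bytes, byteorder="big")
        data = await reader.readexactly(size)
        logger.debug(f'Got data from socket: "{data[:64]}"')
        return data
    except (EOFError, OSError) as e:
        # asyncio.IncompleteReadError subclasses EOFError, so a short read lands here.
        logger.info(f"Connection lost: {e}")
        return b""
```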
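For context, read_msg() expects each message to be a big-endian length prefix followed by the payload. Below is a self-contained round-trip sketch of that framing. The 4-byte prefix and the `frame`/`read_frame` names are assumptions for illustration; the real _PREFIX_SIZE lives in msgproto.py. It also shows why there is no per-recv() hook here: StreamReader never calls recv() itself, it only receives bytes that the transport (or, below, feed_data()) hands it.

```python
import asyncio

PREFIX_SIZE = 4  # assumption: the real _PREFIX_SIZE in msgproto.py may differ


def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a big-endian unsigned integer."""
    return len(payload).to_bytes(PREFIX_SIZE, byteorder="big") + payload


async def read_frame(reader: asyncio.StreamReader) -> bytes:
    """Same logic as read_msg(), without logging."""
    try:
        size = int.from_bytes(await reader.readexactly(PREFIX_SIZE), byteorder="big")
        return await reader.readexactly(size)
    except (EOFError, OSError):
        return b""


async def demo() -> list:
    reader = asyncio.StreamReader()
    # Feed bytes the way a transport would after its own recv() calls.
    reader.feed_data(frame(b"hello") + frame(b"world"))
    reader.feed_eof()
    return [await read_frame(reader) for _ in range(3)]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [b'hello', b'world', b'']
```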
I am not going to reimplement readexactly() just so that I can reapply setsockopt(blah, TCP_QUICKACK, blah) after every recv(), especially since it doesn't even appear to be cross-platform.
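For comparison, this is roughly what the rejected approach would involve on a plain blocking socket: a hypothetical `recv_exactly()` that re-applies TCP_QUICKACK after each recv(), because the Linux man page describes the flag as not permanent. A sketch under those assumptions, not aiomsg code; where TCP_QUICKACK is unavailable it degrades to a plain read loop. Wiring it into asyncio would mean replacing StreamReader.readexactly(), which is the cost declined above.

```python
import socket

_QUICKACK = getattr(socket, "TCP_QUICKACK", None)  # None outside Linux


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, re-arming TCP_QUICKACK after every recv()."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError(f"expected {n} bytes, got {len(buf)}")
        buf += chunk
        if _QUICKACK is not None:
            # The kernel may have dropped back to delayed ACKs; ask again.
            sock.setsockopt(socket.IPPROTO_TCP, _QUICKACK, 1)
    return bytes(buf)
```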
Thus, closing.