Local ports are exhausted quickly due to TCP `TIME_WAIT` when `reconnect_interval` is small
minhuw opened this issue · 1 comment
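I found that when `--reconnect-interval` is small, local ports are exhausted quickly, before the experiment completes. Eventually `connect` fails with "Cannot assign requested address" and memtier_benchmark aborts on an assertion, as the log below shows.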
$ memtier_benchmark -s 192.168.1.2 -t 1 -p 7777 -c 128 -n 10000 --json-out-file experiment.json --reconnect-interval 1
Json file experiment.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 1%, 0 secs] 1 threads: 14335 ops, 14340 (avg: 14340) ops/sec, 611.74KB/sec (avg: 611.74KB/sec), 5.19 (avg: 5.19) msec latency
<some logs omitted>
[RUN #1 2%, 20 secs] 1 threads: 27477 ops, 692 (avg: 1373) ops/sec, 28.74KB/sec (avg: 58.36KB/sec), 47.35 (avg: 28.27) msec latency
connect failed, error = Cannot assign requested address
memtier_benchmark: shard_connection.cpp:470: void shard_connection::process_response(): Assertion `ret == 0' failed.
I found that `SO_LINGER` is not enabled, so closed TCP connections enter the `TIME_WAIT` state instead of releasing their local ports immediately.
The relevant code is in `memtier_benchmark/shard_connection.cpp`, lines 229 to 235 (commit 4203084).
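How fast the ports can run out can be estimated with back-of-the-envelope arithmetic. Assuming Linux defaults (ephemeral ports drawn from `net.ipv4.ip_local_port_range` = 32768 to 60999, and `TIME_WAIT` lasting a fixed 60 seconds), there are 60999 - 32768 + 1 = 28232 local ports, so at most about 28232 / 60 ≈ 470 new connections per second can be sustained to a single server ip:port when no `TIME_WAIT` port is reused. Assuming `--reconnect-interval 1` means a reconnect after every request, the benchmark tries to open connections as fast as it completes requests (over 14000 ops/sec in the first log line), far above that ceiling. The helper `max_connect_rate` below is hypothetical and only encodes this estimate.

```c
#include <assert.h>

/* Rough ceiling on sustained new outbound TCP connections per second to ONE
 * server ip:port. Assumes every closed connection keeps its local port in
 * TIME_WAIT for the full time_wait_secs and that no TIME_WAIT port is reused
 * (e.g. tcp_tw_reuse is off). Hypothetical helper for illustration only. */
static double max_connect_rate(int port_low, int port_high, double time_wait_secs)
{
    assert(port_high >= port_low && time_wait_secs > 0.0);
    int ports = port_high - port_low + 1;  /* size of the ephemeral range */
    return ports / time_wait_secs;         /* ports freed per second, at best */
}
```

For example, `max_connect_rate(32768, 60999, 60.0)` evaluates to roughly 470.5. Real kernels can do somewhat better or worse (port selection, `tcp_tw_reuse`, multiple server addresses), so treat this as an order-of-magnitude bound, not a prediction.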
It works if I enable `SO_LINGER` as follows, which aborts the connection immediately when it is closed:
- struct linger ling = {0, 0};
+ struct linger ling = {1, 0};
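The effect of the `+` line can be sketched in isolation with plain POSIX sockets; this is not memtier's code, and `set_abortive_close` and `close_abortively` are hypothetical names. Enabling `SO_LINGER` with a zero timeout makes `close` abort the connection with an RST instead of the normal FIN exchange, so the socket skips `TIME_WAIT` and its local port is freed at once. Designated initializers are used because POSIX only guarantees that `struct linger` has the members `l_onoff` and `l_linger`, not their order. The trade-off is that any unsent data is discarded and the server sees a connection reset, which may itself affect what the benchmark measures.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_LINGER with a zero timeout. When the socket is later closed, the
 * kernel aborts the connection with an RST instead of the normal FIN
 * exchange, so the socket skips TIME_WAIT and its local port is freed at
 * once. Any unsent data is discarded. Returns 0 on success, -1 on error. */
static int set_abortive_close(int fd)
{
    struct linger ling = { .l_onoff = 1, .l_linger = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Hypothetical convenience wrapper: enable abortive close, then close(). */
static int close_abortively(int fd)
{
    assert(fd >= 0);
    if (set_abortive_close(fd) != 0) {
        int saved = errno;  /* keep setsockopt's error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```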
Is there any reason `SO_LINGER` is not enabled? Is there any workaround that would let me test the scenario where `reconnect_interval` is very small?
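@minhuw I believe tuning `tcp_fin_timeout` together with `tcp_tw_reuse` / `tcp_tw_recycle` will help you reuse ports held by `TIME_WAIT` connections and also reduce the total number of `TIME_WAIT` connections. Note that `tcp_tw_recycle` was removed in Linux 4.12, so on newer kernels only `tcp_tw_reuse` is available.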
However, it's essential to carefully test and evaluate the impact of enabling these parameters in your specific environment, as their behavior can vary depending on the network configuration and application requirements.
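Before changing any of these knobs, it can help to record their current values so an experiment can be reproduced and reverted. A minimal sketch, assuming Linux, where each sysctl `net.ipv4.X` is exposed as the file `/proc/sys/net/ipv4/X`; `read_sysctl_longs` is a hypothetical helper. It reads up to `max_values` integers, so it also handles two-value entries such as `ip_local_port_range`, and it returns -1 when the file is missing (for instance `tcp_tw_recycle` on kernels 4.12 and later).

```c
#include <assert.h>
#include <stdio.h>

/* Read up to max_values whitespace-separated integers from a file such as
 * "/proc/sys/net/ipv4/tcp_tw_reuse" into out[]. Returns the number of values
 * read, or -1 if the file cannot be opened (e.g. the knob does not exist). */
static int read_sysctl_longs(const char *path, long *out, int max_values)
{
    assert(out != NULL && max_values > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int count = 0;
    while (count < max_values && fscanf(f, "%ld", &out[count]) == 1)
        count++;
    fclose(f);
    return count;
}
```

For example, `read_sysctl_longs("/proc/sys/net/ipv4/ip_local_port_range", v, 2)` fills `v[0]` and `v[1]` with the low and high ends of the ephemeral range. Changing a value is a separate, privileged step, e.g. `sysctl -w net.ipv4.tcp_tw_reuse=1` as root.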