`mirage-tcpip` is blocked when we want to make more than 65535 connections
dinosaure opened this issue ยท 8 comments
So it seems that the Hashtbl
connects
keeps IDs forever and we don't remove them even if we properly close the connection. At one point so, we infinitely execute bumpport
because no ports are free. I will try to propose a patch to solve this issue.
@balrajsingh it seems that with your patch, the "finalizer" of a connection (which Hashtbl.remove
the connection from t.connects
) is never called. It's why, at one point, mirage-tcpip
loops infinitely on bumpport
because all ports are used and no connection correctly free its resource.
A possible solution is to return Reset
instead of Drop
when the connection is reset and we got the reset packet. By this way, instead of to be able to send ~60k emails, I can send 75k emails - but an error appears, my process is killed by the kernel and I don't know why.
This is my current patch when my process is killed then:
diff --git a/src/tcp/segment.ml b/src/tcp/segment.ml
index 0394c5fa..05197f89 100644
--- a/src/tcp/segment.ml
+++ b/src/tcp/segment.ml
@@ -110,12 +110,17 @@ module Rx(Time:Mirage_time.S) = struct
let check_valid_segment q seg =
if seg.header.rst then
- if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
- `Reset
- else if Window.valid q.wnd seg.header.sequence then
- `ChallengeAck
- else
- `Drop
+ begin match State.state q.state with
+ | State.Reset ->
+ `Reset
+ | _ ->
+ if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
+ `Reset
+ else if Window.valid q.wnd seg.header.sequence then
+ `ChallengeAck
+ else
+ `Drop
+ end
else if seg.header.syn then
`ChallengeAck
else if Window.valid q.wnd seg.header.sequence then`
Again, I don't have any clue about the state machine of the TCP/IP stack but I hope that the description of the bug is clear ๐ . I will try to craft a *.pcap
file when I send a simple email.
This the zipped file of what is going on when we want to send an email (you can open it with wireshark) to introspect TCP/IP packets.
We have lot of TCP Dup ACK
where the Sequence number does not mean anything for me ๐ . I would like to say that in this situation, we have 2 connections. One from a client to *:465
which starts a TLS connection and an other from the server as a client to *:25
- the server tries to retransmit the email to another endpoint.
The TCP Dup ACK
comes from the mirage-tcpip
implementation (used by the server) because we can see it on both side (as a receiver, then, as the sender). In Wireshark, it seems that the RST
is not really expected (these packets are in red). We can see that the server properly send a FIN, ACK
packet to the initial client and the client still continue to send a RST packet - finally, the initial client wants to send 3 RST packets (with the same sequence number).
I already notice this behavior which comes from the TCP/IP stack of the Linux kernel and I'm not sure how to properly handle that.
Finally, this PCAP file use my own version of mirage-tcpip
with the patch below.
EDIT: last note, I did not notice a memory leak even with the patch with a restricted usage of it. So it sill is a good news ๐ .
@dinosaure I think returning Reset
instead of Drop
will cause the same memory leak you saw earlier. Returning Reset
will cause a put to the rx_data
mvar, but if the connection is already in the Reset
state, that would mean the thread that reads that mvar has terminated, and since the mvar will never be read it will thus become a leak.
I think the fix should be to first clean up the hash tables correctly when the connection ends. The second fix should be to fix bumpport
to recognise that there are no ports available and return a "port exhaustion error". Since port numbers can only be up to 65535 and the first few thousand are reserved, there is a TCP limit of approx 60k connections that can exist between two fixed IP addresses where the listener port is also fixed. So bumpport
should recognise when this happens and fail appropriately. Let me work on this and send you a patch to try.
I think the fix should be to first clean up the hash tables correctly when the connection ends.
Yes, but close
does not have an access to the internal Hashtbl
. Currently, mirage-tcpip
expects a certain packet to assert that the connection is closed and it seems that we never reach this specific execution path.
That means that in anyway, we don't properly finalize the connection (and free resources correctly) - even if, in the point of view of the application, the connection is closed.
@dinosaure can you confirm that entries in the hash tables are not removed up at the end of a connection? log_with_stats
should show a count of current flows in the hash tables. I'm very sure I had tested many times that the flows get removed from the hash tables once they have ended. If you could, try this: have a few connections that end normally and then wait for about 2 minutes for all the possible timers to expire. Then run log_with_stats
to see if any flows are left there. There should be no entries in any of the tables if the flows have actually properly ended. If there are flows stuck in one of the tables, it will be very useful to know which table. Also can you describe again the exact test setup you have: what is the mirage-tcpip doing and what is the Linux end (if any) running, and which end sent the RST
?
The "finalising" process is as follows: the flow when it completes the 3 way handshake and is Established
registers a Curried function on_close
with State
, which includes the id
data for the connection. This Curried function gets called after all the "end of flow" timers have expired and then clears the flow from the hash tables. These timers are needed to ensure that any out of order or delayed packets that arrive at the end of the connection get properly dealt with. Once the end of flow timers have expired, then any delayed packets in the network no longer need to be demultiplexed into that flow and should generate an RST
response.
When instead of the usual FIN
sequence to end the flow, the abrupt RST
sequence is used by some application (it looks from your testing that the SMTP server on your test machine ends the connection with RST
) then everything should be cleared up from the hash tables immediately. My suspicion is that in the State
transitions there is some transition involving RST
where the registered on_close
function is not called. For the moment, I'm going to look for a missing call to on_close
in the state transitions and hopefully suggest a patch for you to test. Again I don't have a good current test setup so asking you to help test.
Also the stack generating DUP Acks
is not a major issue for the moment and I wouldn't focus on that. It can happen under many legitimate situations. Here it may not be ideal, but it does not break anything and should not cause any problems.
@dinosaure I don't immediately see a missing call in State
, I must have missed it or maybe the problem is elsewhere.
Could you help me with a little more information:
- Could you check the hash tables a few minutes after everything has ended and gone quiet and see how many flows stay stuck in the tables? Is it that all of the connections stay stuck or just some?
- Which of the 3 tables are they stuck in?
- Does it happen when the mirage-tcpip stack is making the connection (client) or when it is listening for connections (server)?
Also let me know some details about your setup: what is the test you are running, which TCPs are involved (mirage, Linux,..) and what is each TCP doing.
Thanks.