mirage/mirage-tcpip

`mirage-tcpip` is blocked when we want to make more than 65535 connections

dinosaure opened this issue ยท 8 comments

This issue is much more easier I think that #456 but on my stress test, it's about:

let getid t dst dst_port =
(* TODO: make this more robust and recognise when all ports are gone *)
let islistener _t _port =
(* TODO keep a list of active listen ports *)
false in
let idinuse t id =
Hashtbl.mem t.channels id ||
Hashtbl.mem t.connects id ||
Hashtbl.mem t.listens id
in
let inuse t id = islistener t (WIRE.src_port id) || idinuse t id in
let rec bumpport t =
(match t.localport with
| 65535 -> t.localport <- 10000
| _ -> t.localport <- t.localport + 1);
let id =
WIRE.v ~src:(Ip.src t.ip ~dst) ~src_port:t.localport ~dst ~dst_port
in
if inuse t id then bumpport t else id
in
bumpport t

At one stage, the server just block any concurrent processes. I will try to find a convenient way to fix this issue. TODO tells me that such case was never reached before.

So it seems that the Hashtbl connects keeps IDs forever and we don't remove them even if we properly close the connection. At one point so, we infinitely execute bumpport because no ports are free. I will try to propose a patch to solve this issue.

@balrajsingh it seems that with your patch, the "finalizer" of a connection (which Hashtbl.remove the connection from t.connects) is never called. It's why, at one point, mirage-tcpip loops infinitely on bumpport because all ports are used and no connection correctly free its resource.

A possible solution is to return Reset instead of Drop when the connection is reset and we got the reset packet. By this way, instead of to be able to send ~60k emails, I can send 75k emails - but an error appears, my process is killed by the kernel and I don't know why.

This is my current patch when my process is killed then:

diff --git a/src/tcp/segment.ml b/src/tcp/segment.ml
index 0394c5fa..05197f89 100644
--- a/src/tcp/segment.ml
+++ b/src/tcp/segment.ml
@@ -110,12 +110,17 @@ module Rx(Time:Mirage_time.S) = struct
 
   let check_valid_segment q seg =
     if seg.header.rst then
-      if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
-        `Reset
-      else if Window.valid q.wnd seg.header.sequence then
-        `ChallengeAck
-      else
-        `Drop
+      begin match State.state q.state with
+        | State.Reset ->
+          `Reset
+        | _ ->
+          if Sequence.compare seg.header.sequence (Window.rx_nxt q.wnd) = 0 then
+            `Reset
+          else if Window.valid q.wnd seg.header.sequence then
+            `ChallengeAck
+          else
+            `Drop
+      end
     else if seg.header.syn then
       `ChallengeAck
     else if Window.valid q.wnd seg.header.sequence then`

Again, I don't have any clue about the state machine of the TCP/IP stack but I hope that the description of the bug is clear ๐Ÿ‘ . I will try to craft a *.pcap file when I send a simple email.

email.zip

This the zipped file of what is going on when we want to send an email (you can open it with wireshark) to introspect TCP/IP packets.

We have lot of TCP Dup ACK where the Sequence number does not mean anything for me ๐Ÿ˜• . I would like to say that in this situation, we have 2 connections. One from a client to *:465 which starts a TLS connection and an other from the server as a client to *:25 - the server tries to retransmit the email to another endpoint.

The TCP Dup ACK comes from the mirage-tcpip implementation (used by the server) because we can see it on both side (as a receiver, then, as the sender). In Wireshark, it seems that the RST is not really expected (these packets are in red). We can see that the server properly send a FIN, ACK packet to the initial client and the client still continue to send a RST packet - finally, the initial client wants to send 3 RST packets (with the same sequence number).

I already notice this behavior which comes from the TCP/IP stack of the Linux kernel and I'm not sure how to properly handle that.

Finally, this PCAP file use my own version of mirage-tcpip with the patch below.

EDIT: last note, I did not notice a memory leak even with the patch with a restricted usage of it. So it sill is a good news ๐Ÿ‘ .

@dinosaure I think returning Reset instead of Drop will cause the same memory leak you saw earlier. Returning Reset will cause a put to the rx_data mvar, but if the connection is already in the Reset state, that would mean the thread that reads that mvar has terminated, and since the mvar will never be read it will thus become a leak.

I think the fix should be to first clean up the hash tables correctly when the connection ends. The second fix should be to fix bumpport to recognise that there are no ports available and return a "port exhaustion error". Since port numbers can only be up to 65535 and the first few thousand are reserved, there is a TCP limit of approx 60k connections that can exist between two fixed IP addresses where the listener port is also fixed. So bumpport should recognise when this happens and fail appropriately. Let me work on this and send you a patch to try.

I think the fix should be to first clean up the hash tables correctly when the connection ends.

Yes, but close does not have an access to the internal Hashtbl. Currently, mirage-tcpip expects a certain packet to assert that the connection is closed and it seems that we never reach this specific execution path.

That means that in anyway, we don't properly finalize the connection (and free resources correctly) - even if, in the point of view of the application, the connection is closed.

@dinosaure can you confirm that entries in the hash tables are not removed up at the end of a connection? log_with_stats should show a count of current flows in the hash tables. I'm very sure I had tested many times that the flows get removed from the hash tables once they have ended. If you could, try this: have a few connections that end normally and then wait for about 2 minutes for all the possible timers to expire. Then run log_with_stats to see if any flows are left there. There should be no entries in any of the tables if the flows have actually properly ended. If there are flows stuck in one of the tables, it will be very useful to know which table. Also can you describe again the exact test setup you have: what is the mirage-tcpip doing and what is the Linux end (if any) running, and which end sent the RST?

The "finalising" process is as follows: the flow when it completes the 3 way handshake and is Established registers a Curried function on_close with State, which includes the id data for the connection. This Curried function gets called after all the "end of flow" timers have expired and then clears the flow from the hash tables. These timers are needed to ensure that any out of order or delayed packets that arrive at the end of the connection get properly dealt with. Once the end of flow timers have expired, then any delayed packets in the network no longer need to be demultiplexed into that flow and should generate an RST response.

When instead of the usual FIN sequence to end the flow, the abrupt RST sequence is used by some application (it looks from your testing that the SMTP server on your test machine ends the connection with RST) then everything should be cleared up from the hash tables immediately. My suspicion is that in the State transitions there is some transition involving RST where the registered on_close function is not called. For the moment, I'm going to look for a missing call to on_close in the state transitions and hopefully suggest a patch for you to test. Again I don't have a good current test setup so asking you to help test.

Also the stack generating DUP Acks is not a major issue for the moment and I wouldn't focus on that. It can happen under many legitimate situations. Here it may not be ideal, but it does not break anything and should not cause any problems.

@dinosaure I don't immediately see a missing call in State, I must have missed it or maybe the problem is elsewhere.
Could you help me with a little more information:

  • Could you check the hash tables a few minutes after everything has ended and gone quiet and see how many flows stay stuck in the tables? Is it that all of the connections stay stuck or just some?
  • Which of the 3 tables are they stuck in?
  • Does it happen when the mirage-tcpip stack is making the connection (client) or when it is listening for connections (server)?

Also let me know some details about your setup: what is the test you are running, which TCPs are involved (mirage, Linux,..) and what is each TCP doing.

Thanks.