mirage/mirage-tcpip

ipv6 test is failing on windows

samoht opened this issue · 7 comments

$ _build/default/test/test.exe test ipv6 0
[...]
Send a UDP packet from one IPV6 stack and check it is received by another.
_build/_tests\ipv6.000.output:
test.exe: [INFO] Connected Ethernet interface 02:50:00:00:00:01
test.exe: [INFO] IP6: Starting
test.exe: [DEBUG] ND6: Sending RS
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:1 tgt=fe80::50:ff:fe00:1
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:23 tgt=fc00::23
test.exe: [DEBUG] no size limit, sending
test.exe: [INFO] UDP interface connected on
test.exe: [INFO] Connected Ethernet interface 02:50:00:00:00:02
test.exe: [INFO] IP6: Starting
test.exe: [DEBUG] ND6: Sending RS
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:2 tgt=fe80::50:ff:fe00:2
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
test.exe: [DEBUG] no size limit, sending
test.exe: [INFO] UDP interface connected on
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] ND: Received NS: src=:: dst=ff02::1:ff00:45 tgt=fc00::45
test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:1 --> PREFERRED
test.exe: [DEBUG] SLAAC: fc00::23 --> PREFERRED
test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:2 --> PREFERRED
test.exe: [DEBUG] SLAAC: fc00::45 --> PREFERRED
------------------------------------------------------------------------------------------------------------------------------
ASSERT UDP packet should have been received
------------------------------------------------------------------------------------------------------------------------------
Test error: Error UDP packet should have been received.
djs55 commented

This test case seems to be pure OCaml using mirage-vnetif -- I can't see any system dependency. Yet it does fail reliably on Windows for me too. (/cc @MagnusS in case you have any ideas!)

djs55 commented

Here's a diff between a working trace on Linux and a broken trace on Windows:

--- working     2017-08-04 18:03:55.000000000 +0100
+++ broken      2017-08-04 18:04:09.000000000 +0100
@@ -6,7 +6,7 @@
 test.exe: [DEBUG] no size limit, sending
 test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:23 tgt=fc00::23
 test.exe: [DEBUG] no size limit, sending
-test.exe: [INFO] UDP interface connected on 
+test.exe: [INFO] UDP interface connected on
 test.exe: [INFO] Connected Ethernet interface 02:50:00:00:00:02
 test.exe: [INFO] IP6: Starting
 test.exe: [DEBUG] ND6: Sending RS
@@ -15,19 +15,12 @@
 test.exe: [DEBUG] no size limit, sending
 test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
 test.exe: [DEBUG] no size limit, sending
-test.exe: [INFO] UDP interface connected on 
+test.exe: [INFO] UDP interface connected on
+test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
+test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
+test.exe: [DEBUG] no size limit, sending
+test.exe: [DEBUG] ND: Received NS: src=:: dst=ff02::1:ff00:45 tgt=fc00::45
 test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:1 --> PREFERRED
 test.exe: [DEBUG] SLAAC: fc00::23 --> PREFERRED
 test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:2 --> PREFERRED
 test.exe: [DEBUG] SLAAC: fc00::45 --> PREFERRED
-test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
-test.exe: [DEBUG] ND6: Sending NS src=fc00::23 dst=ff02::1:ff00:45 tgt=fc00::45
-test.exe: [DEBUG] no size limit, sending
-test.exe: [DEBUG] ND: Received NS: src=fc00::23 dst=ff02::1:ff00:45 tgt=fc00::45
-test.exe: [DEBUG] ND6: Sending NA: src=fc00::45 dst=fc00::23 tgt=fc00::45 sol=true
-test.exe: [DEBUG] IP6: Sending packet: dst=fc00::23 mac=02:50:00:00:00:01
-test.exe: [DEBUG] no size limit, sending
-test.exe: [DEBUG] ND: Received NA: src=fc00::45 dst=fc00::23 tgt=fc00::45
-test.exe: [INFO] NUD: fc00::45 --> REACHABLE
-test.exe: [DEBUG] IP6: Releasing queued packets: dst=fc00::45 mac=02:50:00:00:00:02
-test.exe: [DEBUG] no size limit, sending

With my completely untrained eye I observe

  • the broken trace Queues the packet earlier, before the SLAAC stuff
  • the working trace has ND6: Sending NA: src=fc00::45 dst=fc00::23 tgt=fc00::45 sol=true and ten IP6: Sending packet

So it looks like (again I'm guessing here, not understanding IPv6 at all) that the broken trace has the packet sent before some internal routes are set up and the packet gets dropped? The working trace where the packet is sent a bit later after the SLAAC stuff seems to work better.

Perhaps the delays can be tweaked to work around this. It would be nice to have a way of waiting for the system to be stable rather than sleep :/

djs55 commented

I tried to make the test more reliable by

  • removing the initial delay (the one with the comment (* Duration.of_ms 500 makes this test fail - why? *))
  • sending UDP every 50ms
  • arranging for the test to succeed when the first packet is received

The idea is that we don't really know what the initial startup time needed is, before which traffic is presumably being dropped.

However this still doesn't work -- possibly related to the comment referenced above -- it seems that if one packet is sent early, no traffic will ever be received. Check out this trace:

s$ _build/default/test/test.exe test ipv6 -v
test.exe: [INFO] Connected Ethernet interface 02:50:00:00:00:01
test.exe: [INFO] IP6: Starting
test.exe: [DEBUG] ND6: Sending RS
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:1 tgt=fe80::50:ff:fe00:1
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:23 tgt=fc00::23
test.exe: [DEBUG] no size limit, sending
test.exe: [INFO] UDP interface connected on 
test.exe: [INFO] Connected Ethernet interface 02:50:00:00:00:02
test.exe: [INFO] IP6: Starting
test.exe: [DEBUG] ND6: Sending RS
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:2 tgt=fe80::50:ff:fe00:2
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] no size limit, sending
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
test.exe: [DEBUG] no size limit, sending
test.exe: [INFO] UDP interface connected on 
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
test.exe: [DEBUG] ND6: Sending NS src=:: dst=ff02::1:ff00:45 tgt=fc00::45
test.exe: [DEBUG] no size limit, sending
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:1 --> PREFERRED
test.exe: [DEBUG] SLAAC: fc00::23 --> PREFERRED
test.exe: [DEBUG] SLAAC: fe80::50:ff:fe00:2 --> PREFERRED
test.exe: [DEBUG] SLAAC: fc00::45 --> PREFERRED
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45
Udp.write
test.exe: [DEBUG] IP6: Queueing packet: dst=fc00::45

If I put the initial delay back in, then it works as expected. I suspect the stack gets into a stuck state if traffic is sent too early.

@djs55 had a quick look at the mirage-vnetif setup. The test is using the default backend which will just deliver all sent packets in async -- so even if the callback function blocks the next packet should be delivered.

There's a race in Lwt.pick between the receiver and sender, afaik we don't have a mechanism to reliably see when the receiver is up - but 500ms seems like it should be more than enough... We use a delay in the ipv4 tcp connect test too (but a shorter one): https://github.com/mirage/mirage-tcpip/blob/master/test/test_connect.ml#L69.

djs55 commented

Thanks for taking a look @MagnusS .

When I improved the unit test to send packets in a loop (every 50ms) it turned out that, if I send a packet early, the stack never works. It seems there's a logic bug which prevents outgoing traffic ever being sent (to that address) as soon as the first packet is queued.

I've made a candidate "fix" in the linked PR, but I don't really know what I'm doing with v6 so it might be nonsense! The test case improvement patch hits the bug reliably though.

djs55 commented

OK I think I've figured this out. When sending packets to a local address, we use RFC2461's neighbor discovery. A "Neighbor Solicitation" message is sent too early and is dropped -- this is perfectly ok. It is supposed to be resent after a "Retrans Timer" timeout but the initial INCOMPLETE state is accidentally set with a timeout based on the "Reachable Time" timeout. The "Retrans Timer" is the time between probes and is about 1s; the "Reachable Time" timeout is the time for which to assume a confirmed neighbor cache entry is still valid and is about 30s. The code did correctly use the "Retrans Timer" timeout for second and subsequent probes, it's just the initial timeout which looks wrong (probably a typo).

nojb commented

It's been a while since I worked on this, but this sounds about right to me.