AMS-IX/arpsponge

Inadvertently preventing Windows boot if no IP set

Hi!

We are still using arpsponge in a non-IX environment, to protect a firewall from having to issue lots of ARPs for hosts that are not responding. The firewall handles those events poorly.

We have for a long time had problems on some networks with Windows servers. The servers fail to bring up interfaces or boot when arpsponge is active (IP set to dead state), and claim an IP conflict. This has puzzled me for a long time, and I have not been able to figure it out.

Today I suddenly realized what the problem is, but I could use some help to devise the most proper workaround/fix. Ideally something that could be included in the standard distribution without any side-effects for anyone else.

Scenario: I'm running arpsponge on a server that has a trunk port with all VLANs accessible. For those networks/VLANs where we have problems, I create a subinterface and attach an arpsponge session to it. I don't set an IP address on the interface! (Sometimes there are no free IP addresses, and we wanted to be as non-intrusive as possible.)

Observation 1: When a Windows server boots or brings an interface up, it usually emits 3 ARP probes before committing to its address, and then sends an ordinary ARP announcement (gARP request).

Observation 2: When arpsponge thinks the IP address is dead, it will on reception of the first ARP probe put the address in PENDING state and start emitting ordinary ARP requests.

Observation 3: When arpsponge starts emitting ARP requests from the PENDING state, the host stops sending its ARP probes; we never see more than one ARP probe, and the host reports an IP conflict.

Realization: If you intend to emit an ordinary ARP request, but the sender IP address is set to 0.0.0.0, you have in fact created an ARP probe packet!
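To make the realization concrete, here is a minimal sketch in Python (not the project's Perl) that packs two ARP requests and checks which one a receiver would classify as a probe. The MAC and IP addresses are made up, and `build_arp_request` and `is_arp_probe` are hypothetical helper names, not arpsponge functions.

```python
import socket
import struct

ARP_REQUEST = 1
ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28-byte ARP payload for IPv4 over Ethernet


def build_arp_request(sha: bytes, spa: str, tpa: str) -> bytes:
    """Pack an ARP request; the target hardware address is left all-zero."""
    return struct.pack(
        ARP_FORMAT,
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6, 4,     # hardware / protocol address lengths
        ARP_REQUEST,
        sha, socket.inet_aton(spa),
        b"\x00" * 6, socket.inet_aton(tpa),
    )


def is_arp_probe(payload: bytes) -> bool:
    """A request whose sender IP is 0.0.0.0 is, on the wire, an ARP probe."""
    _, _, _, _, opcode, _, spa, _, _ = struct.unpack(ARP_FORMAT, payload)
    return opcode == ARP_REQUEST and spa == b"\x00\x00\x00\x00"


sponge_mac = bytes.fromhex("020000000001")  # made-up, locally administered MAC
no_ip = build_arp_request(sponge_mac, "0.0.0.0", "192.0.2.10")
with_ip = build_arp_request(sponge_mac, "192.0.2.1", "192.0.2.10")
print(is_arp_probe(no_ip), is_arp_probe(with_ip))  # prints: True False
```

The sender's intent is not encoded anywhere in the packet; only the zero sender IP field distinguishes the two.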

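RFC 5227, section 2.1.1 (Probe Details), explains what happens: a host that is still probing and receives an ARP probe for the same address from another host SHOULD treat that as an address conflict. Microsoft has implemented this SHOULD clause, so Windows immediately gives up on bringing the interface up.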
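The conflict rule a probing host applies can be sketched as below, as I read RFC 5227. This is a simplified Python model, not Windows' or arpsponge's actual code; the `Arp` record and `conflict_while_probing` are hypothetical names, and the addresses are examples.

```python
from dataclasses import dataclass

ZERO_IP = "0.0.0.0"
REQUEST = 1


@dataclass(frozen=True)
class Arp:
    opcode: int  # 1 = request, 2 = reply
    sha: str     # sender hardware address
    spa: str     # sender IP address
    tpa: str     # target IP address


def conflict_while_probing(pkt: Arp, candidate_ip: str, own_macs: set) -> bool:
    """Decide whether a host still probing candidate_ip must back off."""
    if pkt.spa == candidate_ip:
        return True  # someone already claims the address (MUST)
    is_probe = pkt.opcode == REQUEST and pkt.spa == ZERO_IP
    if is_probe and pkt.tpa == candidate_ip and pkt.sha not in own_macs:
        return True  # another host is probing for the same address (SHOULD)
    return False


host_macs = {"02:00:00:00:00:aa"}
# Sponge query sent from an interface without an IP address:
from_unnumbered = Arp(REQUEST, "02:00:00:00:00:01", ZERO_IP, "192.0.2.10")
# The same query sent from an interface that has an address:
from_numbered = Arp(REQUEST, "02:00:00:00:00:01", "192.0.2.1", "192.0.2.10")

print(conflict_while_probing(from_unnumbered, "192.0.2.10", host_macs))  # True
print(conflict_while_probing(from_numbered, "192.0.2.10", host_macs))    # False
```

Under this model, only the query from the unnumbered interface makes the booting host give up, which matches the observations above.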

What would be the least intrusive way to try to work around this?

Perhaps just conclude that it is not productive to try to emit ARP requests if no IP address is assigned? With that change, the (unintended) ARP probes during the PENDING state will simply not be emitted, the host interface will be activated, an ARP announcement will be sent, and everyone is happy. Would that change have any ill effects?

/Per

PS: This is (so far) not a problem for the Linux servers around; they just bring their interfaces up!

Hi Per,
Sorry for not responding sooner. Been dealing with health issues the past couple of months.
Interesting use for the arpsponge :-)
Thanks for the analysis. We should of course never send ARP requests if the interface has no IP address, but we should perhaps go a step further and not respond to ARP queries with an all-zero sender IP address. I'll have a look to see what impact that would have.

Sure, if you can send me a diff that would be appreciated!

Change made to ./lib/M6/ARP/Sponge.pm, in sub send_arp. This is not a complete diff; the line numbers are relative to just that sub:

*** old 2019-03-20 08:41:36.504676744 +0100
--- new 2019-03-20 08:41:04.880159830 +0100
***************
*** 15,20 ****
--- 15,27 ----
      $args{dest_mac} //= $args{tha};
      $args{opcode}   //= $ARP_OPCODE_REQUEST;

+     # If "spa" is 0.0.0.0, the ARP request/response turns into a ARP probe/announce, not what is intended.
+     # This can happen if you are using an interface without an IP address set.
+     if ($args{spa} == 0) {
+         event_notice(EVENT_SPONGE, "send_arp: spa=0.0.0.0, aborting packet with tpa=%s", hex2ip($args{tpa}));
+         return;
+     }
+
      my $pkt = encode_ethernet({
                      dest_mac => $args{tha},
                      src_mac  => $args{src_mac},

I have now finally got the organisation I work with to allocate IP addresses for arpsponge to use in all monitored networks. This will soon be a non-issue for us.

We ran into a very bad situation in a network with load balancers with many VIPs. If a temporary LB outage triggered arpsponge to declare a VIP as dead, it would never recover. The VIP would never source any broadcast or multicast traffic, so it was invisible. The ARP requests necessary to find the default gateways would be sourced from the LB's native address, never from its VIPs!

Thanks for that. I'll see if I can incorporate something like that.

I agree that if the sponge has no IP itself, it should never send queries (which would be probes), because that can cause race conditions when the target host is coming up and doing its own duplicate IP detection.

Maybe add a --passive flag:

  • Without --passive:
    • The sponge will refuse to start if the interface has no IP address.
  • With --passive:
    • The sponge will not send any queries or probes.
    • Periodic sweeping is disabled.
    • The PENDING state simply becomes a timer before sponging.

In --active mode, when a probe is received, the tpa's state will be (re)set to PENDING(0) if the address is not known to be ALIVE. With an appropriate value for --pending, this should prevent the sponge from re-sponging. The alternative is to set the tpa's state to ALIVE immediately. Not sure if that is better... What do you think?
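The two options for handling a received probe can be compared in a small Python sketch. It is only a model of the state bookkeeping described above: the string state names and `on_probe_received` are hypothetical, and the sponge's real state table works differently.

```python
ALIVE = "ALIVE"
DEAD = "DEAD"


def pending(n: int) -> str:
    """Render a PENDING state with its query counter."""
    return f"PENDING({n})"


def on_probe_received(states: dict, tpa: str, mark_alive: bool = False) -> str:
    """Update the state of tpa when an ARP probe for it is seen.

    Default: (re)set to PENDING(0) unless the address is known to be ALIVE.
    Alternative (mark_alive=True): set it to ALIVE immediately.
    """
    if states.get(tpa) == ALIVE:
        return ALIVE  # known-alive addresses are left untouched
    states[tpa] = ALIVE if mark_alive else pending(0)
    return states[tpa]


states = {"192.0.2.10": DEAD, "192.0.2.11": ALIVE}
print(on_probe_received(states, "192.0.2.10"))                   # PENDING(0)
print(on_probe_received(states, "192.0.2.11"))                   # ALIVE
print(on_probe_received(states, "192.0.2.10", mark_alive=True))  # ALIVE
```

With the default, the address still has to prove itself before the pending period runs out; with the alternative, a single probe is enough to stop sponging.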

I've just pushed changes to a new branch, feature/passive-state. Passive mode is turned on automatically if the sponge's network i/f has no IP address. It will log warnings periodically, though. To get rid of the warnings, the sponge should be started with an explicit --passive. In --passive mode, the periodic sweeping is disabled, and the PENDING state turns into a simple timer of --pending=n seconds.
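As a rough illustration of the behaviour described here (not the code on the branch), the mode selection and the timer-style PENDING state could look like this in Python. `resolve_mode` and `PassivePendingTimer` are hypothetical names, and timestamps are passed in explicitly so the sketch stays deterministic.

```python
def resolve_mode(interface_ip, passive_flag: bool):
    """Return (passive, warn_periodically) for the given startup conditions."""
    if passive_flag:
        return True, False   # explicit --passive: passive, no warnings
    if interface_ip is None:
        return True, True    # no IP on the interface: passive, with warnings
    return False, False      # normal, non-passive operation


class PassivePendingTimer:
    """PENDING as a plain countdown: no queries are sent while waiting."""

    def __init__(self, pending_secs: int):
        self.pending_secs = pending_secs
        self.deadlines = {}

    def enter_pending(self, ip: str, now: float) -> None:
        # Keep an existing deadline so repeated triggers do not extend it.
        self.deadlines.setdefault(ip, now + self.pending_secs)

    def seen_alive(self, ip: str) -> None:
        self.deadlines.pop(ip, None)  # traffic from the host cancels the timer

    def expired(self, now: float) -> list:
        """Return (and forget) the addresses whose timer has run out."""
        due = sorted(ip for ip, t in self.deadlines.items() if t <= now)
        for ip in due:
            del self.deadlines[ip]
        return due


print(resolve_mode(None, False))         # (True, True)
timer = PassivePendingTimer(pending_secs=5)
timer.enter_pending("192.0.2.10", now=100)
timer.enter_pending("192.0.2.11", now=101)
timer.seen_alive("192.0.2.11")
print(timer.expired(now=104))            # []
print(timer.expired(now=105))            # ['192.0.2.10']
```

Addresses returned by `expired` are the ones that would be sponged; anything seen alive in the meantime simply drops out.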

In non-passive mode, everything stays as is.