Inadvertently preventing Windows boot if no IP set

Question


Closed this issue 6 years ago · 8 comments

Hi!

We are still using arpsponge in a non-IX environment, to protect a firewall from having to issue lots of ARPs for hosts that are not responding. The firewall handles those events poorly.

For a long time we have had problems with Windows servers on some networks. When arpsponge is active for their address (the IP is in the dead state), the servers fail to bring up their interfaces, or even to boot, and report an IP conflict. This has puzzled me for a long time, and I have not been able to figure it out.

Today I suddenly realized what the problem is, but I could use some help to devise the most appropriate workaround or fix, ideally something that could be included in the standard distribution without any side effects for anyone else.

Scenario: I'm running arpsponge on a server that has a trunk port with all VLANs accessible. For those networks/VLANs where we have problems, I create a sub-interface and attach an arpsponge session to it. I don't set an IP address on the interface! (Sometimes there are no free IP addresses, and we want to be as non-intrusive as possible.)

Observation 1: When a Windows server boots or brings an interface up, it usually emits 3 ARP probes before committing to its address, and then sends an ARP announcement (a gratuitous ARP request).
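
To make the RFC 5227 terms concrete, here is a minimal Perl sketch that labels an ARP packet as a probe, an announcement, an ordinary request, or a reply. The hash-ref packet layout with dotted-quad addresses and the name classify_arp are assumptions for illustration. They are not arpsponge's internal representation.

```perl
use strict;
use warnings;

# Hypothetical helper: name an IPv4 ARP packet using RFC 5227 terms.
# A packet is a hash ref with 'opcode' (1 = request, 2 = reply) and
# dotted-quad 'spa' (sender IP) and 'tpa' (target IP) fields.
sub classify_arp {
    my ($arp) = @_;
    return 'reply'        if $arp->{opcode} == 2;
    return 'probe'        if $arp->{spa} eq '0.0.0.0';   # all-zero sender IP
    return 'announcement' if $arp->{spa} eq $arp->{tpa}; # gratuitous ARP request
    return 'request';                                    # ordinary who-has
}

# Boot sequence of a host claiming 192.0.2.10: three probes, one announcement.
my @boot = (
    ({ opcode => 1, spa => '0.0.0.0', tpa => '192.0.2.10' }) x 3,
    { opcode => 1, spa => '192.0.2.10', tpa => '192.0.2.10' },
);
print join(' ', map { classify_arp($_) } @boot), "\n";
# prints: probe probe probe announcement
```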

Observation 2: When arpsponge thinks the IP address is dead, it will on reception of the first ARP probe put the address in PENDING state and start emitting ordinary ARP requests.

Observation 3: When arpsponge starts emitting ARP requests from the PENDING state, the host stops sending its ARP probes; we never see more than one ARP probe, and the host reports an IP conflict.

Realization: If you intend to emit an ordinary ARP request, but your sender IP address is 0.0.0.0 (as it is when the interface has no address), you have in fact created an ARP probe packet!
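
The point can be checked at the byte level. The sketch below packs the 28-byte body of an Ethernet/IPv4 ARP request, using the field layout from RFC 826. It then tests the RFC 5227 probe condition, which is a request with an all-zero sender IP field. The helper names arp_request_body and is_arp_probe, and the example MAC and IP values, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pack the 28-byte body of an Ethernet/IPv4 ARP request.
# MACs are "aa:bb:cc:dd:ee:ff" strings, IPs are dotted quads.
sub arp_request_body {
    my (%arg) = @_;
    my $mac = sub { pack('C6', map { hex($_) } split(/:/, $_[0])) };
    my $ip  = sub { pack('C4', split(/\./, $_[0])) };
    return pack('n n C C n', 1, 0x0800, 6, 4, 1)   # Ethernet, IPv4, hlen, plen, op=request
         . $mac->($arg{sha}) . $ip->($arg{spa})     # sender MAC and IP
         . $mac->('00:00:00:00:00:00')              # target MAC: unknown in a request
         . $ip->($arg{tpa});                        # target IP being resolved
}

# RFC 5227: a request whose sender IP field is all zeroes is an ARP Probe.
sub is_arp_probe {
    my ($body) = @_;
    my $opcode = unpack('x6 n', $body);            # skip htype/ptype/hlen/plen
    return $opcode == 1 && substr($body, 14, 4) eq "\0\0\0\0";
}

my %who_has = (sha => '02:00:00:00:00:01', tpa => '192.0.2.10');
for my $spa ('192.0.2.1', '0.0.0.0') {
    my $body = arp_request_body(%who_has, spa => $spa);
    printf "spa=%-9s -> %s\n", $spa, is_arp_probe($body) ? 'ARP Probe' : 'ordinary request';
}
```

Nothing in the packet distinguishes a who-has sent from an address-less interface from a genuine probe. The receiver cannot tell them apart.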

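RFC 5227, section 2.1.1, fourth paragraph, explains what happens: a host that, while probing, receives an ARP probe for the same target address from a different hardware address SHOULD treat it as an address conflict. Microsoft has implemented this SHOULD, and Windows immediately gives up on bringing the interface up.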
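
The host-side checks from RFC 5227, section 2.1.1, can be sketched as follows. The third paragraph is a MUST: any ARP packet that uses the probed address as its sender IP is a conflict. The fourth paragraph is a SHOULD: an ARP probe for the same address from another MAC is a conflict. The function name and packet layout are assumptions. The example shows why a sponge without an IP address trips the second rule, and why a sponge with an address does not.

```perl
use strict;
use warnings;

# Sketch of the checks a host applies while probing for $want
# (RFC 5227, section 2.1.1). Packets are hash refs (opcode, sha, spa, tpa);
# the layout and the function name are hypothetical.
sub conflicts_while_probing {
    my ($want, $my_mac, $pkt) = @_;
    # Third paragraph (MUST): any request or reply using $want as sender IP.
    return 1 if $pkt->{spa} eq $want;
    # Fourth paragraph (SHOULD): an ARP Probe for $want from another MAC,
    # i.e. someone else seems to be probing the same address.
    return 1 if $pkt->{opcode} == 1
             && $pkt->{spa} eq '0.0.0.0'
             && $pkt->{tpa} eq $want
             && lc($pkt->{sha}) ne lc($my_mac);
    return 0;
}

my $host_mac = '00:15:5d:00:00:02';
# Query from a sponge whose interface has no IP address:
my $sponge_query = { opcode => 1, sha => '02:00:00:00:00:01',
                     spa => '0.0.0.0', tpa => '192.0.2.10' };
# The same query from a sponge whose interface has 192.0.2.1:
my $sponge_query_with_ip = { %$sponge_query, spa => '192.0.2.1' };
for my $q ($sponge_query, $sponge_query_with_ip) {
    print conflicts_while_probing('192.0.2.10', $host_mac, $q) ? "conflict\n" : "no conflict\n";
}
# prints "conflict", then "no conflict"
```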

What would be the least intrusive way to try to work around this?

Perhaps just conclude that it is not productive to emit ARP requests if no IP address is assigned? With that change, the requests (de facto probes) that arpsponge sends in the PENDING state will simply not be emitted, the host will activate its interface and send an ARP announcement, and everyone is happy. Would that change have any ill effects?

/Per

PS: This is (so far) not a problem for the Linux servers around, they just bring their interfaces up!

Answer 1 · 2019-03-19T20:48:30.000Z

Hi Per,
Sorry for not responding sooner. Been dealing with health issues the past couple of months.
Interesting use for the arpsponge :-)
Thanks for the analysis. We should of course never send ARP requests if the interface has no IP address, but we should perhaps go a step further and not respond to ARP queries with an all-zero sender IP address. I'll have a look to see what impact that would have.
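
One possible reading of the second suggestion is sketched below as a hypothetical reply policy. The sponge answers ordinary who-has queries for sponged addresses but stays silent when the sender IP is all zeroes. The reason: a reply from the sponge would carry the sponged address as its sender IP, and RFC 5227 tells a probing host to treat that as a conflict. The name should_answer and the data layout are assumptions, not arpsponge code.

```perl
use strict;
use warnings;

# Hypothetical reply policy: answer who-has queries for sponged targets,
# but ignore RFC 5227 probes (all-zero sender IP).
sub should_answer {
    my ($pkt, $sponged) = @_;                    # $sponged: { ip => 1, ... }
    return 0 unless $pkt->{opcode} == 1;         # only requests get answers
    return 0 unless $sponged->{ $pkt->{tpa} };   # only for sponged targets
    return 0 if $pkt->{spa} eq '0.0.0.0';        # stay silent on probes
    return 1;
}

my %sponged = ('192.0.2.10' => 1);
my $probe = { opcode => 1, spa => '0.0.0.0',   tpa => '192.0.2.10' };
my $query = { opcode => 1, spa => '192.0.2.1', tpa => '192.0.2.10' };
printf "probe: %s, query: %s\n",
    should_answer($probe, \%sponged) ? 'answer' : 'ignore',
    should_answer($query, \%sponged) ? 'answer' : 'ignore';
# prints: probe: ignore, query: answer
```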

Answer 2 · 2019-03-19T20:56:00.000Z

I did implement this a while ago, and I could see that some Windows boxes succeeded in booting after that change. I can share the fix I used if you like (I'm at home right now and cannot easily reach the right systems). ARP query responses should not be affected by this, if I remember correctly: when we sponge an IP address, we always use that address in the answer, not the interface's native address. /Per


Answer 3 · 2019-03-19T20:57:39.000Z

Sure, if you can send me a diff, that would be appreciated!

Answer 4 · 2019-03-19T20:59:47.000Z

Will try to do that tomorrow, /Per


Answer 5 · 2019-03-20T07:58:56.000Z

Here is the change I made to ./lib/M6/ARP/Sponge.pm, in sub send_arp. It is not a complete diff; it is relative to just that sub:

*** old 2019-03-20 08:41:36.504676744 +0100
--- new 2019-03-20 08:41:04.880159830 +0100
***************
*** 15,20 ****
--- 15,27 ----
      $args{dest_mac} //= $args{tha};
      $args{opcode}   //= $ARP_OPCODE_REQUEST;

+     # If "spa" is 0.0.0.0, the ARP request turns into an ARP probe, which is not what is intended.
+     # This can happen if you are using an interface without an IP address set.
+     if (hex($args{spa}) == 0) {
+         event_notice(EVENT_SPONGE, "send_arp: spa=0.0.0.0, aborting packet with tpa=%s", hex2ip($args{tpa}));
+         return;
+     }
+
      my $pkt = encode_ethernet({
                      dest_mac => $args{tha},
                      src_mac  => $args{src_mac},

I have now finally made the organisation I work with allocate IP addresses for arpsponge to use in all monitored networks, so this will soon be a non-issue for us.

We ran into a very bad situation in a network with load balancers that have many VIPs. If a temporary LB outage caused arpsponge to declare a VIP dead, it would never recover: the VIP never sources any broadcast or multicast traffic, so it was invisible to the sponge. The ARP requests needed to find the default gateways were sourced from the LB's native address, never from its VIPs!

Answer 6 · 2019-03-20T10:50:11.000Z

Thanks for that. I'll see if I can incorporate something like that.

I agree that if the sponge has no IP address itself, it should never send queries (which would be probes), because they can cause race conditions when the target host is coming up and doing its own duplicate address detection.

Maybe add a --passive flag:

Without --passive:
- The sponge will refuse to start if the interface has no IP address.
With --passive:
- The sponge will not send any queries or probes.
- Periodic sweeping is disabled.
- The PENDING state simply becomes a timer before sponging.
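
The proposal can be read as a small policy table, sketched below. Without --passive, the sponge refuses to start on an interface with no address. With --passive, it sends no queries, does no sweeping, and treats PENDING as a plain timer. The key names and startup_policy are hypothetical and are not arpsponge's actual option handling.

```perl
use strict;
use warnings;

# Hypothetical policy table for the proposed --passive/--active split.
my %POLICY = (
    active  => { require_ip => 1, send_queries => 1, sweep => 1, pending_is_timer => 0 },
    passive => { require_ip => 0, send_queries => 0, sweep => 0, pending_is_timer => 1 },
);

sub startup_policy {
    my ($passive, $if_ip) = @_;        # $if_ip is undef if the i/f has no address
    my $p = $POLICY{ $passive ? 'passive' : 'active' };
    die "interface has no IP address; refusing to start without --passive\n"
        if $p->{require_ip} && !defined $if_ip;
    return $p;
}

my $p = startup_policy(1, undef);
print "passive: queries=$p->{send_queries}, sweep=$p->{sweep}\n";
print "active without IP: ",
    (eval { startup_policy(0, undef); 1 } ? "started\n" : "refused: $@");
```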

In --active mode, when a probe is received, the tpa's state will be (re)set to PENDING(0) if the address is not known to be ALIVE. With an appropriate value for --pending, this should prevent the sponge from re-sponging. The alternative is to set the tpa's state to ALIVE immediately. Not sure if that is better... What do you think?
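
The probe handling proposed for --active mode is sketched below. A probe for a target that is not known to be ALIVE resets the target to PENDING(0), so the sponge re-checks it instead of continuing to sponge it. This follows the PENDING route rather than the jump-straight-to-ALIVE alternative. The table layout and the name on_probe are hypothetical; DEAD stands for the sponged state.

```perl
use strict;
use warnings;

# Sketch: on a probe for $tpa, (re)set it to PENDING(0) unless it is ALIVE.
sub on_probe {
    my ($table, $tpa, $now) = @_;
    my $e = $table->{$tpa} //= {};                   # unknown target: new entry
    return $e if ($e->{state} // '') eq 'ALIVE';     # known-good: leave it alone
    @{$e}{qw(state pending since)} = ('PENDING', 0, $now);
    return $e;
}

my %table = (
    '192.0.2.10' => { state => 'DEAD' },             # currently sponged
    '192.0.2.11' => { state => 'ALIVE' },
);
for my $ip (sort keys %table) {
    my $e = on_probe(\%table, $ip, time);
    printf "%s -> %s\n", $ip, $e->{state} eq 'PENDING' ? "PENDING($e->{pending})" : $e->{state};
}
# prints "192.0.2.10 -> PENDING(0)", then "192.0.2.11 -> ALIVE"
```

If --pending outlasts the host's probe-and-announce sequence, the host's announcement should arrive while the target is still PENDING, before the sponge would sponge it again.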

Answer 7 · 2019-03-25T21:30:04.000Z

Much nicer solution than my quick hack! I like the idea of an explicit --passive flag to enable the behaviour I have been exploring for the better part of the last year; much cleaner. Also, make sure that the new --active flag is enabled implicitly, so that current behaviour does not change. In --active mode, please keep the current behaviour of transitioning via PENDING before reaching ALIVE (unless already alive, of course). I can imagine several situations where changing directly from SPONGING to ALIVE would be wrong, in the sense that the probe can be a false positive, delaying the return to the correct SPONGING state. /Per


Answer 8 · 2019-03-28T11:27:17.000Z

I've just pushed changes to a new branch, feature/passive-state. Passive mode is turned on automatically if the sponge's network interface has no IP address, but the sponge will then log warnings periodically. To get rid of the warnings, start the sponge with an explicit --passive. In --passive mode, periodic sweeping is disabled, and the PENDING state turns into a simple timer of --pending=n seconds.

In non-passive mode, everything stays as is.
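
The mode selection described for the feature/passive-state branch is sketched below, together with PENDING as a plain timer. It assumes the address is sponged once --pending=n seconds have passed without it becoming ALIVE. Function names and arguments are hypothetical and not taken from the branch.

```perl
use strict;
use warnings;

# Hypothetical mode selection: explicit --passive is quiet, an interface
# without an IP forces passive mode with periodic warnings.
sub select_mode {
    my (%opt) = @_;   # passive => explicit --passive flag, if_ip => address or undef
    return { passive => 1, warn => 0 } if $opt{passive};           # explicit: quiet
    return { passive => 1, warn => 1 } unless defined $opt{if_ip}; # forced: warn
    return { passive => 0, warn => 0 };                            # unchanged behaviour
}

# In passive mode PENDING is only a timer of --pending=n seconds.
sub pending_expired {
    my ($since, $now, $pending) = @_;
    return $now - $since >= $pending;
}

my $m = select_mode(if_ip => undef);
printf "passive=%d warn=%d\n", $m->{passive}, $m->{warn};
# prints: passive=1 warn=1
```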