'load balancers' are typically the bane of all system administrators: when they are deployed, more often than not for their advertised failover capabilities, you quickly find that the medicine is worse than the disease. Load balancers are rarely RFC-caring and, ironically, create single points of failure. They add a lot more complexity to whatever system they are meant to be making highly available, which leads to failure modes that are far more severe when things do crash and burn. In short, load balancers are a prime example of what is wrong with the sysadmin'ing mindset nowadays: instead of building a system that fails gracefully and expects component failure, all resources are spent on making sure nodes do not fail.
Anycast'ing is a great method to add failover to a system: it uses an IGP such as OSPF to get client traffic to the nearest service node. IGPs are widely understood and easy to diagnose (compared to a vendor turn-key 'blackbox'), so if you already run one at your organisation then this could be a perfect and cheap solution for you.
As with all High Availability (HA) systems, everything boils down to a good and simple probe that tests whether the service on the local node is functioning correctly. Surprisingly, you will find that 90% of your time is spent producing nothing more than a sixty line shell script that performs just this job.
In the DNS resolver example below, the job of the shell script is to test whether the local resolver is functioning. If it is, then the 'service' IP address is added to the node, which prompts the IGP daemon to begin advertising the address to the network. If the service is down, the service IP is removed, the IGP daemon withdraws the advertisement and traffic to that service IP is rerouted to the next nearest functioning node. This is great for unreliable services, as you can very simply build very reliable systems by creating duplicate service nodes. For example, say your DNS server only functions 80% of the time (down for 20%); by having four DNS servers you get a service that gives you nearly three nines uptime (''1-(0.20^4) = 0.9984'').
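To make this concrete, here is a minimal sketch of what such a probe loop might look like. This is an illustration only, not one of the probe scripts made available below; it assumes 'dig' and 'iproute2' are installed, that the service IP is 192.0.2.1 and that it lives on 'lo':

#!/bin/sh
# hypothetical minimal probe: add the service IP while the local resolver answers,
# remove it (so the IGP withdraws the route) as soon as it stops answering
SERVICE_IP=192.0.2.1
IFACE=lo
while sleep 5; do
	if dig +time=2 +tries=1 @127.0.0.1 . ns | grep -q 'status: NOERROR'; then
		# resolver healthy: make sure the service address is present (idempotent)
		ip addr add "$SERVICE_IP/32" dev "$IFACE" 2> /dev/null || true
	else
		# resolver broken: tear the address down and let OSPF withdraw it
		ip addr del "$SERVICE_IP/32" dev "$IFACE" 2> /dev/null
	fi
done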
N.B. it is strongly recommended you have at least three L3 IP routing hops (i.e. traceroute shows more than three entries) between your anycast'ed nodes so that you avoid your OSPF topology settling on an equal-cost-multipath routing table. Otherwise a router on your network may see that the path to your anycast'ed address has the same cost across different links and will load balance on a per-packet basis (//very// bad for TCP flows; remember DNS uses TCP too, especially with DNSSEC). If you cannot arrange your network to have at least three hops, then increase the routing metric on the directly connected link to one of your anycast'ed nodes as a workaround.
Below we give an example of configuring a single service node (a quagga capable host) to provide a DNS resolver for clients of the network. The node communicates with its upstream Cisco router (the example shown works on a C6500 and also a C3750 running 'IP base'); do send me configuration snippets for other routers. The subnet the service node and router live on is 198.51.100.0/30 and 2001:db8:ffff:1000::/64, whilst the service IPs are 192.0.2.1 and 2001:db8:ffff:a000:fdc0::3b5.
Tinker with '/etc/quagga/daemons' (enabling 'zebra', 'ospfd' and 'ospf6d') and then:
# cat <<EOF > /etc/quagga/zebra.conf
password foobar
EOF
# chmod 640 /etc/quagga/zebra.conf
# chown quagga:quagga /etc/quagga/zebra.conf
# cat <<EOF > /etc/quagga/ospfd.conf
password foobar
!
interface bond0
ip ospf hello-interval 5
ip ospf dead-interval 20
!
router ospf
ospf router-id 198.51.100.2
log-adjacency-changes
redistribute connected
network 192.0.2.1/32 area 0.0.0.1
network 198.51.100.0/30 area 0.0.0.0
distribute-list 50 out connected
!
access-list 50 permit 192.0.2.1
!
line vty
EOF
# chmod 640 /etc/quagga/ospfd.conf
# chown quagga:quagga /etc/quagga/ospfd.conf
# cat <<EOF > /etc/quagga/ospf6d.conf
password foobar
!
debug ospf6 lsa unknown
!
interface bond0
ipv6 ospf6 cost 1
ipv6 ospf6 hello-interval 5
ipv6 ospf6 dead-interval 20
ipv6 ospf6 retransmit-interval 5
ipv6 ospf6 priority 1
ipv6 ospf6 transmit-delay 1
ipv6 ospf6 instance-id 0
!
router ospf6
router-id 198.51.100.2
redistribute connected route-map filter
area 0.0.0.0 range 2001:db8:ffff:1000::/64
area 0.0.0.1 range 2001:db8:ffff:a000::/64
interface bond0 area 0.0.0.0
!
ipv6 prefix-list dns seq 5 permit 2001:db8:ffff:a000:fdc0::3b5/128
!
route-map filter permit 10
match ipv6 address prefix-list dns
!
route-map filter deny 20
!
line vty
EOF
# chmod 640 /etc/quagga/ospf6d.conf
# chown quagga:quagga /etc/quagga/ospf6d.conf
The matching configuration on the upstream Cisco router is:
ip access-list standard ospfv4-dns
permit 192.0.2.1
!
router ospf 30000
router-id 198.51.100.1
log-adjacency-changes
passive-interface default
no passive-interface Port-channel1
network 192.0.2.1 0.0.0.0 area 1
network 198.51.100.0 0.0.0.3 area 0
distribute-list ospfv4-dns in
!
!
ipv6 prefix-list ospfv6-dns seq 5 permit 2001:DB8:FFFF:A000:FDC0::3B5/128
!
ipv6 router ospf 30000
router-id 198.51.100.1
log-adjacency-changes
area 0 range 2001:DB8:FFFF:1000::/64
area 1 range 2001:DB8:FFFF:A000::/64
distribute-list prefix-list ospfv6-dns in
passive-interface default
no passive-interface Port-channel1
!
!
interface Port-channel1
description dns-server
no switchport
ip address 198.51.100.1 255.255.255.252
ip ospf hello-interval 5
ipv6 address FE80:: link-local
ipv6 address 2001:db8:ffff:1000::/64 anycast
ipv6 nd router-preference High
ipv6 ospf hello-interval 5
ipv6 ospf 30000 area 0
end
It is recommended that you do all of this with your global IGP instance, but those cursed with legacy EIGRP deployments (or using iBGP/IS-IS and wanting to decouple volatile anycast'ed nodes) will need to run a separate routing process and then 'redistribute' these new OSPF processes into the global process, for example:
router eigrp 1234
redistribute ospf 30000 metric 1 1000 255 1 1500
ipv6 router ospf 1234
redistribute ospf 30000
As mentioned earlier, the tricky part is the probe script; I have made available the ones I developed and use for our organisation:
- dns-probe - recursive servers only
- radius-probe
- syslog-probe - syslog-ng with netcat6
- squid-probe
- tcp-route-probe
It is simply a case of dropping the script into '/usr/local/sbin/' and running it as root. To automatically start the script at boot time, add something like the following line to '/etc/rc.local':
nohup /usr/local/sbin/dns-probe || true
The scripts support signal handling too (a sketch of how the handlers might be wired up follows the list):
- ''SIGTERM'': remove HA IP address(es) and exit cleanly
- ''SIGTSTP'': remove HA IP address(es) and send ''SIGSTOP'' to self
  - TODO: de-advertise the service by persuading Quagga to first increase the routing metric to 65535 and then, after a period of time, remove the service IP. On second thoughts this is more than likely not appropriate as your node is an endpoint rather than a conduit, so the effect would be the same as simply tearing down the IP
- ''SIGCONT'': informs the kernel to resume the process and the probes will start afresh
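A minimal sketch of how such signal handling might be implemented in a probe script (illustrative only; it reuses the hypothetical SERVICE_IP/IFACE variables from the earlier sketch):

# hypothetical handlers, placed near the top of the probe script
remove_service_ip() {
	ip addr del "$SERVICE_IP/32" dev "$IFACE" 2> /dev/null
}
# SIGTERM: withdraw the HA address and exit cleanly
trap 'remove_service_ip; exit 0' TERM
# SIGTSTP: withdraw the HA address, then stop ourselves until SIGCONT arrives
trap 'remove_service_ip; kill -STOP $$' TSTP
# SIGCONT needs no handler: the kernel resumes the process and the probe loop carries on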
This probe/IGP approach is very helpful when you need to perform maintenance on your nodes, such as the regular cron job described in my Protecting Users with DNS Malware Blacklisting page. For example, when 'unbound' restarts it initially takes some time to get going, especially on low end hardware, whilst pre-loading the 16000 domains that need hijacking. This delay is trivial to mask with anycast'ing:
- 'suspend' the probe and remove the service IPs (''pkill -TSTP dns-probe'')
- perform the maintenance work (''/etc/init.d/unbound restart'')
- resume the probe and have it add the service IPs (''pkill -CONT dns-probe'')
In this case, the cron job (''/etc/cron.d/local-unbound'') is amended as follows:
#MAILTO=hostmaster
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# pre-ha enabled
#17 0-23/6 * * * root nice -n10 blacklist2dns && nice -n5 unbound-checkconf && /etc/init.d/unbound restart
# ha enabled
17 0-23/6 * * * root nice -n10 blacklist2dns && nice -n5 unbound-checkconf && pkill -TSTP dns-probe && /etc/init.d/unbound restart && pkill -CONT dns-probe
# ha enabled utilising http://habilis.net/cronic/
# 17 0-23/6 * * * root cronic sh -c 'nice -n10 blacklist2dns; RC=$?; [ $RC -eq 1 ] && exit 0; [ $RC -gt 1 ] && exit $RC; nice -n5 unbound-checkconf && pkill -TSTP dns-probe && /etc/init.d/unbound restart && pkill -CONT dns-probe'
N.B. work in progress. The following FreeRADIUS snippet sets up a Status-Server listener on localhost (port 18120, secret 'foobar') for the probe to query:
server viewpoint {
listen {
type = status
port = 18120
ipaddr = 127.0.0.1
interface = lo
}
client 127.0.0.1 {
shortname = localhost
secret = foobar
}
authorize {
ok
# respond to the Status-Server request.
Autz-Type Status-Server {
ok
}
}
}
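A probe could then query this listener with 'radclient' (from the FreeRADIUS utilities); a rough sketch, assuming the port and secret above:

#!/bin/sh
# send a Status-Server request to the local FreeRADIUS status listener;
# radclient exits non-zero if no valid reply is received
echo "Message-Authenticator = 0x00" | \
	radclient -r 1 -t 2 127.0.0.1:18120 status foobar \
	&& echo "radius ok" || echo "radius DOWN"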
N.B. work in progress. Here syslog-ng is configured to loop probe traffic back over localhost: anything received on UDP port 5514 (''s_probe'') is forwarded back out to UDP port 5515 (''d_probe''):
source s_probe { udp(ip(127.0.0.1) port(5514)); };
destination d_probe { udp("127.0.0.1" port(5515) flush_lines(0)); };
log { source(s_probe); destination(d_probe); };
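The probe can then inject a message into port 5514 with netcat6 and check that syslog-ng forwards it back out on port 5515; a rough sketch only (the exact nc6 flags may need adjusting for your version):

#!/bin/sh
# hypothetical loopback test: send a token into syslog-ng and expect it back
TOKEN="syslog-probe-$$"
OUT=$(mktemp)
timeout 5 nc6 -4 -u -l -p 5515 > "$OUT" &	# catch the forwarded copy from d_probe
sleep 1
echo "<13>$TOKEN" | timeout 3 nc6 -4 -u 127.0.0.1 5514	# inject into the s_probe source
wait
grep -q "$TOKEN" "$OUT" && echo "syslog ok" || echo "syslog DOWN"
rm -f "$OUT"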
N.B. work in progress. The snippets below show the upstream Cisco router configuration for a set of anycast'ed web proxy service IPs, followed by the Quagga configuration from one of the nodes ('chinua'); route-maps redistribute the '-remote' service IP into the global routing processes with a worse metric than the others:
ip access-list standard ospfv4-proxy
permit 193.63.73.35
permit 193.63.73.34
permit 193.63.73.33
ip access-list standard ospfv4-proxy-remote
permit 193.63.73.34
!
route-map ospfv4-proxy permit 10
match ip address ospfv4-proxy-remote
set metric 10000 4000 255 1 1500
route-map ospfv4-proxy permit 20
set metric 20000 2000 255 1 1500
router ospf 30002
router-id 212.219.238.13
passive-interface default
no passive-interface Port-channel7
network 193.63.73.33 0.0.0.0 area 0.0.0.1
network 193.63.73.34 0.0.0.0 area 0.0.0.1
network 193.63.73.35 0.0.0.0 area 0.0.0.1
network 212.219.238.12 0.0.0.3 area 0
distribute-list ospfv4-proxy in
ipv6 prefix-list ospfv6-proxy seq 10 permit 2001:630:1B:1001:FCE9:4633:2F26:4A5B/128
ipv6 prefix-list ospfv6-proxy seq 20 permit 2001:630:1B:1001:3B30:4100:D401:2A97/128
ipv6 prefix-list ospfv6-proxy seq 30 permit 2001:630:1B:1001:A0D3:5619:9943:8390/128
ipv6 prefix-list ospfv6-proxy-remote seq 10 permit 2001:630:1B:1001:3B30:4100:D401:2A97/128
!
route-map ospfv6-proxy permit 10
match ipv6 address prefix-list ospfv6-proxy-remote
set metric 200
route-map ospfv6-proxy permit 20
set metric 10
ipv6 router ospf 30002
router-id 212.219.238.13
area 0 range 2001:630:1B:6007::/64
area 0.0.0.1 range 2001:630:1B:1001:3B30:4100:D401:2A97/128
area 0.0.0.1 range 2001:630:1B:1001:A0D3:5619:9943:8390/128
area 0.0.0.1 range 2001:630:1B:1001:FCE9:4633:2F26:4A5B/128
distribute-list prefix-list ospfv6-proxy in
passive-interface default
no passive-interface Port-channel7
router eigrp 111
redistribute ospf 30002 route-map ospfv4-proxy
ipv6 router ospf 10
redistribute ospf 30002 route-map ospfv6-proxy
chinua# show run
Current configuration:
!
password foobar
!
interface bond0
ip ospf hello-interval 5
ip ospf dead-interval 20
!
interface eth0
!
interface eth1
!
interface lo
!
router ospf
ospf router-id 212.219.238.14
log-adjacency-changes
redistribute connected
passive-interface default
no passive-interface bond0
network 193.63.73.33/32 area 0.0.0.1
network 193.63.73.34/32 area 0.0.0.1
network 193.63.73.35/32 area 0.0.0.1
network 212.219.238.12/30 area 0.0.0.0
distribute-list 50 out connected
default-information originate
!
access-list 50 permit 193.63.73.33
access-list 50 permit 193.63.73.34
access-list 50 permit 193.63.73.35
!
line vty
!
end
A slightly different problem to be solved here. SOAS has a handful of IPsec tunnels, and monitoring the services at the far end of a tunnel could only realistically be done (for braindead reasons and constraints placed on us by the venduh) with netcat in 'scanning mode' ('-z').
Another amendment is that we are now advertising kernel-installed routes rather than connected IPs and/or subnets:
router ospf
ospf router-id 198.51.100.2
log-adjacency-changes
redistribute kernel
network 192.0.2.153/32 area 0.0.0.1
network 192.0.2.201/32 area 0.0.0.1
network 192.0.2.211/32 area 0.0.0.1
[snipped]
network 198.51.100.0/30 area 0.0.0.0
distribute-list 50 out kernel
The probe initialisation looks like:
## northgate probes
# greenford
nohup /usr/local/sbin/tcp-route-probe bond0 192.0.2.153 80 443 || true
#nohup /usr/local/sbin/tcp-route-probe bond0 192.0.2.201 22 23 992 || true
#nohup /usr/local/sbin/tcp-route-probe bond0 192.0.2.211 1521 1526 || true
# sqp
nohup /usr/local/sbin/tcp-route-probe bond0 198.51.100.153 80 443 || true
#nohup /usr/local/sbin/tcp-route-probe bond0 198.51.100.201 22 23 992 || true
#nohup /usr/local/sbin/tcp-route-probe bond0 198.51.100.211 1521 1526 || true
If all the TCP ports are open (a three way TCP handshake can take place), then the script adds a route to that IP in the kernel via the interface specified on the command line ('bond0' in the above examples). Traffic flow could be considered a little weird, as client traffic comes in over 'bond0', is then NAT'ed, IPsec'd and sent back out over 'bond0'. This works as the far end tunnel endpoint is not routed to the onsite local node. No doubt a unique problem experienced by few people, but the script itself may well be useful to others wishing to roll their own probe.
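For the curious, the core of such a probe amounts to something like the following. This is a simplified sketch, not the published 'tcp-route-probe' script itself; it assumes netcat's '-z' scan mode and iproute2:

#!/bin/sh
# usage: tcp-route-probe-sketch <iface> <ip> <port> [port ...]
IFACE=$1; IP=$2; shift 2
while sleep 10; do
	OK=1
	for PORT in "$@"; do
		# '-z' just checks the three way handshake completes, '-w 3' bounds the wait
		nc -z -w 3 "$IP" "$PORT" > /dev/null 2>&1 || OK=0
	done
	if [ "$OK" -eq 1 ]; then
		# all ports answered: install the host route so 'redistribute kernel' advertises it
		ip route replace "$IP/32" dev "$IFACE"
	else
		# something failed: remove the route so OSPF withdraws the prefix
		ip route del "$IP/32" dev "$IFACE" 2> /dev/null
	fi
done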