No redundancy on private network site gateways
The private part of the network currently runs over WireGuard VPNs between "site gateways": Ironbelly in Amsterdam, fafnir in Dublin, and Ridley at UCL.
The private network runs:
- Monitoring through Prometheus
- Database replication
- OOB management
- PDU management
- BootP
There are two subnets on the private network: one for machines configured in chef, and one 'default' for unknown machines.
The site gateways also run VPN endpoints for remote access for the sysadmins.
There is currently no redundancy for the site gateways.
There is a preference towards keeping the config in chef (how often does it change?)
The private network currently runs in RFC 1918 space, so no extra firewalling is needed.
UCL doesn't run IPv6, and some of the OOB systems might not support it anyway.
My preference here would be to use a simple keepalived setup with a VIP as the internal network gateway.
keepalived.conf on primary host:
global_defs {
    max_auto_priority
    vrrp_version 3
}
vrrp_instance VI_1 {
    state MASTER
    interface bond0 # internal
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
}
keepalived.conf on secondary host(s):
global_defs {
    max_auto_priority
    vrrp_version 3
}
vrrp_instance VI_1 {
    state BACKUP
    interface bond0 # internal
    virtual_router_id 51
    priority 50 # Lower
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
}
WireGuard tunnels would also require some work, but I've not looked into that yet.
I've done keepalived before so I've got configs I can crib for that. DHCP is easy - it just needs a second instance with an extra dynamic pool, which at least one site already has.
WireGuard is tricky - so long as there is traffic both ways and only one end fails it should be OK, but handling two sites failing over is hard.
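For the single-end case, one possible approach is to have keepalived bring the tunnel up only on whichever host currently holds the VIP, using its notify hooks. The fragment below is a rough sketch, not a tested config. The interface name wg0, the /usr/bin/wg-quick path, and the idea that both gateways at a site share one WireGuard key pair are all assumptions, not something taken from the current setup.

```
vrrp_instance VI_1 {
    # ... existing settings (state, interface, priority, VIP) unchanged ...

    # Assumption: wg0 is the site-to-site tunnel, defined in /etc/wireguard/wg0.conf
    # Bring the tunnel up when this host takes over the VIP
    notify_master "/usr/bin/wg-quick up wg0"
    # Tear it down when this host drops back to backup or enters fault state
    notify_backup "/usr/bin/wg-quick down wg0"
    notify_fault  "/usr/bin/wg-quick down wg0"
}
```

If the peers have PersistentKeepalive set, the new master should start sending traffic as soon as the tunnel is up, and the surviving remote end can then learn the new endpoint address, since WireGuard updates a peer's endpoint from authenticated packets. This does not cover both ends moving at once, because neither side then knows where to send the first packet. Depending on the keepalived version, script_user and enable_script_security in global_defs may also be needed before the notify scripts will run, and wg-quick down exits non-zero if the tunnel is already down.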