openstreetmap/operations

No redundancy on private network site gateways

Opened this issue · 2 comments

The private part of the network currently runs over wireguard VPNs between "site gateways": Ironbelly in Amsterdam, fafnir in Dublin, and Ridley at UCL.

The private network runs:

  • Monitoring through Prometheus
  • Database replication
  • OOB management
  • PDU management
  • BootP

There are two subnets on the private network: one for machines configured in chef, and one 'default' for unknown machines.

The site gateways also run VPN endpoints for remote access for the sysadmins.

There is currently no redundancy for the site gateways.

There is a preference towards keeping the config in chef (how often does it change?)

The private network currently runs in RFC 1918 space, so no extra firewalling is needed.

UCL doesn't run IPv6, and some of the OOB systems might not support it anyway.

My preference here would be to use a simple keepalived setup with a VIP as the internal network gateway.

keepalived.conf on primary host:

global_defs {
   max_auto_priority
   vrrp_version 3
}

vrrp_instance VI_1 {
    state MASTER
    interface bond0 # internal
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
}

keepalived.conf on secondary host(s):

global_defs {
   max_auto_priority
   vrrp_version 3
}

vrrp_instance VI_1 {
    state BACKUP
    interface bond0 # internal
    virtual_router_id 51
    priority 50 # Lower
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
}
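
The configs above only fail over when the primary stops sending VRRP adverts, i.e. when the host or its bond0 link goes away. A possible extension, offered as a sketch rather than a tested config, is a `vrrp_script` check so the primary also gives up the VIP when its tunnel interface is missing. The interface name `wg0`, the check name `chk_wg`, and the timing values are assumptions for illustration only.

```conf
# Hypothetical health check: succeeds only while a wg0 interface exists.
# "wg0" is an assumed name; substitute the real tunnel interface.
vrrp_script chk_wg {
    script "/usr/bin/test -d /sys/class/net/wg0"
    interval 2     # run the check every 2 seconds
    fall 2         # 2 consecutive failures mark the check as failed
    rise 2         # 2 consecutive successes mark it as healthy again
    weight -60     # while failed: priority 100 - 60 = 40, below the backup's 50
}

vrrp_instance VI_1 {
    state MASTER
    interface bond0 # internal
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
    track_script {
        chk_wg     # apply the weight above to this instance's priority
    }
}
```

With the default preemption behaviour, the primary would take the VIP back once the check passes again and its priority returns to 100. If the secondary ran the same check with the same weight, both could drop together, so the weights would need some thought.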

DHCP requires some setup too.

Wireguard tunnels would also require some work, but I've not looked into that yet.

I've done keepalived before so I've got configs I can crib for that. The DHCP is easy - it just needs a second instance with an extra dynamic pool, which at least one site already has.

Wireguard is tricky - so long as there is traffic both ways and only one end fails it should be OK, but handling two sites failing over is hard.
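
For the case where only one end fails, one option (an untested sketch, not a worked-out design) is to let keepalived bring the tunnel up only on whichever node currently holds the VIP, using notify hooks. This assumes both gateways at a site carry the same WireGuard key pair in a hypothetical `wg0` wg-quick config, so the surviving remote peer accepts traffic from the new node and learns its address once packets arrive from it. It does nothing for two sites failing over at the same time.

```conf
vrrp_instance VI_1 {
    state BACKUP
    interface bond0 # internal
    virtual_router_id 51
    priority 50 # Lower
    advert_int 1
    virtual_ipaddress {
        10.0.48.1 # keepalived managed gateway VIP
    }
    # Assumed hooks: "wg0" is a hypothetical wg-quick config name
    notify_master "/usr/bin/wg-quick up wg0"    # became MASTER: start the tunnel
    notify_backup "/usr/bin/wg-quick down wg0"  # became BACKUP: stop the tunnel
    notify_fault  "/usr/bin/wg-quick down wg0"  # entered FAULT: stop the tunnel
}
```

The same three notify lines would go in the primary's instance as well, so that only one node per site has the tunnel up at any time.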