canonical/microcloud

`microcloud init` does not perform sanity checks on the uplink network input

Closed this issue · 4 comments

microcloud init does not perform many sanity checks on the inputs for configuring the OVN uplink network. I can identify two cases where the user can easily make a mistake while entering the network configuration, which causes the init to fail at the last stage, requiring that the user starts again.

Case 1: first and last addresses are outside the gateway's subnet

$ sudo microcloud init
Waiting for LXD to start...
Using address "10.42.0.49" for MicroCloud
Limit search for other MicroCloud servers to 10.42.0.49/24? (yes/no) [default=yes]:
Scanning for eligible servers ...

 Selected "orangepi-1" at "10.42.0.198"
 Selected "orangepi-2" at "10.42.0.49"
 Selected "orangepi-3" at "10.42.0.217"

Would you like to set up local storage? (yes/no) [default=yes]: no
Would you like to set up distributed storage? (yes/no) [default=yes]:
Select from the available unpartitioned disks:

Select which disks to wipe:

 Using 1 disk(s) on "orangepi-2" for remote storage pool
 Using 1 disk(s) on "orangepi-3" for remote storage pool
 Using 1 disk(s) on "orangepi-1" for remote storage pool

Configure distributed networking? (yes/no) [default=yes]:
Select exactly one network interface from each cluster member:

 Using "enP4p65s0" on "orangepi-2" for OVN uplink
 Using "enP4p65s0" on "orangepi-3" for OVN uplink
 Using "enP4p65s0" on "orangepi-1" for OVN uplink

Specify the IPv4 gateway (CIDR) on the uplink network (empty to skip IPv4): 10.1.1.1/24
Specify the first IPv4 address in the range to use with LXD: 10.2.2.50
Specify the last IPv4 address in the range to use with LXD: 10.2.2.100
Specify the IPv6 gateway (CIDR) on the uplink network (empty to skip IPv6):
Initializing a new cluster
 Local MicroCloud is ready
 Local LXD is ready
 Local MicroOVN is ready
 Local MicroCeph is ready
Awaiting cluster formation ...
 Peer "orangepi-3" has joined the cluster
 Peer "orangepi-1" has joined the cluster
Configuring cluster-wide devices ...
Error: Failed allocating uplink port IPs on network "UPLINK": Failed to parse uplink IPv4 OVN ranges: IP range "10.2.2.50-10.2.2.100" does not fall within any of the allowed networks [10.1.1.0/24]

The first and last addresses should maybe be validated to ensure that they are within the gateway's subnet.

Case 2: uplink network is in the same subnet as the management network (here the management network has the subnet 10.0.0.0/8 and the uplink network is configured to 10.254.254.0/24)

$ sudo microcloud init
Waiting for LXD to start...
Using address "10.42.0.49" for MicroCloud
Limit search for other MicroCloud servers to 10.42.0.49/8? (yes/no) [default=yes]:
Scanning for eligible servers ...

 Selected "orangepi-2" at "10.42.0.49"
 Selected "orangepi-3" at "10.42.0.217"
 Selected "orangepi-1" at "10.42.0.198"

Would you like to set up local storage? (yes/no) [default=yes]: no
Would you like to set up distributed storage? (yes/no) [default=yes]:
Select from the available unpartitioned disks:

Select which disks to wipe:

 Using 1 disk(s) on "orangepi-2" for remote storage pool
 Using 1 disk(s) on "orangepi-3" for remote storage pool
 Using 1 disk(s) on "orangepi-1" for remote storage pool

Configure distributed networking? (yes/no) [default=yes]:
Select exactly one network interface from each cluster member:

 Using "enP4p65s0" on "orangepi-2" for OVN uplink
 Using "enP4p65s0" on "orangepi-1" for OVN uplink
 Using "enP4p65s0" on "orangepi-3" for OVN uplink

Specify the IPv4 gateway (CIDR) on the uplink network (empty to skip IPv4): 10.254.254.1
Invalid input: invalid CIDR address: 10.254.254.1

Specify the IPv4 gateway (CIDR) on the uplink network (empty to skip IPv4): 10.254.254.1/24
Specify the first IPv4 address in the range to use with LXD: 10.254.254.10
Specify the last IPv4 address in the range to use with LXD: 10.254.254.254
Specify the IPv6 gateway (CIDR) on the uplink network (empty to skip IPv6):
Initializing a new cluster
 Local MicroCloud is ready
 Local LXD is ready
 Local MicroOVN is ready
 Local MicroCeph is ready
Awaiting cluster formation ...
 Peer "orangepi-1" has joined the cluster
 Peer "orangepi-3" has joined the cluster
Configuring cluster-wide devices ...
Error: Failed generating auto config: Failed to automatically find an unused IPv4 subnet, manual configuration required

The first and last addresses should maybe be validated to ensure that they are within the gateway's subnet.

I agree we should do this.

Case 2: uplink network is in the same subnet as the management network (here the management network has the subnet 10.0.0.0/8 and the uplink network is configured to 10.254.254.0/24)

@masnax yes lets do the same pre-check that LXD does in microcloud to allow earlier detection & correction, and then microcloud can provide the picked subnet directly to LXD.

@MggMuggins So for this one, I think the best place to put a validation check is at the end of the config collecting process just before we set up the cluster, so that the validation is consistent for both interactive and preseed initialization.

How this works in microcloud is that we carry a map[string]InitSystem throughout the initialization process, filling it with config for each potential node in the cluster as we go through interactive init, or all at once for preseed. Then we call setupCluster to actually initialize the daemons and set up clustering.

You can probably add a validateConfig or validateSystems function just before we call setupCluster in the 2 places we call it. For now, we can just include a check here to make sure that the selected IP ranges fit within the selected gateway. The network config is stored in InitSystem.Networks, populated with the return value of DefaultOVNNetwork.

Since we only apply the config keys when first setting up the cluster, you would need to check if we are bootstrapping (should be the case if the local node is included in the InitSystem map. We usually check that like this). There may be other network structs in the InitSystem.Networks slice, so you can check if the Type is physical and the Name is UPLINK to determine if you need to validate anything.

The actual validation should just be like this for now, we can grow the validation steps as needed in the future:

  • If network.Config[ipv4.gateway] is present, network.Config[ipv4.ovn.ranges] needs to either not exist or fit within the subnet specified by ipv4.gateway. LXD uses this helper to make the same validation so you can make it exportable and move it to the shared package so we can call it from microcloud.

  • If either network.Config[ipv4.gateway] or network.Config[ipv6.gateway] is present, it must not collide with any addresses of any system in the InitSystem map. Their addresses are stored at InitSystem.ServerInfo.Address, except for the local node, which is s.Address.

Case 1: first and last addresses are outside the gateway's subnet

This should certainly be checked for (in both MicroCloud and LXD and if not already).

Case 2: uplink network is in the same subnet as the management network (here the management network has the subnet 10.0.0.0/8 and the uplink network is configured to 10.254.254.0/24)

This configuration is valid. It should be allowed to use the same or different subnets for the microcloud and ovn networks (after all the OVN uplink network interface may be connected to the same or different external network as the microcloud network interfac).

We should check that none of the OVN external IPv4 ranges cover the uplink gateway address, external DNS addresses or any of the microcloud member IPs (that's asking for trouble).

But the problem the OP is experiencing in case 2:

Error: Failed generating auto config: Failed to automatically find an unused IPv4 subnet, manual configuration required

is unrelated to the differing subnets and more likely because their host has a the entire 10.0.0.0 - 10.255.255.255 range in use on the microcloud interface (we can tell this because of the question Limit search for other MicroCloud servers to 10.42.0.49/8 the /8 is key here).

LXD is using this function to find a free OVN subnet:

https://github.com/canonical/lxd/blob/c8bc5de197b0cf7ea9843fda6ecb80cda10b4f57/lxd/network/network_utils.go#L598-L615

And this is considering any subnet in the host's routing table as unavailable. And thus it cannot find a free /24 in the 10/8 prefix.

So I would suggest creating a separate issue over at LXD's repo for the case 2 and then case 1 (and the other suggestions here) can be implemented in microcloud.