Upgrading a cluster does not honor all template variables
The upgrade process for Magnum clusters does not apply all of the configuration options specified by the cluster template being applied. Specifically, we noticed this with the sizing of the nodes, as shown below, but the same principle should apply to all configuration items within a template.
Steps to replicate:
First, look at the original cluster template, which has the medium flavor and a 40 GB boot volume size:
openstack coe cluster template show test-template-medium-flavor
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| insecure_registry | - |
| labels | {'kube_tag': 'v1.27.4', 'boot_volume_size': '40', 'boot_volume_type': 'rbd1', 'master_lb_floating_ip_enabled': 'false', 'audit_log_enabled': 'true', 'os_distro': 'ubuntu', 'min_node_count': '1', 'max_node_count': '5'} |
| updated_at | - |
| floating_ip_enabled | True |
| fixed_subnet | - |
| master_flavor_id | m1.medium |
| uuid | e290ab0b-3ab5-4fd2-86d6-8da380e478a4 |
| no_proxy | - |
| https_proxy | - |
| tls_disabled | False |
| keypair_id | - |
| public | False |
| http_proxy | - |
| docker_volume_size | - |
| server_type | vm |
| external_network_id | public |
| cluster_distro | ubuntu |
| image_id | ubuntu2204-tenant-k8s-1.27.11-20240506 |
| volume_driver | - |
| registry_enabled | False |
| docker_storage_driver | overlay2 |
| apiserver_port | - |
| name | test-template-medium-flavor |
| created_at | 2024-06-03T19:42:24+00:00 |
| network_driver | calico |
| fixed_network | - |
| coe | kubernetes |
| flavor_id | m1.medium |
| master_lb_enabled | True |
| dns_nameserver | 1.1.1.1 |
| hidden | False |
| tags | - |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Second, look at the cluster template with the large flavor and a 60 GB boot volume size:
openstack coe cluster template show test-template-large-flavor
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| insecure_registry | - |
| labels | {'kube_tag': 'v1.27.4', 'boot_volume_size': '60', 'boot_volume_type': 'rbd1', 'master_lb_floating_ip_enabled': 'false', 'audit_log_enabled': 'true', 'os_distro': 'ubuntu', 'min_node_count': '1', 'max_node_count': '5'} |
| updated_at | - |
| floating_ip_enabled | True |
| fixed_subnet | - |
| master_flavor_id | m1.large |
| uuid | 0f52c805-7cf8-43b4-bf38-4e2e59274c1a |
| no_proxy | - |
| https_proxy | - |
| tls_disabled | False |
| keypair_id | - |
| public | False |
| http_proxy | - |
| docker_volume_size | - |
| server_type | vm |
| external_network_id | public |
| cluster_distro | ubuntu |
| image_id | ubuntu-2204-kube-v1.27.4 |
| volume_driver | - |
| registry_enabled | False |
| docker_storage_driver | overlay2 |
| apiserver_port | - |
| name | test-template-large-flavor |
| created_at | 2024-06-04T19:39:32+00:00 |
| network_driver | calico |
| fixed_network | - |
| coe | kubernetes |
| flavor_id | m1.large |
| master_lb_enabled | True |
| dns_nameserver | 1.1.1.1 |
| hidden | False |
| tags | - |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Finally, upgrade the cluster and look at the values the cluster has adopted.
Notice that the cluster references the new template ID, but flavor_id and master_flavor_id are still m1.medium.
openstack coe cluster show test-cluster
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status | UPDATE_COMPLETE |
| health_status | HEALTHY |
| cluster_template_id | 0f52c805-7cf8-43b4-bf38-4e2e59274c1a |
| node_addresses | [] |
| uuid | eca19ace-e4b8-4f0e-bb73-36039965e850 |
| stack_id | kube-cwpqf |
| status_reason | None |
| created_at | 2024-06-04T13:37:15+00:00 |
| updated_at | 2024-06-05T17:47:42+00:00 |
| coe_version | v1.27.4 |
| labels | {'kube_tag': 'v1.27.4', 'boot_volume_size': '40', 'boot_volume_type': 'rbd1', 'master_lb_floating_ip_enabled': 'false', 'audit_log_enabled': 'true', 'os_distro': 'ubuntu', 'min_node_count': '1', 'max_node_count': '5', 'manila_csi_share_network_id': '94d53598-2241-4b12-b46d-056fa090a7a4', 'auto_healing_enabled': 'True', 'auto_scaling_enabled': 'True'} |
| labels_overridden | {'boot_volume_size': '60'} |
| labels_skipped | {} |
| labels_added | {'manila_csi_share_network_id': '94d53598-2241-4b12-b46d-056fa090a7a4', 'auto_healing_enabled': 'True', 'auto_scaling_enabled': 'True'} |
| fixed_network | test-network |
| fixed_subnet | None |
| floating_ip_enabled | False |
| faults | |
| keypair | svc_account |
| api_address | https://10.10.10.144:6443 |
| master_addresses | [] |
| master_lb_enabled | True |
| create_timeout | 60 |
| node_count | 1 |
| discovery_url | None |
| docker_volume_size | None |
| master_count | 3 |
| container_version | None |
| name | test-cluster |
| master_flavor_id | m1.medium |
| flavor_id | m1.medium |
| health_status_reason | {'kube-cwpqf-default-worker-mbrwk-ndshj-hpdfc.Ready': 'True', 'kube-cwpqf-tn6xc-czzmp.Ready': 'True', 'kube-cwpqf-tn6xc-ft52j.Ready': 'True', 'kube-cwpqf-tn6xc-wr7rr.Ready': 'True'} |
| project_id | 8cdcda55818b40c681b03132bbf3a6bc |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Additionally, when viewing the actual storage on the node, only 40 GB is available after the upgrade, not 60 GB as configured in the new template:
ubuntu@kube-cwpqf-control-plane-sjx56-lv777:~$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 392M 3.5M 388M 1% /run
/dev/vda1 40G 7.9G 30G 22% /
...
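To make the mismatch explicit, the fields from the outputs above can be compared programmatically. This is only a minimal sketch: the dictionaries are hand-copied from the `openstack coe cluster template show` and `openstack coe cluster show` output in this report, and `unapplied_fields` is a hypothetical helper, not part of Magnum or the OpenStack client.

```python
# Values hand-copied from the CLI output in this report (only the fields of interest).
new_template = {
    "flavor_id": "m1.large",
    "master_flavor_id": "m1.large",
    "labels": {"boot_volume_size": "60"},
}
cluster_after_upgrade = {
    "flavor_id": "m1.medium",
    "master_flavor_id": "m1.medium",
    "labels": {"boot_volume_size": "40"},
}


def unapplied_fields(template, cluster):
    """Hypothetical helper: map each template field the cluster did not adopt
    to a (template_value, cluster_value) pair. Nested dicts such as labels
    are compared key by key."""
    diffs = {}
    for key, expected in template.items():
        actual = cluster.get(key)
        if isinstance(expected, dict):
            actual = actual or {}
            for label, value in expected.items():
                if actual.get(label) != value:
                    diffs[f"{key}.{label}"] = (value, actual.get(label))
        elif actual != expected:
            diffs[key] = (expected, actual)
    return diffs


drift = unapplied_fields(new_template, cluster_after_upgrade)
for field, (expected, actual) in sorted(drift.items()):
    print(f"{field}: template={expected} cluster={actual}")
```

With the values from this report, all three compared fields show up as not applied.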
Unfortunately, I think there is more to this issue. One concern that occurred to me while writing this fix is that when you create a cluster, you can specify a master_flavor_id and/or flavor_id, both of which are optional (if not set, they are copied from the cluster template).
If we force the cluster to always take the values from the cluster template, someone who created a cluster with a specific flavor_id or master_flavor_id could see their cluster resized down or up without expecting or wanting that change.
For us to be able to do this, it seems we need to make those two attributes updatable in the Magnum API, and then we can handle it inside the update request for the driver (and not upgrade).
I hope that this explanation makes sense as to why we can't handle this right now.
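The concern can be illustrated with a small sketch. Everything here is hypothetical and only models the behaviour described in this comment: `flavor_at_create` and `flavor_after_forced_upgrade` are made-up names rather than Magnum functions, and `m1.xlarge` is an assumed flavor name used for illustration.

```python
from typing import Optional


def flavor_at_create(requested: Optional[str], template_flavor: str) -> str:
    """Model of cluster creation as described above: an explicitly requested
    flavor wins, otherwise the value is copied from the cluster template."""
    return requested if requested is not None else template_flavor


def flavor_after_forced_upgrade(current: str, new_template_flavor: str) -> str:
    """Hypothetical 'always follow the template' upgrade: the current value
    is ignored and replaced with the new template's flavor."""
    return new_template_flavor


# Cluster A relied on the template default; cluster B explicitly asked for
# a bigger flavor at creation time (m1.xlarge is an assumed name).
a = flavor_at_create(None, "m1.medium")
b = flavor_at_create("m1.xlarge", "m1.medium")

# Both clusters are upgraded to the template that uses m1.large.
a_after = flavor_after_forced_upgrade(a, "m1.large")
b_after = flavor_after_forced_upgrade(b, "m1.large")

print(f"A: {a} -> {a_after}")  # follows the template, as the reporter expects
print(f"B: {b} -> {b_after}")  # explicit choice silently replaced
```

In this model, cluster A is resized as the report requests, but cluster B loses the flavor its owner chose explicitly, which is the unexpected resize the comment warns about.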