[Bug]: LKE k8s version upgrade returns error every time
Aransh opened this issue · 2 comments
Terraform Version
Terraform v1.5.7 on darwin_amd64
Linode Provider Version
v2.9.3
Affected Terraform Resources
linode_lke_cluster
Terraform Config Files
resource "linode_lke_cluster" "k8s-cluster" {
  k8s_version = "1.26"
  label       = "xxxx"
  region      = "us-iad"
  tags        = ["dev"]

  control_plane {
    high_availability = true
  }

  pool {
    count = 3
    type  = "g6-standard-4"

    autoscaler {
      max = 5
      min = 3
    }
  }
}
Debug Output
No response
Panic Output
No response
Expected Behavior
The Kubernetes version is upgraded, and terraform apply finishes shortly afterwards without an error.
Actual Behavior
I have been able to reproduce this with 4 different clusters already.
When upgrading k8s 1.26 -> 1.27, the cluster upgrade itself completes after a few minutes. I follow it by running "watch kubectl get nodes" until all nodes report 1.27; as long as the cluster is not too big, this finishes before terraform times out. Terraform keeps running after that, though, and it is unclear what it is actually waiting for, since the upgrade appears to be long done. In my last upgrade, all nodes were on 1.27 10 minutes in, but terraform kept going for 30 more minutes until it timed out.
Once terraform finally times out, it spits out this error:
│ Error: failed to wait for all LKE Cluster (113537) nodes to start recycle: [002] Get "https://api.linode.com/v4/account/events?page=1": context deadline exceeded
│
│ with module.csi_cluster.linode_lke_cluster.k8s-cluster,
│ on .terraform/modules/csi_cluster/csi_kubernetes_cluster/main.tf line 6, in resource "linode_lke_cluster" "k8s-cluster":
│ 6: resource "linode_lke_cluster" "k8s-cluster" {
│
Running terraform apply again after that shows that all is well and the upgrade succeeded, so something is clearly broken during the 30 minutes in which terraform keeps running.
Steps to Reproduce
- Create an LKE cluster with k8s 1.26 using terraform
- Upgrade the cluster to k8s 1.27 using terraform
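For the second step, a minimal sketch of the change, assuming the config from above is otherwise left untouched: only k8s_version is bumped, and the next terraform apply triggers the upgrade.

```hcl
resource "linode_lke_cluster" "k8s-cluster" {
  # The only change between the two applies: this was "1.26" in step 1.
  k8s_version = "1.27"
  label       = "xxxx"
  region      = "us-iad"
  tags        = ["dev"]

  control_plane {
    high_availability = true
  }

  pool {
    count = 3
    type  = "g6-standard-4"

    autoscaler {
      max = 5
      min = 3
    }
  }
}
```

While that apply is running, "watch kubectl get nodes" in a second terminal shows when the nodes actually reach 1.27.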
@lgarber-akamai Sounds great, following this PR, thanks