linode/terraform-provider-linode

[Bug]: LKE k8s version upgrade returns error every time

Aransh opened this issue · 2 comments

Aransh commented

Terraform Version

Terraform v1.5.7 on darwin_amd64

Linode Provider Version

v2.9.3

Affected Terraform Resources

linode_lke_cluster

Terraform Config Files

resource "linode_lke_cluster" "k8s-cluster" {
    k8s_version   = "1.26"
    label         = "xxxx"
    region        = "us-iad"
    tags          = ["dev"]

    control_plane {
        high_availability = true
    }

    pool {
        count = 3
        type  = "g6-standard-4"

        autoscaler {
            max = 5
            min = 3
        }
    }
}

Debug Output

No response

Panic Output

No response

Expected Behavior

The Kubernetes version is upgraded, and terraform apply finishes right after with no error.

Actual Behavior

I have been able to reproduce this on 4 different clusters already.
When upgrading k8s 1.26 -> 1.27, the cluster upgrade itself is done after a few minutes. I follow it by running "watch kubectl get nodes" until I see that all nodes are on 1.27; as long as the cluster is not too big, it finishes before terraform times out. Terraform keeps going anyway, and it is unclear what it is actually doing at that point, since the upgrade seems to be long done. In the last upgrade I did, all nodes were on 1.27 ten minutes into the upgrade, but terraform kept going for 30 more minutes until it timed out.

Once terraform finally times out, it spits out this error:

│ Error: failed to wait for all LKE Cluster (113537) nodes to start recycle: [002] Get "https://api.linode.com/v4/account/events?page=1": context deadline exceeded

│ with module.csi_cluster.linode_lke_cluster.k8s-cluster,
│ on .terraform/modules/csi_cluster/csi_kubernetes_cluster/main.tf line 6, in resource "linode_lke_cluster" "k8s-cluster":
│ 6: resource "linode_lke_cluster" "k8s-cluster" {

But running terraform apply again after that shows that all is well and the upgrade succeeded, so clearly something is going wrong during the 30 minutes when terraform keeps running.

Steps to Reproduce

  1. Create an LKE cluster with k8s 1.26 using terraform
  2. Upgrade the cluster to k8s 1.27 using terraform
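
For reference, step 2 comes down to a one-attribute change in the config from the report. A minimal sketch, assuming everything except k8s_version stays exactly as it was when the cluster was created:

```hcl
# Same resource as in "Terraform Config Files"; only k8s_version differs.
resource "linode_lke_cluster" "k8s-cluster" {
    k8s_version   = "1.27" # was "1.26" when the cluster was created in step 1
    label         = "xxxx"
    region        = "us-iad"
    tags          = ["dev"]

    control_plane {
        high_availability = true
    }

    pool {
        count = 3
        type  = "g6-standard-4"

        autoscaler {
            max = 5
            min = 3
        }
    }
}
```

Running terraform apply with this change starts the upgrade. The error above appears only once terraform times out.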

lgarber-akamai commented

Hey @Aransh, thanks for the report!

I currently have a PR up to fix this issue, and we will try to get a patch out shortly.

#1120

Aransh commented

@lgarber-akamai Sounds great, following this PR, thanks