jackfrancis/kamino

'vmssCSE' resource still there after kamino deletes it


From a kamino job run, it appears that the vmssCSE resource was deleted as we would expect:

2021-09-22T00:49:02.822805180Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss' '--name' 'vmssCSE']

But a scale out after running the above vmss-prototype job shows that the vmssCSE resource is still present on nodes:

$ az vmss list-instances -g kubernetes-eastus2-75998 -n k8s-agentpool2-24293783-vmss | jq -r '.[] | select(.name == "k8s-agentpool2-24293783-vmss_969") | .resources'
[
  {
    "autoUpgradeMinorVersion": true,
    "enableAutomaticUpgrade": null,
    "forceUpdateTag": null,
    "id": "/subscriptions/01993b85-2035-47d7-99ad-a70e8520136f/resourceGroups/kubernetes-eastus2-75998/providers/Microsoft.Compute/virtualMachines/k8s-agentpool2-24293783-vmss_969/extensions/vmssCSE",
    "instanceView": null,
    "location": "eastus2",
    "name": "vmssCSE",
    "protectedSettings": null,
    "provisioningState": "Failed",
    "publisher": "Microsoft.Azure.Extensions",
    "resourceGroup": "kubernetes-eastus2-75998",
    "settings": {},
    "tags": null,
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "typeHandlerVersion": "2.0",
    "typePropertiesType": "CustomScript"
  },
  {
    "autoUpgradeMinorVersion": true,
    "enableAutomaticUpgrade": null,
    "forceUpdateTag": null,
    "id": "/subscriptions/01993b85-2035-47d7-99ad-a70e8520136f/resourceGroups/kubernetes-eastus2-75998/providers/Microsoft.Compute/virtualMachines/k8s-agentpool2-24293783-vmss_969/extensions/k8s-agentpool2-24293783-vmss-computeAksLinuxBilling",
    "instanceView": null,
    "location": "eastus2",
    "name": "k8s-agentpool2-24293783-vmss-computeAksLinuxBilling",
    "protectedSettings": null,
    "provisioningState": "Failed",
    "publisher": "Microsoft.AKS",
    "resourceGroup": "kubernetes-eastus2-75998",
    "settings": {},
    "tags": null,
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "typeHandlerVersion": "1.0",
    "typePropertiesType": "Compute.AKS-Engine.Linux.Billing"
  }
]

(As you can see above, the provisioningState of the CSE is Failed, which is how I found this.)

Is this a bug in kamino (not retrying after failed extension delete?) or in VMSS?

Interesting - based on the logs, Azure claimed the extension was correctly deleted. (It would have retried up to 3 times on a failure to delete.)

2021-09-22T00:49:00.797508015Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:01.900907139Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:02.822805180Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss' '--name' 'vmssCSE']
2021-09-22T00:49:15.386878576Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:16.531976599Z k8s-agentpool2-24293783-vmss	INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss' '--set' 'sku.capacity=1000' '--no-wait']

We could add a loop to list extensions again and check if we need to delete again, to validate that they really have been deleted.

Yep, I think so. :/

It is sad since the Azure CLI claimed it worked :-(

I have not noticed this happening before but it would not be hard to add that.
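
A rough sketch of what that list-again loop could look like, not the actual kamino code. The function names, the `run` hook, and the `attempts`/`delay` defaults are made up for illustration. It assumes `az vmss extension list` prints a JSON array of objects with a `name` field, and it leaves out the `--subscription` argument seen in the logs. It only checks the scale set's own extension list, so it may not catch every case where new instances still come up with the extension.

```python
import json
import subprocess
import time


def run_az(args):
    """Run an az CLI command and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(["az"] + args, check=True,
                            capture_output=True, text=True)
    return result.stdout


def extension_listed(scope, extension, run):
    """Return True if the extension still shows up in 'az vmss extension list'."""
    # Assumption: the list output is a JSON array of objects with a "name" field.
    listed = json.loads(run(["vmss", "extension", "list"] + scope) or "[]")
    return extension in [ext.get("name") for ext in listed]


def delete_extension_verified(resource_group, vmss_name, extension,
                              run=run_az, attempts=3, delay=5.0):
    """Delete a VMSS extension and list again to confirm it is really gone.

    Returns True once the extension no longer appears in the list, or
    False if it is still listed after `attempts` delete calls.
    """
    scope = ["--resource-group", resource_group, "--vmss-name", vmss_name]
    for _ in range(attempts):
        if not extension_listed(scope, extension, run):
            return True
        # The CLI may report success here even if the extension survives,
        # which is why the loop lists again instead of trusting the exit code.
        run(["vmss", "extension", "delete"] + scope + ["--name", extension])
        time.sleep(delay)
    # One last look after the final delete attempt.
    return not extension_listed(scope, extension, run)
```

Passing `run` in as a parameter is only there so the loop can be exercised with a fake command runner; a caller would treat a `False` result as a hard failure rather than going on to scale out.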