'vmssCSE' resource still there after kamino deletes it
From a kamino job run, it appears that the vmssCSE resource was deleted, as we would expect:
2021-09-22T00:49:02.822805180Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss' '--name' 'vmssCSE']
But a scale-out after running the above vmss-prototype job shows that the vmssCSE resource is still present on the nodes:
$ az vmss list-instances -g kubernetes-eastus2-75998 -n k8s-agentpool2-24293783-vmss | jq -r '.[] | select(.name == "k8s-agentpool2-24293783-vmss_969") | .resources'
[
  {
    "autoUpgradeMinorVersion": true,
    "enableAutomaticUpgrade": null,
    "forceUpdateTag": null,
    "id": "/subscriptions/01993b85-2035-47d7-99ad-a70e8520136f/resourceGroups/kubernetes-eastus2-75998/providers/Microsoft.Compute/virtualMachines/k8s-agentpool2-24293783-vmss_969/extensions/vmssCSE",
    "instanceView": null,
    "location": "eastus2",
    "name": "vmssCSE",
    "protectedSettings": null,
    "provisioningState": "Failed",
    "publisher": "Microsoft.Azure.Extensions",
    "resourceGroup": "kubernetes-eastus2-75998",
    "settings": {},
    "tags": null,
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "typeHandlerVersion": "2.0",
    "typePropertiesType": "CustomScript"
  },
  {
    "autoUpgradeMinorVersion": true,
    "enableAutomaticUpgrade": null,
    "forceUpdateTag": null,
    "id": "/subscriptions/01993b85-2035-47d7-99ad-a70e8520136f/resourceGroups/kubernetes-eastus2-75998/providers/Microsoft.Compute/virtualMachines/k8s-agentpool2-24293783-vmss_969/extensions/k8s-agentpool2-24293783-vmss-computeAksLinuxBilling",
    "instanceView": null,
    "location": "eastus2",
    "name": "k8s-agentpool2-24293783-vmss-computeAksLinuxBilling",
    "protectedSettings": null,
    "provisioningState": "Failed",
    "publisher": "Microsoft.AKS",
    "resourceGroup": "kubernetes-eastus2-75998",
    "settings": {},
    "tags": null,
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "typeHandlerVersion": "1.0",
    "typePropertiesType": "Compute.AKS-Engine.Linux.Billing"
  }
]
(As you can see above, the provisioningState of the CSE is Failed, which is how I found this.)
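The jq query above checks a single, known instance. The following is a minimal Python sketch that applies the same filter to every instance, so leftover extensions show up without knowing instance names in advance. It assumes the JSON shape shown above, where each instance returned by az vmss list-instances has a "resources" list of extension objects with "name" and "provisioningState". The function name instances_with_extension is hypothetical and is not part of kamino or the Azure CLI.

```python
import json
from typing import Any, Dict, List


def instances_with_extension(instances_json: str, extension_name: str) -> List[Dict[str, Any]]:
    """Return instance name and provisioningState for every instance that
    still carries an extension called extension_name.

    instances_json is assumed to be the JSON array printed by
    `az vmss list-instances` (shape as in the output above).
    """
    hits: List[Dict[str, Any]] = []
    for instance in json.loads(instances_json):
        # "resources" may be absent or null on some instances; treat it as empty.
        for ext in instance.get("resources") or []:
            if ext.get("name") == extension_name:
                hits.append({
                    "instance": instance.get("name"),
                    "provisioningState": ext.get("provisioningState"),
                })
    return hits
```

For example, save the output of az vmss list-instances -g <resource-group> -n <vmss-name> to a file, read it, and pass the text in with "vmssCSE" as the extension name. Any non-empty result means the extension is still attached to those instances.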
Is this a bug in kamino (not retrying after a failed extension delete?) or in VMSS?
From this automated run:
Interesting: based on the logs, Azure claimed the extension was deleted successfully. (It would have retried up to 3 times if the delete had failed.)
2021-09-22T00:49:00.797508015Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:01.900907139Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:02.822805180Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--vmss-name' 'k8s-agentpool2-24293783-vmss' '--name' 'vmssCSE']
2021-09-22T00:49:15.386878576Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss']
2021-09-22T00:49:16.531976599Z k8s-agentpool2-24293783-vmss INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '***' '--resource-group' 'kubernetes-eastus2-75998' '--name' 'k8s-agentpool2-24293783-vmss' '--set' 'sku.capacity=1000' '--no-wait']
We could add a loop to list extensions again and check if we need to delete again, to validate that they really have been deleted.
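A minimal Python sketch of that delete-then-verify loop follows. This is not kamino's actual code. The names delete_extension_verified, az_json, az_vmss_extension_names and az_vmss_extension_delete are hypothetical. The default of 3 attempts mirrors the retry count mentioned above, and the 10-second settle delay is an arbitrary assumption. The list and delete steps are passed in as functions, so the loop can be exercised without Azure. The az helpers only use az vmss extension list and az vmss extension delete, the same commands as in the logs.

```python
import json
import subprocess
import time
from typing import Callable, List


def az_json(args: List[str]):
    """Run an az CLI command and parse its JSON output (None if it printed nothing)."""
    out = subprocess.run(["az", *args, "--output", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out) if out.strip() else None


def az_vmss_extension_names(subscription: str, group: str, vmss: str) -> List[str]:
    """Names of the extensions in the scale set model, as reported by the CLI."""
    exts = az_json(["vmss", "extension", "list", "--subscription", subscription,
                    "--resource-group", group, "--vmss-name", vmss])
    return [e["name"] for e in exts or []]


def az_vmss_extension_delete(subscription: str, group: str, vmss: str, name: str) -> None:
    subprocess.run(["az", "vmss", "extension", "delete", "--subscription", subscription,
                    "--resource-group", group, "--vmss-name", vmss, "--name", name],
                   check=True)


def delete_extension_verified(list_names: Callable[[], List[str]],
                              delete: Callable[[str], None],
                              name: str,
                              attempts: int = 3,
                              delay_s: float = 10.0,
                              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Delete `name` and re-list until it is really gone.

    Returns True once a fresh listing no longer contains `name`, or False if it
    is still listed after `attempts` deletes.
    """
    for _ in range(attempts):
        if name not in list_names():
            return True
        delete(name)
        # Assumed settle time before trusting the next listing.
        sleep(delay_s)
    return name not in list_names()


# Example wiring (not executed here):
# ok = delete_extension_verified(
#     lambda: az_vmss_extension_names(sub, rg, vmss),
#     lambda n: az_vmss_extension_delete(sub, rg, vmss, n),
#     "vmssCSE")
```

Note that this sketch only re-checks what az vmss extension list reports for the scale set model. It does not inspect individual instances.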
Yep, I think so. :/
It is sad, since the Azure CLI claimed it worked :-(
I have not noticed this happening before but it would not be hard to add that.