Azure/AKS

[BUG] AKS Using Outdated User-Assigned Identity After Terraform Re-provisioning


Description

Azure Kubernetes Service (AKS) continues to use the old user-assigned managed identity (UAI) after the identity has been deleted and re-provisioned via Terraform.

Steps to Reproduce

  1. Use Terraform to create a User-Assigned Identity (UAI) and assign it to an AKS cluster.
  2. Deploy services to the cluster to verify normal operation.
  3. Delete the UAI from the Azure portal.
  4. Re-provision the UAI by running Terraform again.
  5. Attempt to deploy Kubernetes services to the cluster.
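For anyone trying to reproduce this, steps 1 and 4 could look roughly like the Terraform sketch below. The issue does not include the original configuration, so every resource name, the region, the VM size, and the provider version constraint are placeholders. It also assumes the UAI is attached as the cluster (control plane) identity, which the issue does not state.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed version constraint, not taken from the issue
    }
  }
}

provider "azurerm" {
  features {}
}

# Placeholder resource group; name and region are illustrative only.
resource "azurerm_resource_group" "example" {
  name     = "rg-aks-uai-repro"
  location = "westeurope"
}

# The user-assigned identity that is later deleted in the portal (step 3)
# and re-created by running `terraform apply` again (step 4).
resource "azurerm_user_assigned_identity" "aks" {
  name                = "uai-aks-repro"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-uai-repro"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "aksuairepro"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_DS2_v2"
  }

  # Assumption: the UAI is used as the cluster identity.
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }
}
```

If the re-created identity keeps the same name and resource group, its resource ID stays the same while its client ID and principal ID change. That may be why Terraform sees nothing to update on the cluster resource in step 4.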

Current Behavior

  • Deployment fails with a 403 error.
  • Logs show that the cluster is using the old clientID of the deleted UAI.
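To confirm that the client ID in the logs belongs to the deleted identity, one option is to print the IDs of the identity that currently exists and compare them. This is a minimal sketch: the identity name and resource group are placeholders, and the output names are made up for illustration.

```hcl
# Look up the identity as it exists in Azure right now.
# Both values below are hypothetical placeholders.
data "azurerm_user_assigned_identity" "current" {
  name                = "uai-aks-repro"
  resource_group_name = "rg-aks-uai-repro"
}

# Client ID of the re-provisioned identity; compare with the ID in the 403 logs.
output "uai_client_id" {
  value = data.azurerm_user_assigned_identity.current.client_id
}

# Principal (object) ID, useful for checking role assignments.
output "uai_principal_id" {
  value = data.azurerm_user_assigned_identity.current.principal_id
}
```

After `terraform apply`, `terraform output uai_client_id` prints the new client ID. A different value in the cluster logs would match the behavior described above.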

Expected Behavior

The AKS cluster should recognize and use the newly provisioned UAI with its new clientID.

Logs

Screenshots

  • The ID of the new UAI
image
  • k8s service after the UAI is re-provisioned
image

Possible Causes

  • AKS might be caching the UAI information and not updating it when the identity is re-provisioned.
  • There could be a delay in propagating the new UAI information to the AKS cluster.

Workaround

I have to reprovision the whole cluster when this happens.
Update: I found that renaming the UAI also makes AKS rotate to the new UAI.
image
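The rename workaround could be expressed in Terraform along these lines. This is only a sketch: the variable, names, and region are hypothetical. The explanation is also an assumption, namely that a new name gives the identity a new resource ID, so the cluster sees a changed identity reference instead of an unchanged one.

```hcl
# Hypothetical suffix that is bumped whenever the identity must be re-created.
variable "uai_name_suffix" {
  type    = string
  default = "v2"
}

# Changing `name` forces Terraform to create a new identity with a new
# resource ID; names and region here are placeholders.
resource "azurerm_user_assigned_identity" "aks" {
  name                = "uai-aks-repro-${var.uai_name_suffix}"
  resource_group_name = "rg-aks-uai-repro"
  location            = "westeurope"
}
```

Any role assignments granted to the old identity would need to be re-created for the new principal ID.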

Impact

This bug prevents the proper functioning of services that rely on the user-assigned identity, potentially causing deployment failures and service disruptions.

Environment (please complete the following information):

  • CLI Version [e.g. 3.22]
  • Kubernetes version: 1.29.7
  • CLI Extension version [e.g. 1.7.5] if applicable
  • Browser [e.g. chrome, safari] if applicable


Thanks for opening the issue. We will try to reproduce and fix it. @norshtein