Bug: FleetsMember is unable to handle migration to another Fleet Manager.
daftping opened this issue · 1 comments
Version of Azure Service Operator
mcr.microsoft.com/k8s/azureserviceoperator:v2.8.0
Describe the bug
FleetsMember is unable to handle migration to another Fleet Manager.
When the FleetsMember.spec.owner.armId field is updated, the resource is stuck in the ClusterAlreadyJoinedAnotherFleet state.
To Reproduce
- Create a FleetsMember resource joining a FleetManager
- Update the spec.owner.armId field to point to another FleetManager
Expected behavior
The resource should handle cluster migration to another Fleet Manager (unjoin the old Fleet Manager, then join the new one). If that is not possible, the relevant fields should be immutable.
kubectl describe fleetsmember.containerservice.azure.com/cluster-name-b
Name: cluster-name-b
Namespace: default
Labels: <none>
Annotations: serviceoperator.azure.com/credential-from: aso-credentials
serviceoperator.azure.com/latest-reconciled-generation: 4
serviceoperator.azure.com/operator-namespace: capz-system
serviceoperator.azure.com/resource-id:
/subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.containerservice/fleets/alex-a/members...
API Version: containerservice.azure.com/v1api20230315preview
Kind: FleetsMember
Metadata:
Creation Timestamp: 2024-08-02T19:51:08Z
Finalizers:
serviceoperator.azure.com/finalizer
Generation: 4
Owner References:
API Version: infrastructure.cluster.x-k8s.io/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: AzureASOManagedControlPlane
Name: cluster-name-b
UID: 99941c17-8884-4106-84de-02e22b72bfbf
Resource Version: 4235112
UID: d8b72164-d059-488a-a4d8-719ff32130a1
Spec:
Azure Name: cluster-name-b
Cluster Resource Reference:
Group: containerservice.azure.com
Kind: ManagedCluster
Name: cluster-name-b
Group: default
Owner:
Arm Id: /subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b
Status:
Cluster Resource Id: /subscriptions/<reducted>/resourceGroups/cluster-name-b/providers/Microsoft.ContainerService/managedClusters/cluster-name-b
Conditions:
Last Transition Time: 2024-08-06T18:26:32Z
Message: One cluster can only join one fleet. The given cluster /subscriptions/<reducted>/resourceGroups/cluster-name-b/providers/Microsoft.ContainerService/managedClusters/cluster-name-b already joined another fleet /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a. If you want to move the cluster to this fleet, leave the other fleet and try again. Resource ID: "/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b". Correlation ID: "8800d607-34e8-4d4b-b5e5-2db5a2aa8566". Operation ID: "417decdf-fa59-4e04-b701-f1c87e398998": PUT https://management.azure.com/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b
--------------------------------------------------------------------------------
RESPONSE 400: 400 Bad Request
ERROR CODE: ClusterAlreadyJoinedAnotherFleet
--------------------------------------------------------------------------------
{
"error": {
"code": "ClusterAlreadyJoinedAnotherFleet",
"message": "One cluster can only join one fleet. The given cluster /subscriptions/<reducted>/resourceGroups/cluster-name-b/providers/Microsoft.ContainerService/managedClusters/cluster-name-b already joined another fleet /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a. If you want to move the cluster to this fleet, leave the other fleet and try again. Resource ID: \"/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b\". Correlation ID: \"8800d607-34e8-4d4b-b5e5-2db5a2aa8566\". Operation ID: \"417decdf-fa59-4e04-b701-f1c87e398998\""
}
}
--------------------------------------------------------------------------------
Observed Generation: 4
Reason: ClusterAlreadyJoinedAnotherFleet
Severity: Error
Status: False
Type: Ready
E Tag: "0200afc4-0000-4d00-0000-66b2689b0000"
Group: default
Id: /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a/members/cluster-name-b
Name: cluster-name-b
Provisioning State: Succeeded
System Data:
Created At: 2024-08-06T18:16:52.2228076Z
Created By: 98828e3b-4b1e-4fc7-850e-faf7f197dc15
Created By Type: Application
Last Modified At: 2024-08-06T18:16:52.2228076Z
Last Modified By: 98828e3b-4b1e-4fc7-850e-faf7f197dc15
Last Modified By Type: Application
Type: Microsoft.ContainerService/fleets/members
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BeginCreateOrUpdate 13m (x4 over 178m) FleetsMemberController Successfully sent resource to Azure with ID "/subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.containerservice/fleets/alex-a/members/cluster-name-b"
Normal CredentialFrom 3m35s (x17 over 178m) FleetsMemberController Using credential from "default/aso-credentials"
Warning CreateOrUpdateActionError 3m33s FleetsMemberController Reason: ClusterAlreadyJoinedAnotherFleet, Severity: Error, RetryClassification: RetrySlow, Cause: One cluster can only join one fleet. The given cluster /subscriptions/<reducted>/resourceGroups/cluster-name-b/providers/Microsoft.ContainerService/managedClusters/cluster-name-b already joined another fleet /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a. If you want to move the cluster to this fleet, leave the other fleet and try again. Resource ID: "/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b". Correlation ID: "8800d607-34e8-4d4b-b5e5-2db5a2aa8566". Operation ID: "417decdf-fa59-4e04-b701-f1c87e398998": PUT https://management.azure.com/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b
--------------------------------------------------------------------------------
RESPONSE 400: 400 Bad Request
ERROR CODE: ClusterAlreadyJoinedAnotherFleet
--------------------------------------------------------------------------------
{
"error": {
"code": "ClusterAlreadyJoinedAnotherFleet",
"message": "One cluster can only join one fleet. The given cluster /subscriptions/<reducted>/resourceGroups/cluster-name-b/providers/Microsoft.ContainerService/managedClusters/cluster-name-b already joined another fleet /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a. If you want to move the cluster to this fleet, leave the other fleet and try again. Resource ID: \"/subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b\". Correlation ID: \"8800d607-34e8-4d4b-b5e5-2db5a2aa8566\". Operation ID: \"417decdf-fa59-4e04-b701-f1c87e398998\""
}
}
I believe another user brought this same problem up in another context in Slack a few days ago.
We should block mutation of owner.armId once it's set, but we don't. Agreed, it's a bug.
ASO currently doesn't support migrating resources between owners (indeed, there are many places where Azure doesn't support such a thing).
What mutating armId currently does is:
- Leave the old Microsoft.ContainerService/fleets/members at path /subscriptions/<reducted>/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a/members/cluster-name-b.
- Create a new Microsoft.ContainerService/fleets/members at path /subscriptions/<reducted>/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b/members/cluster-name-b.
Fleet then interprets this as "you are trying to join the same cluster to 2 fleets", because there are two fleets/members resources for the same cluster. ASO doesn't auto-delete the old resource because (in the case of things like databases) that would be very bad. Deletion should be explicit from the user.
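To illustrate why changing the owner produces a second resource rather than moving the first, here is a rough sketch. It assumes the member's ARM ID is simply the owner's ARM ID plus /members/ plus the Azure name, which matches the IDs shown in the report. The function name member_resource_id is hypothetical and is not ASO code.

```python
def member_resource_id(owner_arm_id: str, azure_name: str) -> str:
    """Compose a fleets/members ARM ID from its owner's ARM ID (illustrative only)."""
    return f"{owner_arm_id.rstrip('/')}/members/{azure_name}"


SUB = "/subscriptions/<reducted>"
old_owner = f"{SUB}/resourceGroups/alex-fleet-a/providers/Microsoft.ContainerService/fleets/alex-a"
new_owner = f"{SUB}/resourceGroups/alex-fleet-b/providers/Microsoft.ContainerService/fleets/alex-fleet-b"

old_id = member_resource_id(old_owner, "cluster-name-b")
new_id = member_resource_id(new_owner, "cluster-name-b")

# The IDs differ, so a PUT after the owner change targets a second resource.
# Nothing deletes the resource at old_id, which is why Fleet sees two members.
ids_differ = old_id != new_id
```

Because the owner is part of the ID, there is no in-place "move" from the resource's point of view: the old member has to be deleted explicitly before the new one can be created.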
If we fix this bug and make owner.armId immutable, the expected experience to perform this action would become:
- Delete the old FleetsMember
- Create a new FleetsMember
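For illustration, an immutability check of the kind described could look roughly like the sketch below. This is not ASO's implementation (ASO is written in Go and validates through admission webhooks). The function name validate_owner_unchanged and the error text are hypothetical, and the case-insensitive comparison is an assumption based on ARM IDs being case-insensitive.

```python
from typing import Optional


def validate_owner_unchanged(old_arm_id: Optional[str], new_arm_id: Optional[str]) -> Optional[str]:
    """Return an error message if owner.armId changed after being set, else None."""
    if not old_arm_id:
        # The owner was not set yet, so any value is allowed.
        return None
    # Assumption: ARM IDs differ only by case when they name the same resource.
    if (new_arm_id or "").lower() != old_arm_id.lower():
        return "spec.owner.armId is immutable once set; delete and recreate the FleetsMember"
    return None
```

With a check like this, the update in the report would be rejected up front instead of leaving the resource stuck in ClusterAlreadyJoinedAnotherFleet.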
Thanks for the report, we'll try to get a fix in for the next release.