aws-solutions/network-orchestration-for-aws-transit-gateway

Update v2.0.0 to v3.3.3 fails at spoke stack deployment.

Closed this issue · 3 comments

Describe the bug

When updating the spoke stackset to v3.3.3, all instances complete with:

Update successful. One or more resources could not be deleted.

CustomServiceLinkedRole DELETE_FAILED

Resource of type 'AWS::IAM::ServiceLinkedRole' with identifier 'AWSServiceRoleForVPCTransitGateway' has a conflict. Reason: SLR [AWSServiceRoleForVPCTransitGateway] is in use by other resources: [[RoleUsageType(Region=us-west-2, Resources=[tgw-attach-xxxxxxxxxxxxxxxxx])]].

To Reproduce

  1. (success) Update the Hub Stack to v3.3.3
  2. (success) Create the service-linked role hub stack.
  3. (the "bug") Update the spoke stacks to v3.3.3
  4. (not attempted) Create the service-linked role spoke stacks.

Expected behavior

For customers where the service-linked role was created by the spoke stack in v2.0.0, I can't really see how this update process should succeed. The upgrade tries to delete the role, and if you're using the solution and have attachments, the role can't be deleted. But that was how I interpreted the documentation.

Resolution

Fortunately, upgrading the spoke stacks completed successfully and removed the role's CloudFormation resource, leaving the physical resource behind. So for step 4, when deploying the standalone spoke stack, I used CloudFormation's resource import feature to bring the orphaned roles back into STNO. You have to modify the template to add a DeletionPolicy and remove the Outputs section, but once the import has finished you can update the stacks again with the original templates without issue.
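For anyone repeating the import, the template changes described above could be scripted roughly as follows. This is only a sketch under assumptions: the template is treated as an already-parsed JSON-style dict, the logical ID `CustomServiceLinkedRole` and the `AWSServiceName` value in the example are illustrative, and `RoleName` is assumed to be the import identifier for `AWS::IAM::ServiceLinkedRole`. The helper names are hypothetical and not part of STNO.

```python
import copy

SLR_TYPE = "AWS::IAM::ServiceLinkedRole"
# Role name taken from the error message in this issue.
ROLE_NAME = "AWSServiceRoleForVPCTransitGateway"


def prepare_import_template(template):
    """Return a copy of the template that is suitable for a resource import.

    Adds a DeletionPolicy to every service-linked role resource and drops
    the Outputs section, leaving the caller's template untouched.
    """
    prepared = copy.deepcopy(template)
    prepared.pop("Outputs", None)
    for resource in prepared.get("Resources", {}).values():
        if resource.get("Type") == SLR_TYPE:
            resource["DeletionPolicy"] = "Retain"
    return prepared


def resources_to_import(template, role_name=ROLE_NAME):
    """Build the list of resources to import for an IMPORT change set.

    Assumption: the service-linked role is identified by its RoleName.
    """
    return [
        {
            "ResourceType": SLR_TYPE,
            "LogicalResourceId": logical_id,
            "ResourceIdentifier": {"RoleName": role_name},
        }
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type") == SLR_TYPE
    ]


# Illustrative stand-in for the standalone service-linked role template.
example_template = {
    "Resources": {
        "CustomServiceLinkedRole": {
            "Type": SLR_TYPE,
            "Properties": {"AWSServiceName": "transitgateway.amazonaws.com"},
        }
    },
    "Outputs": {"RoleName": {"Value": {"Ref": "CustomServiceLinkedRole"}}},
}

prepared = prepare_import_template(example_template)
to_import = resources_to_import(prepared)
```

The `prepared` template and the `to_import` list are the two inputs an import change set needs (for example, the `TemplateBody` and `ResourcesToImport` parameters of boto3's `create_change_set` with `ChangeSetType="IMPORT"`). After the import completes, the stack can be updated with the original, unmodified template.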

Although I have resolved my own issue I will leave this bug open in case a maintainer wishes to confirm this behavior or offer a better solution for others.

Please complete the following information about the solution:

  • Version: 2.0.0 -> 3.3.3

To get the version of the solution, you can look at the description of the created CloudFormation stack. For example, "(SO0009) - The AWS CloudFormation template for deployment of the aws-centralized-logging. Version v1.0.0". You can also find the version from releases

  • Region: [e.g. us-west-2]
  • Was the solution modified from the version published on this repository? No
  • [-] If the answer to the previous question was yes, are the changes available on GitHub?
  • Have you checked your service quotas for the services this solution uses?
  • Were there any errors in the CloudWatch Logs? How to enable debug mode?

Screenshots

N/A

Additional context
Add any other context about the problem here.

Hello @mancinifm,
This is not a bug but expected behavior. As you explained, it is caused by the failure to delete the resource while there is an existing attachment in the region. We didn't implement a deletion policy and instead removed the resource from the CloudFormation stack in order to avoid this error for new CFN deployments.
We even tried to use CustomSuffix to streamline the experience, but the TGW SLR does not support the custom suffix property yet.

Duplicate of Issue #96

Thank you for your response. The linked issue explains the situation so I will close this issue.