aws/aws-app-mesh-examples

The howto-cross-account example always fails in step AWS::ServiceDiscovery::PrivateDnsNamespace in CloudFormation

edu-donato opened this issue · 6 comments

Describe the bug
When the deploy script starts running the CloudFormation stacks in the secondary account, AWS cannot finish creating the infrastructure resources. I always receive the same error in the AWS console:

The VPC: <vpc_name> in region <region> that you provided is not authorized to make the association. (Service: AmazonRoute53; Status Code: 400; Error Code: InvalidVPCId; Request ID: <request_id>; Proxy: null)

The problem is that AWS (Route 53) requires a VPC association authorization between the shared VPC and the private hosted zone. So the stack would have to create the private hosted zone, create the authorization, create the association, and then delete the authorization. According to this comment, this behavior seems to have started in February.

This same error happens when using the examples in blogs/ecs-cross-account.

Platform
ECS

To Reproduce
Just deploy the howto-cross-account example.

Expected behavior
I expect the infrastructure resources to be created without any error.

We're aware of this issue, and we're going to connect with the RAM team internally to figure out whether there are any workarounds and what the solution is.

Meanwhile, we'll update our examples to use Cloud Map-based service discovery instead of DNS-based service discovery so that they work for now, specifically here and here.

However, one caveat with Cloud Map-based service discovery is that the secondary account needs the App Mesh service-linked role (SLR) to query Cloud Map. You can work around this either by manually creating the SLR in your secondary account or by creating a dummy mesh in the secondary account, which creates the SLR under the hood.

Update: App Mesh creates the SLR when a virtual node is created, so you do not need to create it manually.
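If you want to confirm that the SLR actually exists in the secondary account before relying on it, a small check could look like the sketch below. It assumes the role has its default name, AWSServiceRoleForAppMesh, and that you pass the name of a configured CLI profile; the function names are hypothetical. The create branch mirrors the manual workaround mentioned above and should normally be unnecessary given the update.

```shell
#!/bin/sh
# Sketch: make sure the App Mesh service-linked role exists in an account.
# Assumptions: the role has its default name and "$1" is a configured CLI profile.
# Function names are hypothetical helpers, not part of the example repo.

appmesh_slr_exists() {
  local profile="$1"
  # get-role exits with a non-zero status if the role does not exist.
  aws iam get-role --profile "$profile" \
    --role-name AWSServiceRoleForAppMesh >/dev/null 2>&1
}

ensure_appmesh_slr() {
  local profile="$1"
  if appmesh_slr_exists "$profile"; then
    echo "AWSServiceRoleForAppMesh already exists in profile $profile"
    return 0
  fi
  # Usually unnecessary: App Mesh creates the SLR when a virtual node is created.
  aws iam create-service-linked-role --profile "$profile" \
    --aws-service-name appmesh.amazonaws.com >/dev/null &&
    echo "Created AWSServiceRoleForAppMesh in profile $profile"
}
```

For example, ensure_appmesh_slr frontend checks the account behind the "frontend" profile and only calls create-service-linked-role if get-role fails.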

I have updated the demo to use Cloud Map-based service discovery with HTTP namespaces in the secondary account: #434.

We'll keep this issue open until we have a workaround or solution for DNS-based service discovery.

If you are using the example in blogs/ecs-cross-account, please follow this workaround:

  1. Follow the blog until "Set up the frontend and application server in Account Frontend".

  2. In the "Create the ECS cluster and infrastructure components" step for Account Frontend (specifically referring to this line), the frontend account tries to create a Private DNS Namespace that uses the shared VPC as a parameter reference. This does not currently work. Instead, the Private DNS Namespace needs to be created with a VPC that is local to Account Frontend.

    This stack creates a Private DNS Namespace in a temporary local VPC. We will associate the Private DNS Namespace with the shared VPC in Account Backend in a later step. :)

  3. Once that stack is created, AWS Cloud Map creates, on behalf of the customer, a private hosted zone associated with the VPC that was provided (the temporary local VPC). To use cross-account features, we don't want this VPC; we want the shared VPC from Account Backend instead. To do that, we have to:

    First, create a VPC association authorization between the hosted zone and the shared VPC. To do that, run the following AWS CLI command from Account Frontend, which owns the hosted zone (this is not supported in the AWS Console or AWS CFN):

    aws --region <region> --profile frontend route53 create-vpc-association-authorization \
        --hosted-zone-id <frontend-hosted-zone-id> \
        --vpc '{"VPCRegion":"<region>","VPCId": "<shared-vpc-in-account-backend>"}'
    

    After that, associate the shared VPC in Account Backend with the hosted zone in Account Frontend. This can be done either through the AWS Console (of Account Backend) or through the AWS CLI:

    aws --region <region> --profile backend route53 associate-vpc-with-hosted-zone \
        --hosted-zone-id <frontend-hosted-zone-id> \
        --vpc VPCRegion=<region>,VPCId=<shared-vpc-in-account-backend>
    

    At this point, the Private DNS Namespace should be associated with the shared VPC.

  4. (Cleanup) To avoid unnecessary risk, delete the VPC association authorization from Account Frontend using the AWS CLI (not available in AWS CFN or the Console). Deleting the authorization does not remove the association created in step 3:

     aws --region <region> --profile frontend route53 delete-vpc-association-authorization \
         --hosted-zone-id <frontend-hosted-zone-id> \
         --vpc '{"VPCRegion":"<region>","VPCId": "<shared-vpc-in-account-backend>"}'
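Steps 3 and 4 can be scripted so that the authorization, association, and cleanup always run in order and with the right account. The sketch below assumes CLI profiles named "frontend" (owns the Cloud Map namespace and its hosted zone) and "backend" (owns the shared VPC), as in the commands above; the function names are hypothetical. The hosted zone ID is read from the namespace via get-namespace, which reports it under Namespace.Properties.DnsProperties.HostedZoneId.

```shell
#!/bin/sh
# Sketch of steps 3 and 4 above. Assumes CLI profiles "frontend" (owns the
# Cloud Map namespace and its hosted zone) and "backend" (owns the shared VPC).
# Function names are hypothetical.

# Print the ID of the Route 53 hosted zone that Cloud Map created for a namespace.
namespace_hosted_zone_id() {
  local region="$1" namespace_id="$2"
  aws servicediscovery get-namespace --region "$region" --profile frontend \
    --id "$namespace_id" \
    --query 'Namespace.Properties.DnsProperties.HostedZoneId' --output text
}

# Authorize (frontend), associate (backend), then delete the authorization
# (frontend). Stops at the first failing call.
associate_shared_vpc() {
  local region="$1" zone_id="$2" shared_vpc_id="$3"
  local vpc="VPCRegion=${region},VPCId=${shared_vpc_id}"

  aws route53 create-vpc-association-authorization --region "$region" \
    --profile frontend --hosted-zone-id "$zone_id" --vpc "$vpc" || return
  aws route53 associate-vpc-with-hosted-zone --region "$region" \
    --profile backend --hosted-zone-id "$zone_id" --vpc "$vpc" || return
  # Removing the authorization does not undo the association made above.
  aws route53 delete-vpc-association-authorization --region "$region" \
    --profile frontend --hosted-zone-id "$zone_id" --vpc "$vpc"
}
```

A typical call would be zone=$(namespace_hosted_zone_id <region> <namespace-id>) && associate_shared_vpc <region> "$zone" <shared-vpc-in-account-backend>. Stopping at the first failure avoids deleting an authorization before the association has succeeded.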
    

For the howto-cross-account example, please use the following workaround (similar to the one above):

  1. Use the earlier version of infra.yaml in the secondary account, which has a PrivateDnsNamespace (this is a link to that file), and replace the shared VPC with a temporary VPC local to the secondary account.

  2. Manually create a VPC association authorization through the CLI between the hosted zone (HZ) of the secondary account and the shared VPC in the primary account (refer to step 3 above).

  3. Associate the shared VPC with the HZ through the AWS Console or CLI of the primary account, which owns the shared VPC (refer to step 3 above).

  4. Delete the VPC association authorization created in step 2 (refer to step 4 above).
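For step 1, if you would rather create the temporary VPC and the namespace outside CloudFormation, the AWS CLI could be used along these lines. This is a sketch, not the example's infra.yaml: the profile name "secondary", the CIDR 10.255.0.0/16, the POLL_SECONDS variable, and the function names are all hypothetical. Private DNS namespace creation is asynchronous, so the function polls get-operation until the status is SUCCESS or FAIL and then prints the namespace ID from the operation's NAMESPACE target.

```shell
#!/bin/sh
# Sketch of step 1: create the Cloud Map private DNS namespace against a
# temporary VPC owned by the secondary account. Assumes a CLI profile named
# "secondary"; the CIDR below is an arbitrary, hypothetical choice.

# Create the temporary VPC and print its ID.
create_temp_vpc() {
  local region="$1"
  aws ec2 create-vpc --region "$region" --profile secondary \
    --cidr-block 10.255.0.0/16 --query 'Vpc.VpcId' --output text
}

# Create the namespace, wait for the asynchronous operation to finish,
# and print the namespace ID on success.
create_namespace_in_vpc() {
  local region="$1" name="$2" vpc_id="$3" op_id status
  op_id=$(aws servicediscovery create-private-dns-namespace --region "$region" \
    --profile secondary --name "$name" --vpc "$vpc_id" \
    --query 'OperationId' --output text) || return
  while :; do
    status=$(aws servicediscovery get-operation --region "$region" \
      --profile secondary --operation-id "$op_id" \
      --query 'Operation.Status' --output text) || return
    case "$status" in
      SUCCESS) break ;;
      FAIL) echo "namespace creation failed (operation $op_id)" >&2; return 1 ;;
      *) sleep "${POLL_SECONDS:-5}" ;;  # hypothetical polling interval knob
    esac
  done
  aws servicediscovery get-operation --region "$region" --profile secondary \
    --operation-id "$op_id" --query 'Operation.Targets.NAMESPACE' --output text
}
```

With the namespace ID in hand, aws servicediscovery get-namespace returns the ID of the hosted zone that Cloud Map created, which is what steps 2 to 4 operate on.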

I'm closing this issue, please feel free to re-open if you encounter issues with any of the above steps!

Blog post for reference: https://aws.amazon.com/pt/blogs/containers/connecting-services-across-multiple-accounts-using-aws-app-mesh-and-amazon-ecs/

I came across this issue using Terraform to deploy an App Mesh with a private hosted zone in a cross-account scenario, where a VPC is owned by one account, Account1, and the App Mesh and private hosted zone are owned by another account, Account2.

When creating the Terraform resource aws_service_discovery_private_dns_namespace:

resource "aws_service_discovery_private_dns_namespace" "example" {
  name        = "example.local"
  description = "Example Service discovery namespace"
  vpc         = local.shared_vpc_id
}

I get the error:

Error: waiting for Service Discovery Private DNS Namespace (example.local) create: unexpected state 'FAIL', wanted target 'SUCCESS'. last error: CANNOT_CREATE_HOSTED_ZONE: The VPC: vpc-xxxxxxxx in region eu-west-1 that you provided is not authorized to make the association. (Service: AmazonRoute53; Status Code: 400; Error Code: InvalidVPCId; Request ID: 81620f06-be0f-44d0-bf78-e2cbd4c8ee8c; Proxy: null)

I was able to solve this error by following the workaround proposed by @rishijatia: creating a temporary VPC in Account2, using its VPC ID for the aws_service_discovery_private_dns_namespace resource, and then authorizing and associating the shared VPC with the created private hosted zone.

Then I deleted the temporary VPC in Account2, and everything works fine. However, this leaves the Terraform configuration of the aws_service_discovery_private_dns_namespace resource referencing the ID of a non-existent VPC, and it requires manual steps each time a new app is deployed in a new private hosted zone.
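If the deletion of the temporary VPC is scripted, it may be worth guarding it so that it only runs once the shared VPC really is associated with the private hosted zone. The sketch below assumes a CLI profile named "account2" (hypothetical) that owns both the hosted zone and the temporary VPC, and that no subnets or other dependencies were added to the temporary VPC; the function names are also hypothetical. It reads the zone's associated VPCs with route53 get-hosted-zone. It does not address the stale VPC ID left in the Terraform configuration.

```shell
#!/bin/sh
# Sketch: delete the temporary VPC only after confirming that the shared VPC is
# associated with the namespace's private hosted zone. Assumes a CLI profile
# "account2" that owns both the hosted zone and the temporary VPC, and that no
# subnets or other dependencies were added to the temporary VPC.

# Print the IDs of the VPCs associated with a private hosted zone, space-separated.
zone_vpc_ids() {
  local zone_id="$1"
  aws route53 get-hosted-zone --profile account2 --id "$zone_id" \
    --query 'VPCs[].VPCId' --output text | tr '\t\n' '  '
}

retire_temp_vpc() {
  local region="$1" zone_id="$2" temp_vpc_id="$3" shared_vpc_id="$4"
  # Refuse to continue unless the shared VPC already appears in the zone's VPC list.
  case " $(zone_vpc_ids "$zone_id") " in
    *" $shared_vpc_id "*) ;;
    *)
      echo "shared VPC $shared_vpc_id is not associated with $zone_id" >&2
      return 1
      ;;
  esac
  aws ec2 delete-vpc --region "$region" --profile account2 --vpc-id "$temp_vpc_id"
}
```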

Is there any comment about this issue from the RAM team? Can we expect this issue to be resolved?

I have also checked the approach of using a private hosted zone in Account1 and Cloud Map-based service discovery with HTTP namespaces in Account2, as proposed by @Y0Username. This works well for virtual gateway-to-service communication, but I can't make it work for service-to-service communication: DNS resolution fails.

I thought it might be my fault, but to be honest, I haven't seen any example of this working: every example I have seen, like this or this, uses a virtual gateway or DNS-based service discovery for service-to-service communication.

Do you have any working example of cross-account service-to-service communication using Cloud Map-based service discovery? Is this even possible without DNS-based service discovery?

Thank you very much for your support!

I'm not sure why this was closed, as it is still an issue.
It's unfortunate that there has been no progress on this.
The manual steps of creating a temporary VPC and manually associating the shared VPC are very annoying.
Has anyone managed, or tried, to automate this workaround?
And many thanks to rishijatia for the manual workaround. It works for me.

Yes, please reopen and make this functionality work :) Having to create a temporary VPC just so the PHZ exists and the authorization can be created is a clumsy workaround.