awslabs/aws-service-catalog-puppet

Question: Strategy for multi-region with multiple accounts

Closed this issue · 9 comments

  • We currently have ~200 accounts all deployed to a single region utilising OU paths in our manifest.
  • We would like to enable other regions. How to do this with Factory and for the launches is clearly documented.
  • However we don't want to enable every account for every region (and therefore deploy products everywhere), only those that need it.
  • An approach for this is also clearly documented (specify the accounts with extra regions separately in the manifest) but it feels like this may become unmaintainable once we hit 15-20 accounts that may all have different required regions.
  • We would also like to automate the process so our Account Vending Machine can select between regions and we can create a Self Service product allowing teams to enable a region for their account.

Is there a strategy documented anywhere which suggests how to approach this situation (a maintainable manifest file that allows enabled_regions per account to be dynamic)?

We have considered

  1. Adding accounts separately to the manifest as and when they require additional/alternative regions - Simple but potentially difficult to maintain at scale and to automate.
  2. Using a separately deployed product that puppet needs to see in a region before deploying there - Potentially possible with a lambda-invocation depends_on, but this would result in an out-of-band product deployment, and we are not sure the check would really be account- and region-specific.
  3. Deploying every product to every account in every enabled region regardless of need - The manifest stays pretty much the same, but we would incur ~$60 per account per month (~$12,000 monthly across 200 accounts) that could be avoided.
  4. Using assertions with an SSM parameter per region per account, and having the root product/launch depend on the assertion so that products are only launched in a region in an account if the assertion is true for that account and region - Similar issues to 2, but it feels slightly closer to native puppet functionality. We are unsure whether we would be misusing assertions, and also unsure whether the assertion could be scoped to just the one account and region puppet is about to deploy into.
  5. Redesigning our manifest to make option 1 work at scale - This would be acceptable but we're not sure what approach would make option 1 more maintainable.
  6. Asking on GitHub whether anyone has a better strategy. Hi! 👋
  7. Using AWS Organizations account tags such as "ap-southeast-1:enabled" on specific accounts, together with organizations_account_tags: append, and updating every launch with a deploy_to where the ap-southeast-1 tag deploys to the ap-southeast-1 region - If deploy_to works this way then this may be our best option, though we would have to update every launch whenever a region is "globally" enabled and automate the tagging in AWS Organizations. Would be interested in others' opinions on this approach though :)
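For reference, a sketch of what option 7 might look like in the manifest. This is hypothetical: the launch, portfolio, and product names are made up, and it assumes deploy_to honours per-tag region lists in the way described above.

```yaml
launches:
  networking-baseline:              # hypothetical launch name
    portfolio: example-portfolio    # hypothetical portfolio
    product: networking-product     # hypothetical product
    version: v1
    deploy_to:
      tags:
        # every account tagged "type:all" gets the default region
        - tag: "type:all"
          regions: default_region
        # only accounts carrying the org tag get the extra region
        - tag: "ap-southeast-1:enabled"
          regions:
            - "ap-southeast-1"
```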

Thanks for reaching out with this request. I think the current mechanisms for dealing with variance do not cater for this use case without adding an unacceptable level of operational pain. If a new feature were released to make this possible, when would you need it by?

Our current target is the beginning of August. However, it won't need to be immediately scalable: we could use one of the options above (probably option 1, as the simplest and supported) for a few weeks past the August target if a feature taking care of the operational overhead was arriving that we could switch to later.

As of release https://github.com/awslabs/aws-service-catalog-puppet/releases/tag/0.233.0 you can now make use of a new construct named external_account_overrides. It allows you to specify that an account's default_region and/or enabled_regions should be read from SSM within the home region of the puppet account.

You can specify the following:

accounts:
  - ou: "/dev"
    name: "dev-accounts"
    external_account_overrides:
      default_region:
        source: ssm
      enabled_regions:
        source: ssm
    tags:
      - "type:dev"

When the puppet manifest is expanded, the values will be retrieved from SSM Parameter Store parameters following the naming convention /servicecatalog-puppet/manifest-external-account-overrides/<account_id>/default_region or /servicecatalog-puppet/manifest-external-account-overrides/<account_id>/enabled_regions, where <account_id> is an account within the /dev OU.
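The parameter names can be derived mechanically from that convention. A small helper sketch (the function name is mine, not part of the tool):

```python
# Build the SSM parameter names puppet reads during manifest expansion,
# per the convention above. The helper name is hypothetical.
SSM_PREFIX = "/servicecatalog-puppet/manifest-external-account-overrides"

def override_parameter_name(account_id: str, key: str) -> str:
    """Return the parameter name for an account's override value."""
    if key not in ("default_region", "enabled_regions"):
        raise ValueError(f"unsupported override key: {key}")
    return f"{SSM_PREFIX}/{account_id}/{key}"
```

For example, `override_parameter_name("111111111111", "enabled_regions")` yields `/servicecatalog-puppet/manifest-external-account-overrides/111111111111/enabled_regions`.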

If you do not wish to provide default_region or enabled_regions for all accounts, you can set fail_if_missing: false on the option:

accounts:
  - ou: "/dev"
    name: "dev-accounts"
    external_account_overrides:
      default_region:
        source: ssm
        fail_if_missing: false
      enabled_regions:
        source: ssm
        fail_if_missing: false
    tags:
      - "type:dev"

This means a missing default_region or enabled_regions value for an account will fail silently, which may be tricky to spot. If you are unsure what is happening, please check the expanded manifest for the result of the lookups.
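One way to catch that silent gap early is a pre-flight check that lists accounts with no override value set. A sketch: the fetch function is injected, so it can be backed by boto3's `ssm.get_parameter` (returning None on ParameterNotFound) or a stub in tests; all names here are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Prefix from the naming convention described earlier in the thread.
PREFIX = "/servicecatalog-puppet/manifest-external-account-overrides"

def find_missing_overrides(
    account_ids: Iterable[str],
    key: str,
    get_parameter: Callable[[str], Optional[str]],
) -> list[str]:
    """Return the account ids that have no SSM value for `key`."""
    return [
        account_id
        for account_id in account_ids
        if get_parameter(f"{PREFIX}/{account_id}/{key}") is None
    ]
```

Running this against the OU's account list before a deploy would surface any account that is about to be silently skipped.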

And here I was thinking that "beginning of August" was a stretch when in reality I should have said "beginning of tomorrow" 😀

One question - if source: ssm is set for both but fail_if_missing: false only for enabled_regions, and there is an SSM parameter for every account for default_region, is it valid to only have enabled_regions SSM parameters (which I assume should be a comma-separated string, btw?) for a few accounts? i.e. is this a happy path?

Many thanks @eamonnfaherty, will get on with the upgrade and try it out tomorrow and feedback.

The default_region and enabled_regions overrides are independent of each other. You can set either, both, or neither if you so wish.

For any you set, if you do not set fail_if_missing: false the solution expects a value for every account and will fail if any are missing.

enabled_regions must be a YAML-encoded string, not a bare comma-separated list.
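To illustrate: the stored value is a YAML document rather than a CSV. A flow-style YAML list also happens to be valid JSON, so a quick local sanity check can use the stdlib (an illustration only; block-style YAML would need a YAML parser such as PyYAML):

```python
import json

# the string value you would store in the SSM parameter:
# a YAML-encoded list of regions (flow style, which is also valid JSON)
ssm_value = '["eu-west-1", "ap-southeast-1"]'

regions = json.loads(ssm_value)
print(regions)  # ['eu-west-1', 'ap-southeast-1']
```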

Did this work out okay for you? If so please close this issue.

Oh apologies - we're having some unrelated issues upgrading from our current version. I'm happy for you to close now as the described functionality should meet the requirements - otherwise we will close once upgraded and tested.

Np. I will close this as the feature shipped. If you hit any issues please raise a new issue.