chime/terraform-aws-alternat

Request for Enhancements: Improved Monitoring for AlterNAT Routing Transitions

RamazanKara opened this issue · 2 comments

Our team has been using AlterNAT for some time. Because its failover to the NAT Gateway is seamless, we did not immediately notice that it had switched the route from the NAT instance to the NAT Gateway (possibly triggered by an external issue). As a result, we incurred unexpected costs.

To address this concern and enhance our monitoring capabilities, we would like to propose a new feature: the ability to receive alerts whenever any routing entry is replaced, preferably through an SNS Topic.

By implementing this feature, we can proactively monitor routing transitions and promptly respond to any changes, ensuring that we optimize our usage of resources and minimize unnecessary costs. We understand the value of this feature for our team and believe it could benefit other users as well.

We are enthusiastic about contributing to this enhancement and would be more than willing to create a pull request to implement the suggested feature if it aligns with your development roadmap.

Sounds reasonable to me!

FWIW, we currently monitor for failovers in two ways:

  • Log events matching "Route replacement succeeded"
  • CloudWatch metric NAT Gateway Bytes In is greater than 0

This has worked fine for us, but I can appreciate the desire for a more direct alert.
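For readers who want to try the metric-based check, here is a minimal sketch of it as a CloudWatch alarm definition. It only builds the keyword arguments for boto3's `put_metric_alarm`. The comment above does not say which exact metric is used, so `BytesInFromSource` (in the `AWS/NATGateway` namespace) is an assumption standing in for "Bytes In", and the function name, alarm name, and period are hypothetical.

```python
def nat_gateway_traffic_alarm(nat_gateway_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when the NAT Gateway carries any traffic, which under
    normal operation (traffic flowing through the NAT instance) should not
    happen.
    """
    return {
        # Hypothetical naming scheme, not taken from the module.
        "AlarmName": f"alternat-failover-{nat_gateway_id}",
        "Namespace": "AWS/NATGateway",
        # Assumption: stand-in for the "Bytes In" metric mentioned above.
        "MetricName": "BytesInFromSource",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle gateway may report no datapoints; treat that as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 installed, the result could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**nat_gateway_traffic_alarm(gateway_id, topic_arn))`.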

A couple of implementation thoughts:

  1. This functionality should be optional and off by default.
  2. The Terraform module should probably also add an SNS VPC endpoint (also optional), in addition to the SNS topic and any other relevant resources/configuration.
  3. Maybe it goes without saying, but the alert should only fire after the route replacement has definitely succeeded.
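Points 1 and 3 could look roughly like the sketch below. It is not the module's actual Lambda code: the function names, the message fields, and the idea of passing the topic ARN as a plain argument are all assumptions made for illustration. The clients are expected to be boto3 `ec2` and `sns` clients; only their `replace_route` and `publish` calls are used.

```python
import json


def notify_route_replaced(sns_client, topic_arn, route_table_id, nat_gateway_id):
    """Publish a failover alert. Returns True if a message was sent."""
    if not topic_arn:
        # Off by default: with no topic configured, nothing is published.
        return False
    message = {
        "event": "route_replaced",  # hypothetical payload shape
        "route_table_id": route_table_id,
        "nat_gateway_id": nat_gateway_id,
    }
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="AlterNAT route replaced",
        Message=json.dumps(message),
    )
    return True


def replace_route_and_notify(ec2_client, sns_client, topic_arn,
                             route_table_id, nat_gateway_id):
    """Point the default route at the NAT Gateway, then alert."""
    ec2_client.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_gateway_id,
    )
    # Reached only if replace_route did not raise, so the alert is sent
    # strictly after the replacement call has succeeded.
    return notify_route_replaced(
        sns_client, topic_arn, route_table_id, nat_gateway_id
    )
```

Passing the clients in as arguments keeps the sketch easy to test with stand-ins. In a real handler they would come from `boto3.client("ec2")` and `boto3.client("sns")`, and the topic ARN would probably be read from configuration that is empty unless the feature is enabled.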

Thanks for being willing to add this feature!

Closing this since it hasn't been active in a while. Do feel free to submit a PR if you'd like to see this enhancement!