hashicorp/terraform

API not ready when Terraform completes executing

betabandido opened this issue · 5 comments

I am using Terraform 0.9.6 to create an API using API Gateway.

I am getting intermittent failures where the API is not ready when Terraform completes the apply step. Once the apply step finishes, I run terraform output <out-var> to obtain the URL of the created API. When I try to access the URL I get the following error:

{"message": "Internal server error"}

When I try again a couple of minutes later, it works.

I am manually creating the following dependencies:

  • aws_api_gateway_deployment -> aws_api_gateway_integration_response
  • aws_api_gateway_integration_response -> aws_api_gateway_integration

Does Terraform guarantee that all resources are fully created (and accessible) once it finishes executing? Might this be a bug? Because the problem is intermittent, it is not easy to create a reproducible example, but I would be glad to help diagnose this issue.

Hi @betabandido! Thanks for recording this issue.

It is definitely the goal for Terraform to wait for a resource to be totally ready before considering the apply complete. Unfortunately, in practice there are often situations where the underlying API is only eventually consistent, and so it's not possible to reliably detect that the change has propagated successfully.

I expect API Gateway is one of these situations, since it uses CloudFront as its frontend, and CloudFront is a big, geo-distributed system that tends to take some time (several minutes, usually) to "settle" after changes. When interacting with CloudFront's API directly, Terraform is able to monitor the completion of the change, but with API Gateway such details are hidden, because the CloudFront interaction is an implementation detail of API Gateway.

So this brings us to the question of whether there's anything we can reasonably do to get closer to our ideal goal here. One thing we could try is to repeatedly send GET requests to the root API URL until a successful response is returned, but I don't think that can work: the root resource might be intentionally configured to return an error, or a request to it might have expensive, undesirable side effects, such as running an AWS Lambda function.

In the absence of this, I think the best that could be done is to use a provisioner "local-exec" to run an external script that blocks until the API seems available. That way this script can use some application-specific knowledge to use an API endpoint that is known to return an OK response and not have any expensive side-effects. If that seems to work, we could potentially adopt it as an official feature that is activated by setting an optional attribute on the API gateway deployment for the path to use to poll for activation.
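As a rough illustration of the kind of external script described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `probe`, `wait_until_ready` and `main`, the 5-second interval, the 300-second deadline, and the existence of some side-effect-free endpoint in the API that returns a 2xx response.

```python
import sys
import time
import urllib.request


def probe(url, timeout=5.0):
    """Return True if a GET on `url` yields a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # urllib's HTTPError/URLError and socket timeouts all derive from OSError,
        # so this covers the 500 seen in this issue as well as DNS failures.
        return False


def wait_until_ready(check, interval=5.0, deadline=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Gives up and returns False once another sleep would pass `deadline` seconds.
    """
    start = clock()
    while True:
        if check():
            return True
        if clock() - start + interval > deadline:
            return False
        sleep(interval)


def main(argv):
    """Hypothetical entry point: argv[0] is the URL of a known-safe endpoint."""
    url = argv[0]
    return 0 if wait_until_ready(lambda: probe(url)) else 1
```

If this were saved as a script ending with `sys.exit(main(sys.argv[1:]))`, the `command` of a local-exec provisioner could invoke it with the safe endpoint's URL. A non-zero exit status would then make the provisioner, and so the apply, fail when the API never becomes reachable.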

Well, this explains it all. It is, however, unfortunate that the AWS SDK is not strictly consistent.

I completely agree that polling the root of the API URL shouldn't be directly done by Terraform. Instead, adding an optional attribute for users to provide a safe endpoint that Terraform can poll seems a good idea to me.

I will do some research on how using an approach such as exponential backoff polling works, and I will add my results here.
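For reference, an exponential backoff schedule can be sketched in a few lines of Python. This is only an illustration of the general technique, not something taken from Terraform or the AWS provider; the name `backoff_delays` and all default values are assumptions.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=8,
                   jitter=0.0, rng=random.random):
    """Yield `attempts` sleep durations that grow geometrically up to `cap`.

    `jitter` is the maximum fraction of each delay that may be added at random;
    with jitter=0 the schedule is deterministic.
    """
    delay = base
    for _ in range(attempts):
        spread = delay * jitter * rng()
        yield min(cap, delay + spread)
        delay = min(cap, delay * factor)


if __name__ == "__main__":
    # With base=1, factor=2 and cap=10 the delays double until they hit the cap.
    print(list(backoff_delays(base=1, factor=2, cap=10, attempts=6)))
```

A polling loop would sleep for each yielded delay between probes and give up once the generator is exhausted. Capping the delay keeps the worst-case wait between probes bounded.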

Hi both, this explains a few things. I'm currently working with an API Gateway setup and then putting a CloudFront distribution in front of that.

Terraform fires quickly through the API Gateway setup (while the deployment is still running), and when it reaches the CloudFront resource I get the error:
NoSuchOrigin: One or more of your origins do not exist.

This makes sense, since the origin doesn't exist yet, but if we're going to use Terraform we really need a way to wait until the deployment is done.

Hi @heylookalive,

First, please note that this issue has been migrated to the aws provider's separate repository now, so it's likely that we won't be paying as much attention over here. Sorry for all the disruption while we migrate to this separate-repository model; hopefully over time we'll work through this weirdness as issues will be opened in the other repositories first.

With that said: it is my understanding that all API Gateway APIs are behind CloudFront by default, although that detail is hidden, so you cannot directly customize settings for the implicitly created CloudFront distribution. I've never actually tried placing another, directly created distribution in front of the API. I assume what's going on here is that the DNS record for the API hasn't propagated yet by the time the CloudFront distribution is created, and so it appears that the API Gateway endpoint is not working.

If that is indeed true, then the solution of polling for the API to be ready ought to fix it. You can do this today using a local-exec provisioner with a hand-written polling script. If you or someone else is willing to give that a try and see how it works, that experience will be invaluable in figuring out the best way to do this as a first-class feature. If you try it, please let us know how it worked out in hashicorp/terraform-provider-aws#817.
