cloudfoundry-community/terraform-provider-cloudfoundry

RateLimitExceeded: bluegreen-strategy-v3 spams CF API during race condition

sleungcy opened this issue · 1 comment

Summary

Since our organization adopted bluegreen-strategy-v3 on the latest provider version, we have frequently been getting "RateLimitExceeded" errors.

Upon inspecting the access_logs on the CF side, I noticed that the /v3/apps/:guid/processes endpoint is frequently being spammed with requests.

Chart

[Image: Screenshot 2023-08-04 at 1 58 22 PM]

Observation

The goroutine goes rogue whenever an application takes more than 20 seconds to stop.

Investigation

There is a race condition causing a goroutine to run indefinitely.

Within the goroutine here:

The goroutine's loop is expected to exit gracefully once isAppStopped returns true. However, if isAppStopped receives any non-200 status from the API, it returns (false, err). The error is not handled inside the goroutine, so the loop simply retries.

As can be seen in the diagram above, the goroutine continues to retry even after the application has been destroyed. Normally the application wouldn't be destroyed at this stage, but because the timeout channel fired, the deploy process moved on to the next stage and deleted the app. From then on every call fails with a non-200 status, so isStopped never becomes true and the loop never exits.

channelIsStopped := make(chan bool, 1)
channelError := make(chan error, 1)
var err error

go func() {
	isStopped := false
	for !isStopped {
		isStopped, err = isAppStopped(s.client, appDeploy.App.GUID)
		time.Sleep(delayBetweenRequests * time.Second)
	}
	channelIsStopped <- isStopped
	channelError <- err
}()

select {
case <-channelIsStopped:
case <-time.After(stopAppTimeout * time.Second):
	// App is not in expected state (stopped) after waiting for the timeout
	log.Print("Timeout reached while waiting for application to stop.")
case <-channelError:
	return ctx, err
}

loafoe commented

v0.51.3 has this fix