RateLimitExceeded: bluegreen-strategy-v3 spams CF API during race condition
sleungcy opened this issue · 1 comments
Summary
Since our organization adopted the bluegreen-strategy-v3 on the latest provider version, we are frequently getting a "RateLimitExceeded" error.
Upon inspection of the access_logs on the CF side, I noticed that the /v3/apps/:guid/processes endpoint is frequently being spammed with requests.
Chart
Observation
The goroutine goes rogue whenever an application takes more than 20 seconds to stop.
Investigation
There is a race condition causing a goroutine to run indefinitely.
Within the goroutine here:
The goroutine's loop is expected to exit gracefully once isAppStopped returns true. However, if isAppStopped receives any non-200 status from the API, it returns (false, err). That error is never checked inside the goroutine, so the loop simply retries on every error.
As can be seen in the diagram above, the goroutine continues to retry even after the application has been destroyed. Normally the application wouldn't be destroyed at this stage, but because the timeout case of the select fired first, the deploy process moved on to the next stage and deleted the app. From then on every request fails, isStopped never becomes true, and the loop never exits.
    channelIsStopped := make(chan bool, 1)
    channelError := make(chan error, 1)
    var err error
    go func() {
        isStopped := false
        for !isStopped {
            isStopped, err = isAppStopped(s.client, appDeploy.App.GUID)
            time.Sleep(delayBetweenRequests * time.Second)
        }
        channelIsStopped <- isStopped
        channelError <- err
    }()
    select {
    case <-channelIsStopped:
    case <-time.After(stopAppTimeout * time.Second):
        // App is not in expected state (stopped) after waiting for the timeout
        log.Print("Timeout reached while waiting for application to stop.")
    case <-channelError:
        return ctx, err
    }
v0.51.3 has this fix.