v0.28.5: Atlantis fails to finish plan when hitting the Github secondary API rate limit since upgrading

Question

Opened this issue 3 months ago · 7 comments

snorlaX-sleeps commented 3 months ago

Overview of the Issue

We recently upgraded from v0.23.x to v0.28.5.
We have been using Atlantis for a few years.

One of our Atlantis installations regularly hits the GitHub secondary API rate limit when posting comments after running an Atlantis plan for ~500 individual projects.

On v0.23.x, the comments were limited to ~100 per plan (due to the rate limit), but Atlantis still marked the plan as complete / successful (if it was) and updated the PR statuses as usual. This meant the successful plans could be applied and the PR merged.

Since upgrading to v0.28.x, the comments are still limited to ~100 per plan (due to the rate limit), but Atlantis appears to stop functioning. None of the PR statuses that are still "pending" get updated, the plans are not available for applying, and since an apply is "required" to merge a PR, the PR cannot be merged.

So it appears that something has changed in how errors are handled when posting to a GitHub PR.
The rest of the configuration (number of plans, number of Atlantis installations) has not changed.
Please let me know if this has been resolved elsewhere.
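
To make the suspected difference concrete, here is a minimal Go sketch of the tolerant ordering that v0.23.x appeared to follow: comment failures are collected rather than treated as fatal, and statuses are updated regardless. This is not Atlantis's actual code. `commenter`, `statusUpdater`, `result`, `publish`, and the two fakes are hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// commenter and statusUpdater are hypothetical interfaces for this sketch.
// They are not Atlantis types.
type commenter interface {
	CreateComment(body string) error
}

type statusUpdater interface {
	UpdateStatus(project, state string) error
}

// result is a hypothetical per-project plan outcome.
type result struct {
	Project string
	Success bool
	Output  string
}

// publish tries to comment for every result, collecting comment errors
// instead of returning on the first one, and then updates every status.
func publish(c commenter, s statusUpdater, results []result) (commentErrs []error, statusErr error) {
	for _, r := range results {
		if err := c.CreateComment(r.Output); err != nil {
			commentErrs = append(commentErrs, fmt.Errorf("comment for %s: %w", r.Project, err))
		}
	}
	for _, r := range results {
		state := "success"
		if !r.Success {
			state = "failure"
		}
		if err := s.UpdateStatus(r.Project, state); err != nil {
			statusErr = errors.Join(statusErr, err)
		}
	}
	return commentErrs, statusErr
}

// limitedCommenter fails every call after the first `limit` comments,
// standing in for a rate-limited VCS API.
type limitedCommenter struct{ limit, posted int }

func (l *limitedCommenter) CreateComment(string) error {
	if l.posted >= l.limit {
		return errors.New("403 secondary rate limit")
	}
	l.posted++
	return nil
}

// recordingStatus remembers the last state set for each project.
type recordingStatus struct{ states map[string]string }

func (r *recordingStatus) UpdateStatus(project, state string) error {
	r.states[project] = state
	return nil
}

func main() {
	c := &limitedCommenter{limit: 2}
	s := &recordingStatus{states: map[string]string{}}
	results := []result{{"a", true, "plan a"}, {"b", true, "plan b"}, {"c", false, "plan c"}}

	commentErrs, statusErr := publish(c, s, results)
	fmt.Println("failed comments:", len(commentErrs))
	fmt.Println("status error:", statusErr)
	fmt.Println("statuses:", s.states)
}
```

In this sketch the third comment is rejected, yet all three projects still receive a status, which is the behaviour we relied on before the upgrade.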

The GitHub docs list this as one of the ways to hit a secondary rate limit:

"Create too much content on GitHub in a short amount of time. In general, no more than 80 content-generating requests per minute and no more than 500 content-generating requests per hour are allowed. Some endpoints have lower content creation limits. Content creation limits include actions taken on the GitHub web interface as well as via the REST API and GraphQL API."

Source: GitHub Docs (the docs are quite circular)

Reproduction Steps

Create a configuration to run plans for as many projects as will generate over 100 comments.
(In our case we have one state folder running ~500 plans)

Logs

Logs from v0.28.5 (please see the comment below for v0.23.5):

{"level":"error","ts":"2024-09-18T02:24:55.583Z","caller":"vcs/instrumented_client.go:116","msg":"Unable to create comment for command plan, error: POST https://api.github.com/repos/<org>/<repo>/issues/<pr_id>/comments: 403 You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.","json":{"repo":"<org>/<repo>","pull":"<pr_id>"},"stacktrace":"github.com/runatlantis/atlantis/server/events/vcs.(*InstrumentedClient).CreateComment\tgithub.com/runatlantis/atlantis/server/events/vcs/instrumented_client.go:116github.com/runatlantis/atlantis/server/events/vcs.(*ClientProxy).CreateComment\tgithub.com/runatlantis/atlantis/server/events/vcs/proxy.go:65github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\tgithub.com/runatlantis/atlantis/server/events/pull_updater.go:51github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:145github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:305github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:223"}

{ "level": "error", "ts": "2024-09-18T02:24:55.583Z", "caller": "events/pull_updater.go:52", "msg": "unable to comment: POST https://api.github.com/repos/<org>/<repo>/issues/<pr_id>/comments: 403 You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.", "json": { "repo": "<org>/<repo>", "pull": "<pr_id>" }, "stacktrace": "github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\tgithub.com/runatlantis/atlantis/server/events/pull_updater.go:52github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:145github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:305github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:223" }

Environment details

Atlantis version: v0.28.5
Deployment method: eks / own tf module

Additional Context

We have set ATLANTIS_MAX_COMMENTS_PER_COMMAND=80; however, this limits the number of comments per project, not the number of comments per execution.
The Atlantis installation has its own GitHub App, which is not shared.
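
The difference between a per-project cap and a per-execution cap can be shown with a small Go sketch. It assumes, purely for illustration, that each of the ~500 projects produces exactly one comment. `totalComments` and its two cap parameters are hypothetical and do not correspond to Atlantis flags or functions.

```go
package main

import "fmt"

// totalComments returns how many comments would be posted when project i
// wants wanted[i] comments, each project is capped at perProject, and the
// whole execution is capped at perExecution. A cap <= 0 means "no cap".
// Both caps are hypothetical parameters for this sketch.
func totalComments(wanted []int, perProject, perExecution int) int {
	total := 0
	for _, w := range wanted {
		if perProject > 0 && w > perProject {
			w = perProject
		}
		total += w
	}
	if perExecution > 0 && total > perExecution {
		total = perExecution
	}
	return total
}

func main() {
	// Assumption: 500 projects, one comment each.
	wanted := make([]int, 500)
	for i := range wanted {
		wanted[i] = 1
	}
	fmt.Println(totalComments(wanted, 80, 0)) // per-project cap only: prints 500
	fmt.Println(totalComments(wanted, 0, 80)) // per-execution cap only: prints 80
}
```

With one comment per project, a per-project cap of 80 never takes effect, so the total stays at 500. Only a cap on the whole execution brings it down.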

Related PRs, potentially around the same subject but different context:

#3322
#3756

Answer 1 · 2024-09-18T19:35:07.000Z

Hey there, @snorlaX-sleeps! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue!

I found a similar open issue that might be relevant to your problem: Optionally Limit GitHub Comment Creation, which was last updated on August 05, 2024. This issue discusses the need to limit the number of comments generated to avoid hitting GitHub's secondary API rate limits. It was referenced by your issue v0.28.5: Changes to Github PR comments results in rate-limiting from the secondary API, which was last updated on September 18, 2024.

To address your issue, you can use the --max-comments-per-command parameter to globally limit the number of comments per execution. This parameter helps control the number of comments Atlantis produces, thereby helping to avoid hitting GitHub's secondary API rate limits.

Here's how you can set it up:

atlantis server --max-comments-per-command=100
# or
ATLANTIS_MAX_COMMENTS_PER_COMMAND=100

This option limits the number of comments published after a command is executed, to prevent spamming your VCS and Atlantis from getting throttled as a result. The default value is 100 ^[1].

Additionally, you can adjust the parallel pool size to control the number of concurrent operations:

# Adjusting the parallel pool size
atlantis server --parallel-pool-size=<desired_value>
# or
ATLANTIS_PARALLEL_POOL_SIZE=<desired_value>

The default value for --parallel-pool-size is 15 ^[1].

By configuring these parameters, you can better manage the rate at which Atlantis generates comments and avoid hitting GitHub's rate limits.

To continue talking to Dosu, mention @dosu.

Answer 2 · 2024-09-19T10:11:15.000Z

Hi @dosu
As mentioned in this issue, we've already tried setting --max-comments-per-command to 80.
Unfortunately, this seems to limit the number of comments per underlying command (e.g. per terraform plan) rather than per Atlantis command, as we expected.

We also already set --parallel-pool-size, but since all the comments are posted at once at the end of an Atlantis execution, this does not resolve the issue and only makes the atlantis plan take longer.
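
Since the comments go out in one burst, one possible mitigation is to space them out. The Go sketch below is only an illustration and assumes the "80 content-generating requests per minute" figure quoted from the GitHub docs in the issue. `intervalFor`, `postPaced`, and `fakeClock` are hypothetical helpers, not part of Atlantis.

```go
package main

import (
	"fmt"
	"time"
)

// intervalFor returns the spacing between requests that keeps an evenly
// paced sender at roughly perMinute requests per minute.
func intervalFor(perMinute int) time.Duration {
	if perMinute <= 0 {
		return 0
	}
	return time.Minute / time.Duration(perMinute)
}

// postPaced calls post for each body, sleeping for interval between calls.
// It stops at the first error and reports which post failed.
func postPaced(bodies []string, interval time.Duration, sleep func(time.Duration), post func(string) error) error {
	for i, b := range bodies {
		if i > 0 {
			sleep(interval)
		}
		if err := post(b); err != nil {
			return fmt.Errorf("post %d of %d: %w", i+1, len(bodies), err)
		}
	}
	return nil
}

// fakeClock records requested sleeps instead of blocking, so the demo
// finishes instantly. Pass time.Sleep instead for real pacing.
type fakeClock struct{ slept []time.Duration }

func (f *fakeClock) Sleep(d time.Duration) { f.slept = append(f.slept, d) }

func main() {
	bodies := make([]string, 500) // one comment per project, as an assumption
	clk := &fakeClock{}

	err := postPaced(bodies, intervalFor(80), clk.Sleep, func(string) error { return nil })

	var total time.Duration
	for _, d := range clk.slept {
		total += d
	}
	fmt.Println("error:", err)
	fmt.Println("interval:", intervalFor(80))
	fmt.Println("total pacing time:", total)
}
```

At 80 per minute the spacing is 750ms, so 500 comments would take a little over six minutes to post. That sits right at the quoted 500-per-hour figure, so the pacing alone may not be enough if anything else is creating content with the same credentials.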

Answer 3 · 2024-09-19T10:12:14.000Z

As a note, the current workaround is to disable comments on the PR in question so the change can be applied and merged

Answer 4 · 2024-09-19T10:13:15.000Z

To address the issue of rate-limiting due to excessive comments, you can consider the following options:

Batching or Aggregating Comments:
Atlantis has a mechanism to batch or aggregate comments to reduce the number of API requests made to GitHub. The HidePrevCommandComments method on GithubClient is designed to minimize previous comments by batching them into fewer API requests. This can help reduce the number of comments and thus the API requests.

Example usage:

err = client.HidePrevCommandComments(
    logger,
    models.Repo{
        FullName:          "owner/repo",
        Owner:             "owner",
        Name:              "repo",
        CloneURL:          "",
        SanitizedCloneURL: "",
        VCSHost: models.VCSHost{
            Hostname: "github.com",
            Type:     models.Github,
        },
    },
    123, // pull request number
    command.Plan.TitleString(),
    "",
)

This function processes multiple comments in a single request, minimizing the number of API calls ^[1].

Additional Configuration Options:
There are several flags that can help manage or mitigate GitHub API rate limits:
- --quiet-policy-checks: Excludes policy check comments from pull requests unless there's an actual error from conftest.
- --silence-vcs-status-no-plans: Silences VCS commit status when autoplan finds no projects to plan.
- --silence-vcs-status-no-projects: Silences VCS commit status for all commands when a project is not defined.
Example usage:
```
atlantis server --quiet-policy-checks
# or
ATLANTIS_QUIET_POLICY_CHECKS=true

atlantis server --silence-vcs-status-no-plans
# or
ATLANTIS_SILENCE_VCS_STATUS_NO_PLANS=true

atlantis server --silence-vcs-status-no-projects
# or
ATLANTIS_SILENCE_VCS_STATUS_NO_PROJECTS=true
```
These options can help reduce the number of comments and status updates, thereby managing the rate limits more effectively ^[2]^[3].

By implementing these strategies, you can better manage the number of comments generated and avoid hitting GitHub's rate limits.

Answer 6 · 2024-09-19T14:44:52.000Z

Reverting the image to v0.23.5 still hits the rate limit; however, the status on the PR is still updated, allowing the apply to proceed and the PR to be merged.

Logs from v0.23.5:

{"level":"error","ts":"2024-09-19T14:23:16.170Z","caller":"vcs/instrumented_client.go:123","msg":"Unable to create comment for command plan, error: POST https://api.github.com/repos/<gh_org>/<repo>/issues/<pd_id>/comments: 403 You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later. []","json":{"repository":"<gh_org>/<repo>","pull-num":"<pd_id>"},"stacktrace":"github.com/runatlantis/atlantis/server/events/vcs.(*InstrumentedClient).CreateComment\n\tgithub.com/runatlantis/atlantis/server/events/vcs/instrumented_client.go:123\ngithub.com/runatlantis/atlantis/server/events/vcs.(*ClientProxy).CreateComment\n\tgithub.com/runatlantis/atlantis/server/events/vcs/proxy.go:60\ngithub.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\n\tgithub.com/runatlantis/atlantis/server/events/pull_updater.go:32\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:144\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:174"}

{"level":"error","ts":"2024-09-19T14:23:16.170Z","caller":"events/pull_updater.go:33","msg":"unable to comment: POST https://api.github.com/repos/<gh_org>/<repo>/issues/<pd_id>/comments: 403 You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later. []","json":{"repo":"<gh_org>/<repo>","pull":"<pd_id>"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\n\tgithub.com/runatlantis/atlantis/server/events/pull_updater.go:33\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).runAutoplan\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:144\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\tgithub.com/runatlantis/atlantis/server/events/plan_command_runner.go:288\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunAutoplanCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:174"}

This differs from v0.28.5, where the PR status is never updated and therefore the apply cannot proceed (Atlantis presumably crashes).
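
Both versions log "Please retry your request again later", so a retry with a delay is one option to consider. The Go sketch below follows my understanding of GitHub's published guidance for secondary rate limits: honour `Retry-After` if present, otherwise wait for `X-RateLimit-Reset` when `X-RateLimit-Remaining` is 0, otherwise wait at least a minute and back off exponentially. The logs above do not show the response headers, so which branch would apply here is unknown. `retryDelay`, `headerOf`, and the 32-minute cap are assumptions made for illustration, not Atlantis or go-github behaviour.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay suggests how long to wait before retrying a rate-limited request.
// nowUnix is the current time in epoch seconds (the same unit as
// X-RateLimit-Reset); attempt counts previous retries, starting at 0.
func retryDelay(h http.Header, nowUnix int64, attempt int) time.Duration {
	// 1. An explicit Retry-After in seconds wins. The HTTP-date form of
	//    this header is not handled in this sketch.
	if secs, err := strconv.Atoi(h.Get("Retry-After")); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	// 2. No remaining quota: wait until the advertised reset time.
	if h.Get("X-RateLimit-Remaining") == "0" {
		reset, err := strconv.ParseInt(h.Get("X-RateLimit-Reset"), 10, 64)
		if err == nil && reset > nowUnix {
			return time.Duration(reset-nowUnix) * time.Second
		}
	}
	// 3. Otherwise start at one minute and double per attempt,
	//    capped at 32 minutes (an arbitrary choice).
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 5 {
		attempt = 5
	}
	return time.Minute << uint(attempt)
}

// headerOf builds an http.Header from key/value pairs for the demo below.
func headerOf(pairs ...string) http.Header {
	h := http.Header{}
	for i := 0; i+1 < len(pairs); i += 2 {
		h.Set(pairs[i], pairs[i+1])
	}
	return h
}

func main() {
	now := time.Now().Unix()
	reset := strconv.FormatInt(now+90, 10)

	fmt.Println(retryDelay(headerOf("Retry-After", "30"), now, 0))
	fmt.Println(retryDelay(headerOf("X-RateLimit-Remaining", "0", "X-RateLimit-Reset", reset), now, 0))
	fmt.Println(retryDelay(headerOf(), now, 2))
}
```

Even with a retry like this, the comment failure would still need to be non-fatal for the PR statuses to be updated while the retries are pending.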

Answer 7 · 2024-09-19T14:52:37.000Z

Note: I am editing the issue text after testing v0.23.5 to indicate that we were hitting the rate-limiting issue before, but that Atlantis still completed the plan process.