argoproj/applicationset

Github rate limit hit, even using `scmProvider.filters`

dllegru opened this issue · 1 comment

The setup we want to accomplish using scmProvider for GitHub is the following:

  • track a single repository
  • track & generate applications only from certain branches from that repository

Our GitHub org has about 250 repositories. When we use `scmProvider.github` with certain `scmProvider.filters` combinations to restrict tracking to a single repository, Argo constantly sends GitHub API requests and we hit the 5k rate limit in ~40 minutes. That should not happen when tracking just a single repository.
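As a rough check on those numbers: using up GitHub's 5,000-request hourly limit in about 40 minutes means an average of at least 125 requests per minute, against roughly 83 per minute that the limit can sustain over a full hour. The helper name below is ours and purely illustrative; it only does the division.

```go
package main

import "fmt"

// avgRequestsPerMinute returns the average rate needed to spend `limit`
// requests in `minutes` minutes. Illustrative helper, not ApplicationSet code.
func avgRequestsPerMinute(limit, minutes float64) float64 {
	return limit / minutes
}

func main() {
	observed := avgRequestsPerMinute(5000, 40)    // limit exhausted in ~40 min
	sustainable := avgRequestsPerMinute(5000, 60) // limit spread over the full hour
	fmt.Printf("observed: %.0f req/min, sustainable: %.0f req/min\n", observed, sustainable)
	// observed: 125 req/min, sustainable: 83 req/min
}
```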

I've seen that this was initially reported in issue #464 and a fix was made in PR #472. I think the fix only partially works: some parameter combinations still don't behave well.


We tested different scmProvider scenarios for our use case; the outcomes are below:

  • repo to track: platform-daniel-tests

    • branch names: [main, dev-test1, dev-test2]
  • only deploy from branches with regex: ^dev
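`branchMatch` takes a regular expression. Assuming it is applied as an ordinary Go regexp match against each branch name (the ApplicationSet controller is written in Go, but the helper below is our own illustration, not its code), `^dev` should select exactly the two `dev-*` branches and skip `main`:

```go
package main

import (
	"fmt"
	"regexp"
)

// matchingBranches returns the branch names matched by pattern, the way a
// branchMatch regex such as ^dev is meant to select branches. Hypothetical helper.
func matchingBranches(pattern string, branches []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, b := range branches {
		if re.MatchString(b) { // ^ anchors the match at the start of the name
			out = append(out, b)
		}
	}
	return out, nil
}

func main() {
	branches := []string{"main", "dev-test1", "dev-test2"}
	got, err := matchingBranches("^dev", branches)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [dev-test1 dev-test2]
}
```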

Scenario 1: allBranches set to false

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: false
        filters:
          - repositoryMatch: platform-daniel-tests
            branchMatch: ^dev

Outcomes:

  • rate-limit: ❌ (~500 API calls on the initial scan)
  • deploys: ❌ (no apps generated)

Comments:
When this configuration is deployed, an initial scan seems to run, using ~500 API calls to GitHub. After that initial scan, the API calls stop. We're not sure why this scan is needed: repositoryMatch locks tracking to a single repository and allBranches is also set to false, so IMO it shouldn't take that many calls.
No apps are generated by the ApplicationSet.
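The empty result lines up with the comment in the config itself: with `allBranches: false` only the default branch is scanned, and `main` does not match `^dev`. The toy model below encodes that reading under stated assumptions (the `repo` type and `generatedBranches` function are hypothetical, not ApplicationSet code) and, run over all four scenarios in this issue, reproduces the branch sets reported for each:

```go
package main

import (
	"fmt"
	"regexp"
)

// repo is a hypothetical, simplified repository: default branch plus all branches.
type repo struct {
	defaultBranch string
	branches      []string
}

// generatedBranches models which branches become generator results.
// Assumptions: allBranches=false considers only the default branch (per the
// config comment); an empty branchMatch keeps every candidate branch.
func generatedBranches(r repo, allBranches bool, branchMatch string) ([]string, error) {
	candidates := []string{r.defaultBranch}
	if allBranches {
		candidates = r.branches
	}
	if branchMatch == "" {
		return candidates, nil
	}
	re, err := regexp.Compile(branchMatch)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, b := range candidates {
		if re.MatchString(b) {
			out = append(out, b)
		}
	}
	return out, nil
}

func main() {
	r := repo{defaultBranch: "main", branches: []string{"main", "dev-test1", "dev-test2"}}
	scenarios := []struct {
		name        string
		allBranches bool
		branchMatch string
	}{
		{"Scenario 1", false, "^dev"},
		{"Scenario 2", true, "^dev"},
		{"Scenario 3", false, ""},
		{"Scenario 4", true, ""},
	}
	for _, sc := range scenarios {
		got, err := generatedBranches(r, sc.allBranches, sc.branchMatch)
		if err != nil {
			panic(err)
		}
		fmt.Println(sc.name, got)
	}
	// Scenario 1 []
	// Scenario 2 [dev-test1 dev-test2]
	// Scenario 3 [main]
	// Scenario 4 [main dev-test1 dev-test2]
}
```

Under this reading, Scenario 2 is the only configuration that yields just the `dev-*` branches, which is why its API cost is the problem.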


Scenario 2: allBranches set to true

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: true
        filters:
          - repositoryMatch: platform-daniel-tests
            branchMatch: ^dev

Outcomes:

  • rate-limit: ❌ (never stops increasing; we hit the 5k rate limit in ~40 mins)
  • deploys: ✅ (desired branches apps generated)

Comments:
With this configuration, API requests to GitHub never stop: we keep sending them until we hit the 5k limit in about 40 minutes.
Two Application resources get created and deployments are done from branches dev-test1 & dev-test2.
This is the scenario we want, but we can't use it because of the rate limit.
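One hypothesis that fits the pattern across the scenarios (expensive whenever `branchMatch` is set, cheap when the filter has only `repositoryMatch`) is that `repositoryMatch` prunes repositories before branches are fetched only for repository-only filters; otherwise branches are listed for every repository in the org and filtered afterwards. We have not confirmed this against the source. The toy cost model below uses made-up assumptions (100 repositories per list call, one branch-listing call per repository) only to show how the two orderings scale with org size:

```go
package main

import "fmt"

// Hypothetical per-pass cost assumptions (not measured):
const (
	reposPerPage       = 100 // repositories returned per list call
	branchCallsPerRepo = 1   // calls to list one repository's branches
)

// listRepoCalls is the number of paginated calls to list orgRepos repositories.
func listRepoCalls(orgRepos int) int {
	return (orgRepos + reposPerPage - 1) / reposPerPage
}

// callsPerPass estimates API calls for one scan of the org.
// filterReposFirst=true: repositoryMatch prunes repos before branches are listed.
// filterReposFirst=false: branches are listed for every repo, then filtered.
func callsPerPass(orgRepos, matchingRepos int, filterReposFirst bool) int {
	scanned := orgRepos
	if filterReposFirst {
		scanned = matchingRepos
	}
	return listRepoCalls(orgRepos) + scanned*branchCallsPerRepo
}

func main() {
	fmt.Println("repos filtered first:", callsPerPass(250, 1, true))   // 4
	fmt.Println("branches listed first:", callsPerPass(250, 1, false)) // 253
}
```

Under these assumptions the per-pass cost tracks the size of the org (250 repositories) rather than the single repository being tracked, so repeating such passes would keep the request count climbing. Real per-call costs will differ.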


Scenario 3: allBranches set to false, no branchMatch

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: false
        filters:
          - repositoryMatch: platform-daniel-tests

Outcomes:

  • rate-limit: ✅ (only ~10 API calls)
  • deploys: ✅🟡 (only deploys the default branch, as intended, but that doesn't fit our use case)

Comments:
Only ~10 API calls are made and then it stops, apart from some minor calls from time to time.
It only generates an application from the main branch, which is the default branch of platform-daniel-tests.
The outcome is good, but not valid for our use case.


Scenario 4: allBranches set to true, no branchMatch

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: true
        filters:
          - repositoryMatch: platform-daniel-tests

Outcomes:

  • rate-limit: ✅ (only ~10 API calls)
  • deploys: ✅🟡 (deploys all branches, as intended, but that doesn't fit our use case)

Comments:
Only ~10 API calls are made and then it stops, apart from some minor calls from time to time.
It generates applications from all branches [main, dev-test1, dev-test2].
The outcome is good, but not valid for our use case.