REQUEST: Create a GitHub token for Cluster API jobs
killianmuldoon opened this issue · 18 comments
Organization or Repo
kubernetes-sigs/cluster-api
User affected
No response
Describe the issue
We're hitting GitHub rate limits in some of the Cluster API end-to-end tests running on Prow. Adding a GitHub token would make these tests much less flaky.
The token only needs basic read permissions to avoid the rate limiting.
@ameukam I see you worked on something similar here - kubernetes/k8s.io#4259. Any insight into how this should work?
Do you have an example of a job failing because of the rate limit?
We have a proxy that caches GitHub API requests (ghproxy.default.cluster.local). Maybe we can use that?
Here's a link to a failing job - https://storage.googleapis.com/k8s-triage/index.html?job=.*-cluster-api-.*&xjob=.*-provider-.*#248b18d3604f56466fe3
I think the reason we were thinking about using a secret is that there's already an existing cluster-lifecycle, but we don't know what permissions it has or where it's configured. More info on that in this slack thread: https://kubernetes.slack.com/archives/C09QZ4DQB/p1681481907191729
How would we use the ghproxy?
I was wondering whether using ghproxy is an option. To me, it felt like depending on an implementation detail of Prow.
Basically, clusterctl (our Cluster API CLI tool) pulls files from GitHub releases, but there are more places where we call the GitHub API.
Ideally, we would test clusterctl the way our users use it, by talking to the normal GitHub API.
I think we would also have to make a number of adjustments to clusterctl so it could talk to ghproxy instead of the normal GitHub endpoints.
We just saw that a bunch of jobs across Prow are using their own GitHub tokens, so that felt easier.
I'm not sure a GitHub token solves the rate-limit issue. The ProwJob pulls GitHub repos over HTTPS using anonymous auth to run tests. A GitHub token is only used when a step needs to authenticate to the GitHub API.
In CAPI's case, it is not the Prow part (pulling the GitHub repo) that fails.
The test itself (after Prow has pulled the repo and started the test) runs code in Cluster API's CLI that fetches artifacts from GitHub (e.g. YAML files from cert-manager releases). It is a known issue that this may hit GitHub's rate limit.
The CAPI code can use a GitHub token (via the GITHUB_TOKEN environment variable), which moves those calls from the anonymous rate limit (60 requests per hour per IP address) to the much higher authenticated one (5,000 requests per hour).