kubernetes-sigs/kubectl-validate

GitHub downloader subject to rate limits

alexzielenski opened this issue · 2 comments

For unsupported schemas we have the option of downloading them from GitHub. This requires one or two API calls to list Kubernetes source directories. GitHub responses include ETags, so they can be cached and revalidated without affecting the rate limit. We should add an on-disk cache similar to kubectl that stores the ETags.

Another improvement would be to use a GitHub API key if one is supplied through an environment variable or argument. That way the test suite would never be subject to rate limiting.

I am interested in taking this on

> We should add an on-disk cache similar to kubectl that stores the ETags

To be clear, we want to cache the response data as well, not just the etag? Will take a look at how kubectl manages this as well.

/assign

> To be clear, we want to cache the response data as well, not just the etag? Will take a look at how kubectl manages this as well.

That's correct. For caching discovery and OpenAPI responses, kubectl uses a caching http.RoundTripper: https://github.com/kubernetes/client-go/blob/master/discovery/cached/disk/round_tripper.go

It may be possible for us to use a caching round tripper directly for our GitHub downloader. kubectl uses the round tripper wrapped inside a REST client:

The REST client config is wrapped here:
https://github.com/kubernetes/client-go/blob/cf830e3cb3abbcc32cc1b6bea4feb1a9a1881af3/discovery/cached/disk/cached_discovery.go#L301

The caching REST client is created here:
https://github.com/kubernetes/client-go/blob/cf830e3cb3abbcc32cc1b6bea4feb1a9a1881af3/discovery/discovery_client.go#L715

It is passed to the OpenAPI client here:
https://github.com/kubernetes/client-go/blob/cf830e3cb3abbcc32cc1b6bea4feb1a9a1881af3/discovery/discovery_client.go#L656