societe-generale/github-crawler

Need throttling options

vincent-fuchs opened this issue · 2 comments

Summary

In some cases, rate limiting may be in place and quite restrictive, even for authenticated API requests.

We would need a throttling config parameter, under which we can specify:

  • the throttling period in seconds
  • the max number of repositories we can process during the throttling period.

If we cross the limit, we need to wait for the throttling period before we continue.
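
The behaviour described above could be sketched roughly as follows. This is only an illustration, not code from github-crawler: the class name `RepositoryThrottler`, its constructor parameters and the `acquire()` method are all hypothetical, and it assumes that "waiting for the throttling period" means waiting for the remainder of the current period rather than a full new one.

```java
import java.time.Duration;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of github-crawler): fixed-window throttling. */
class RepositoryThrottler {

    private final long periodMillis;              // the throttling period
    private final int maxRepositoriesPerPeriod;   // max repositories per period
    private final LongSupplier clock;             // current time in milliseconds
    private final LongConsumer sleeper;           // how to wait for N milliseconds

    private long windowStart;
    private int processedInWindow;

    RepositoryThrottler(Duration period, int maxRepositoriesPerPeriod) {
        this(period, maxRepositoriesPerPeriod,
             System::currentTimeMillis, RepositoryThrottler::sleepMillis);
    }

    /** Clock and sleeper are injectable so the logic can be exercised without real waiting. */
    RepositoryThrottler(Duration period, int maxRepositoriesPerPeriod,
                        LongSupplier clock, LongConsumer sleeper) {
        if (period.isZero() || period.isNegative() || maxRepositoriesPerPeriod < 1) {
            throw new IllegalArgumentException("period and max must be positive");
        }
        this.periodMillis = period.toMillis();
        this.maxRepositoriesPerPeriod = maxRepositoriesPerPeriod;
        this.clock = clock;
        this.sleeper = sleeper;
        this.windowStart = clock.getAsLong();
    }

    /** Call once before processing each repository; blocks when the quota is used up. */
    synchronized void acquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= periodMillis) {
            // The previous period is over: start a fresh one.
            windowStart = now;
            processedInWindow = 0;
        }
        if (processedInWindow >= maxRepositoriesPerPeriod) {
            // Limit reached: wait for the rest of the period, then start a new one.
            sleeper.accept(windowStart + periodMillis - now);
            windowStart = clock.getAsLong();
            processedInWindow = 0;
        }
        processedInWindow++;
    }

    private static void sleepMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while throttling", e);
        }
    }
}
```

With something like `new RepositoryThrottler(Duration.ofSeconds(60), 100)`, the crawler would call `acquire()` before each repository. Because `acquire()` is synchronized, the same instance could in principle be shared by parallel workers, which is one reason the parallel-crawling setting and the throttling settings belong together.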

While we're at it, we can move the existing crawlInParallel property under the throttling property, as it's related.

Type of Issue

It is a:

  • request

Your Environment

  • Version used: 1.0.11
  • OS and version:
  • Version of libs used:

Moreover, the standard rate-limit headers of GitHub & GitLab could be supported: X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset
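
To make the header idea concrete, here is a minimal sketch of how a wait time might be derived from those headers. The class `RateLimitHeaders` and the method `waitBeforeNextRequest` are hypothetical names, not existing github-crawler code. The sketch assumes `X-RateLimit-Reset` holds a UTC epoch time in seconds, as GitHub documents it (worth double-checking for GitLab), and that the caller passes header names in exactly this casing.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

/** Hypothetical helper (not part of github-crawler): derive a wait time from rate-limit headers. */
class RateLimitHeaders {

    /**
     * Returns how long to wait before sending the next request.
     * Assumption: X-RateLimit-Reset is an epoch timestamp in seconds.
     */
    static Duration waitBeforeNextRequest(Map<String, String> headers, Instant now) {
        String remaining = headers.get("X-RateLimit-Remaining");
        String reset = headers.get("X-RateLimit-Reset");
        if (remaining == null || reset == null) {
            // Headers absent: nothing to go on, so do not wait.
            return Duration.ZERO;
        }
        try {
            if (Long.parseLong(remaining.trim()) > 0) {
                // Some quota is left in the current window.
                return Duration.ZERO;
            }
            Instant resetAt = Instant.ofEpochSecond(Long.parseLong(reset.trim()));
            Duration wait = Duration.between(now, resetAt);
            // A reset time in the past means the window has already rolled over.
            return wait.isNegative() ? Duration.ZERO : wait;
        } catch (NumberFormatException e) {
            // Unparseable values are treated like missing headers.
            return Duration.ZERO;
        }
    }
}
```

HTTP header names are case-insensitive, so a real integration would probably read them through the HTTP client's own header API, or a case-insensitive map, rather than a plain `Map`.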

Yes, that's the idea. Interested in contributing? ;-)