Rate limiting options
asg017 opened this issue · 2 comments
asg017 commented
Different providers have different limits, so this might be hard to coordinate. We will also need settings to disable or alter the limits for self-hosted options.
simonw commented
Some providers even return HTTP headers that inform the client of the current rate limit remaining and when it would be reset, so it would be possible to automatically throttle requests to those providers (sleep automatically until the next "reset" point).
A call to the OpenAI embeddings API for example returns this:
x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 5000000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 4999990
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 0s
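As a rough sketch of how a client could act on those headers (not an existing implementation): parse the reset value into a duration, and sleep for that long only when the remaining count has reached zero. The function names `parse_reset` and `throttle_delay` are hypothetical. The duration format is assumed from the sample values above (`12ms`, `0s`), with compound forms such as `1m30s` also accepted; anything else is treated as unparseable rather than guessed at.

```rust
use std::time::Duration;

/// Parse a reset value such as "12ms", "0s", or "1m30s" into a Duration.
/// Assumed units: h, m, s, ms. Returns None for anything unrecognized.
fn parse_reset(value: &str) -> Option<Duration> {
    let mut rest = value.trim();
    if rest.is_empty() {
        return None;
    }
    let mut total = Duration::ZERO;
    while !rest.is_empty() {
        // Numeric part: digits with an optional decimal point.
        let num_len = rest
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(rest.len());
        let number: f64 = rest[..num_len].parse().ok()?;
        rest = &rest[num_len..];

        // Unit part: everything up to the next number (or end of string).
        let unit_len = rest
            .find(|c: char| c.is_ascii_digit() || c == '.')
            .unwrap_or(rest.len());
        let nanos_per_unit: f64 = match &rest[..unit_len] {
            "ms" => 1e6,
            "s" => 1e9,
            "m" => 60e9,
            "h" => 3600e9,
            _ => return None,
        };
        rest = &rest[unit_len..];

        let part = Duration::from_nanos((number * nanos_per_unit).round() as u64);
        total = total.checked_add(part)?;
    }
    Some(total)
}

/// Decide how long to pause before the next request, given the raw values of
/// a "remaining" header and its matching "reset" header.
/// Returns None if either value cannot be parsed, so the caller can fall back
/// to some other policy.
fn throttle_delay(remaining: &str, reset: &str) -> Option<Duration> {
    let remaining: u64 = remaining.trim().parse().ok()?;
    let reset = parse_reset(reset)?;
    // Budget left in the current window: no need to wait.
    Some(if remaining > 0 { Duration::ZERO } else { reset })
}
```

With the values from the response above, `throttle_delay("4999", "12ms")` gives a zero delay, since requests remain in the window. Once the remaining count hits `0`, the same call returns the 12 ms reset interval, which the caller could pass to `std::thread::sleep`. A real client would presumably check both the requests and the tokens pair and wait for the longer of the two.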
asg017 commented
Oooh interesting, that would be way better than trying to wrap some Rust rate limiter around every client. Will scour through all these clients and see which ones support that besides OpenAI.