Rate limiting options

Question

asg017 opened this issue a year ago · 2 comments

Different providers have different limits, so this might be hard to coordinate. We will also need settings to disable or alter the limits for self-hosted options.

Answer 1 · 2024-07-25T20:50:04.000Z

Some providers even return HTTP headers that tell the client how much of the current rate limit remains and when it will reset, so it would be possible to automatically throttle requests to those providers (sleep automatically until the next "reset" point).

A call to the OpenAI embeddings API for example returns this:

x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 5000000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 4999990
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 0s
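
To make the "sleep until the next reset" idea concrete, here is a rough std-only sketch of how a client could turn those headers into a wait time. The function names `parse_reset` and `throttle_delay` are made up for illustration. The sketch assumes the headers have already been collected into a map with lowercase keys, and that reset values are made of `<number><unit>` pieces with units `ms`, `s`, `m` or `h` (only `12ms` and `0s` appear above; compound values such as `1m30s` are an assumption).

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Parse a reset value such as "12ms", "0s" or "1m30s" into a Duration.
/// Returns None for anything outside the assumed `<number><unit>` pattern.
fn parse_reset(value: &str) -> Option<Duration> {
    let mut rest = value.trim();
    if rest.is_empty() {
        return None;
    }
    let mut total_nanos: u64 = 0;
    while !rest.is_empty() {
        // Numeric part: ASCII digits with an optional decimal point.
        let num_len = rest
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(rest.len());
        if num_len == 0 {
            return None;
        }
        let number: f64 = rest[..num_len].parse().ok()?;
        rest = &rest[num_len..];

        // Unit part: everything up to the next number (or the end).
        let unit_len = rest
            .find(|c: char| c.is_ascii_digit() || c == '.')
            .unwrap_or(rest.len());
        let nanos_per_unit: f64 = match &rest[..unit_len] {
            "ms" => 1e6,
            "s" => 1e9,
            "m" => 60.0 * 1e9,
            "h" => 3600.0 * 1e9,
            _ => return None, // unknown or missing unit
        };
        total_nanos += (number * nanos_per_unit).round() as u64;
        rest = &rest[unit_len..];
    }
    Some(Duration::from_nanos(total_nanos))
}

/// Decide how long to pause before the next request.
/// A pause is only suggested when a "remaining" counter has hit zero;
/// missing or unparseable headers result in no pause at all.
fn throttle_delay(headers: &HashMap<String, String>) -> Duration {
    let mut delay = Duration::ZERO;
    for kind in ["requests", "tokens"] {
        let remaining = headers
            .get(&format!("x-ratelimit-remaining-{kind}"))
            .and_then(|v| v.trim().parse::<u64>().ok());
        let reset = headers
            .get(&format!("x-ratelimit-reset-{kind}"))
            .and_then(|v| parse_reset(v));
        if let (Some(0), Some(wait)) = (remaining, reset) {
            // Wait for whichever budget takes longest to reset.
            delay = delay.max(wait);
        }
    }
    delay
}
```

A client could then call `std::thread::sleep(throttle_delay(&headers))` before sending its next request. Providers that send no such headers simply get a zero delay, which would also cover the self-hosted case where limits should be off.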

Answer 2 · 2024-07-25T20:53:45.000Z

Oooh interesting, that would be way better than trying to wrap some Rust rate limiter around every client. Will scour through all these clients and see which ones support that besides OpenAI.