[Bug]: Cannot get past 50 RPS
What happened?
I have OpenAI tier 5 usage, which should give me 30,000 RPM (= 500 RPS) with "gpt-4o-mini". However, I struggle to get past 50 RPS.
A minimal reproduction:
import asyncio
from litellm import acompletion

async def main():
    # Fire 2000 requests concurrently and wait for all of them
    tasks = [acompletion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You're an agent who answers yes or no"},
            {"role": "user", "content": "Is the sky blue?"},
        ],
    ) for _ in range(2000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
I only get 50 items/second, as opposed to the ~500 items/second I get when sending raw HTTP requests.
Relevant log output
16%|█████████████████████▌ | 320/2000 [00:09<00:40, 41.49it/s]
hi @vutrung96, looking into this. How do you get the % complete log output?
Hi @ishaan-jaff, I was just using tqdm.
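For reference, here is a minimal sketch of how the progress output above can be produced, using tqdm's asyncio integration around the same reproduction script:

import asyncio
from tqdm.asyncio import tqdm
from litellm import acompletion

async def main():
    tasks = [acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Is the sky blue?"}],
    ) for _ in range(2000)]
    # tqdm.gather is a drop-in for asyncio.gather that renders the progress bar
    await tqdm.gather(*tasks)

asyncio.run(main())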
Hi @ishaan-jaff, any updates on this? I'm also facing this issue!
hi @vutrung96 @CharlieJCJ, do you see the issue with litellm.router too? https://docs.litellm.ai/docs/routing
It would help me if you could test with the litellm router too.
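For anyone who wants to run that comparison, here is a minimal sketch of the Router variant of the reproduction, assuming the model_list format from the routing docs linked above:

import asyncio
from litellm import Router

router = Router(model_list=[{
    "model_name": "gpt-4o-mini",
    "litellm_params": {"model": "gpt-4o-mini"},
}])

async def main():
    # Same 2000-request burst as the original reproduction, via the Router
    tasks = [router.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Is the sky blue?"}],
    ) for _ in range(2000)]
    await asyncio.gather(*tasks)

asyncio.run(main())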
Hi @ishaan-jaff
We tracked down the root cause of the issue.
LiteLLM uses the official OpenAI Python client:
client: Optional[Union[OpenAI, AsyncOpenAI]] = None,
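As an aside, that signature suggests a pre-configured client can be injected. A minimal sketch, assuming acompletion accepts the client parameter quoted above; the raised httpx pool limits are illustrative and do not remove the anyio overhead described next:

import asyncio
import httpx
from openai import AsyncOpenAI
from litellm import acompletion

# Illustrative tuning: raise httpx's pool limits (the default is 100
# connections) so the pool itself is not the first bottleneck
custom_client = AsyncOpenAI(
    http_client=httpx.AsyncClient(
        limits=httpx.Limits(max_connections=1000, max_keepalive_connections=1000),
    ),
)

async def main():
    response = await acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Is the sky blue?"}],
        client=custom_client,
    )
    print(response)

asyncio.run(main())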
The official OpenAI client has performance issues under high numbers of concurrent requests, due to issues in httpx.
The httpx issues come down to a number of factors related to anyio vs. asyncio, which are addressed in the open PRs below.
We saw this when implementing litellm as the backend for our synthetic data engine.
When using our own OpenAI client (built on aiohttp instead of httpx), we saturate the highest rate limits (30,000 requests per minute on gpt-4o-mini at tier 5). When using litellm, the performance issues cap us well under that limit, at roughly 200 queries per second (12,000 requests per minute).
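For illustration, a minimal sketch of the raw-aiohttp approach described above, calling the OpenAI chat completions endpoint directly (the helper name fetch_one and the connector limit are ours, not from litellm or the client):

import asyncio
import os
import aiohttp

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

async def fetch_one(session: aiohttp.ClientSession) -> dict:
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Is the sky blue?"}],
    }
    async with session.post(API_URL, json=payload) as resp:
        return await resp.json()

async def main():
    # One shared session reuses connections; raise the connector limit
    # above the default of 100 so concurrency is not capped by the pool
    connector = aiohttp.TCPConnector(limit=1000)
    async with aiohttp.ClientSession(connector=connector, headers=HEADERS) as session:
        await asyncio.gather(*(fetch_one(session) for _ in range(2000)))

asyncio.run(main())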