hookla/DreamTeamGPT

Must handle rate limits from OpenAI

Opened this issue · 2 comments

hookla commented

openai.error.RateLimitError: Rate limit reached for gpt-4 in organization org-lDdTak03uNZ02kmY5m6ginja on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.

So we should wait and retry when we see this error.

Added retry logic via a decorator in gpt_client.
The delay still needs to be decreased to 0.1 seconds, though - that should be enough for the 10k tokens/min limit.

Alternatively, I could add a 0.1-second delay before every request, but that would be less elegant.
Or I could set up a timer and a request counter that resets every second, but that feels like overkill.
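For reference, a retry decorator along these lines might look like the sketch below. This is a hypothetical illustration, not the actual gpt_client code: the decorator name, parameters, and the exponential backoff with jitter are all assumptions; the real implementation may simply sleep for a fixed 0.1 seconds.

```python
import random
import time
from functools import wraps


def retry_on_rate_limit(max_retries=5, base_delay=0.1, exceptions=(Exception,)):
    """Retry the wrapped call when a rate-limit error is raised.

    Sleeps with exponential backoff plus a little jitter between
    attempts; re-raises the last error once retries are exhausted.
    (Names and defaults here are illustrative, not from gpt_client.)
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries - 1:
                        raise  # out of retries, propagate the error
                    delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                    time.sleep(delay)
        return wrapper
    return decorator
```

In real code `exceptions` would be narrowed to the SDK's rate-limit error class (e.g. openai's RateLimitError) so that unrelated failures still fail fast.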

@lynxrv21 @hookla I'm the maintainer of LiteLLM. I believe we can help with this problem, and I'd love your feedback if LiteLLM is missing something.

Here's the quick start:
docs: https://docs.litellm.ai/docs/routing

import asyncio
import os

from litellm import Router

model_list = [{ # list of model deployments
    "model_name": "gpt-3.5-turbo", # model alias
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement (await must run inside an async context)
async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())