NO FALLBACK when streaming and [Bug]: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
What happened?
Using the LiteLLM proxy (version 1.50.2) with this `config.yaml`:
```yaml
model_list:
  #llm_routes
  - model_name: basic
    litellm_params:
      model: "anthropic/claude-3-haiku-20240307"
      api_key: XXX

litellm_settings:
  modify_params: True
  drop_params: True
  safe_mode: False
  num_retries: 3 # retry call 3 times on each model_name
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
  fallbacks: [
    {"basic": ["openai/gpt-4o-mini"]}
  ]
  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute.
  cooldown_time: 30 # how long to cooldown model in seconds if fails/min > allowed_fails
  cache: true
  cache_params: # cache_params are optional
    type: "redis" # The type of cache to initialize. Can be "local" or "redis". Defaults to "local".
    host: "localhost" # The host address for the Redis cache. Required if type is "redis".
    port: 6379 # The port number for the Redis cache. Required if type is "redis".
    ttl: 600

router_settings:
  routing_strategy: latency-based-routing
  num_retries: 2
  timeout: 30 # 30 seconds

general_settings:
  master_key: XXXX
```
As you can see, I have set up a model_name named "basic" that proxies to "anthropic/claude-3-haiku-20240307" with a fallback to "openai/gpt-4o-mini".
I would expect that if anything happens to my first option (claude-3-haiku), the request would be retried with gpt-4o-mini. But when Anthropic's servers are overloaded, instead of falling back to OpenAI an exception occurs:
```
18:08:34 - LiteLLM Proxy:ERROR: proxy_server.py:2669 - litellm.proxy.proxy_server.async_data_generator(): Exception occured - litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 8117, in __anext__
    async for chunk in self.completion_stream:
  File "/usr/local/lib/python3.10/dist-packages/litellm/llms/anthropic/chat/handler.py", line 833, in __anext__
    return self.chunk_parser(chunk=data_json)
  File "/usr/local/lib/python3.10/dist-packages/litellm/llms/anthropic/chat/handler.py", line 755, in chunk_parser
    raise AnthropicError(
litellm.llms.anthropic.common_utils.AnthropicError: Overloaded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litellm/proxy/proxy_server.py", line 2648, in async_data_generator
    async for chunk in response:
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 8345, in __anext__
    raise exception_type(
  File "/usr/local/lib/python3.10/dist-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2116, in exception_type
    raise e
  File "/usr/local/lib/python3.10/dist-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 500, in exception_type
    raise litellm.InternalServerError(
litellm.exceptions.InternalServerError: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
```
Note that I am doing the request with streaming ON (in case this is relevant), so I can sometimes see the request partially completing and then failing without the fallback kicking in.
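For context, the request is made roughly like this (a sketch using the OpenAI Python SDK against the proxy; the base URL, key, and prompt are placeholders):

```python
from openai import OpenAI

# placeholders: proxy URL and a proxy virtual/master key
client = OpenAI(base_url="http://localhost:4000", api_key="sk-XXXX")

stream = client.chat.completions.create(
    model="basic",  # the proxied model_name from config.yaml
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```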
Any idea what might be going on? Is this a misconfiguration or a bug?
Thanks very much in advance,
Alex
Unable to repro; this works as expected for me with the config below.
@kodemonk I don't see `openai/gpt-4o-mini` defined in your model list. Can you please add it and try again? If it doesn't work, share your debug logs.
```yaml
model_list:
  - model_name: claude-3-5-sonnet-20240620
    litellm_params:
      model: claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
      api_base: "http://0.0.0.0:8000"
  - model_name: my-fallback-openai-model
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  fallbacks: [{ "claude-3-5-sonnet-20240620": ["my-fallback-openai-model"] }]
```
Same issue.
Fallbacks work on all other failure conditions, but not when the provider returns "Overloaded".
@spammenotinoz Can you please share a script / way to repro this, as this is not something we've been able to reproduce.
Thanks for the reply; it is also frustrating that this is not something I have been able to trigger on demand.
Fallbacks when the model fails, exceeds the token limit, etc., I can trigger, and the fallback works; but when the model is "overloaded" it doesn't seem to fall back.
Yesterday Sonnet was overloaded, but Haiku was fine.
I could be doing something really stupid with the fallback settings; can you please review this config:
```yaml
litellm_settings:
  drop_params: true
  anthropic_api_key: ${ANTHROPIC_API_KEY}
  openai_api_key: ${OPENAI_API_KEY}
  openrouter_api_key: ${OPENROUTER_API_KEY}
  set_verbose: false
  request_timeout: 60
  num_retries: 1
  retry_delay: 1
  retry_on:
    - 429
    - 500
    - 502
    - 503
    - 504
    - 529
    - "overloaded"
  default_fallbacks: ["GPT-4o mini"]
  fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
  content_policy_fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
  context_policy_fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
```
While not in my current config, I also tried adding the following:
```yaml
exception_strategy:
  "*": fallback
  "overloaded": fallback
  "internal_server_error": fallback
```
Below is the error in the LiteLLM log:
```
15:12:20 - LiteLLM Proxy:ERROR: proxy_server.py:2702 - litellm.proxy.proxy_server.async_data_generator(): Exception occured - litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1799, in __anext__
    async for chunk in self.completion_stream:
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/anthropic/chat/handler.py", line 768, in __anext__
    return self.chunk_parser(chunk=data_json)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/anthropic/chat/handler.py", line 659, in chunk_parser
    raise AnthropicError(
litellm.llms.anthropic.common_utils.AnthropicError: Overloaded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2681, in async_data_generator
    async for chunk in response:
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1981, in __anext__
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
    raise e
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 500, in exception_type
    raise litellm.InternalServerError(
litellm.exceptions.InternalServerError: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
```
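As a client-side stopgap, here is a minimal sketch that restarts the request on the next model when the stream dies mid-way. It assumes the proxy surfaces the failure as an `openai.APIError` during iteration; the model names, proxy URL, and key are placeholders:

```python
import openai

# placeholders: proxy URL and key
client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-XXXX")


def stream_with_manual_fallback(messages, models=("Claude 3.5 Sonnet", "GPT-4o mini")):
    """Try each model in turn; if a stream fails part-way through
    (e.g. the proxy surfaces Anthropic's Overloaded as a 500), restart on the next one."""
    last_error = None
    for model in models:
        try:
            stream = client.chat.completions.create(model=model, messages=messages, stream=True)
            for chunk in stream:
                delta = chunk.choices[0].delta.content
                if delta:
                    yield delta
            return  # finished cleanly
        except openai.APIError as err:
            # note: content already yielded from the failed model is not rolled back
            last_error = err
            continue
    raise last_error
```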
same problem
I think we'll need to add some retry/fallback logic inside the CustomStreamWrapper.
As a v0, I'll start with retries.
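A rough sketch of what that retry logic could look like; the wrapper below is illustrative (the class name and `make_stream` factory are not the actual `CustomStreamWrapper` API), and it simply re-creates the provider stream when a retryable error surfaces mid-stream:

```python
import asyncio

import litellm


class RetryingStreamWrapper:
    """Illustrative sketch: re-create the provider stream and retry when it
    fails mid-stream with a retryable error, e.g. Anthropic's "Overloaded"
    mapped to litellm.InternalServerError."""

    def __init__(self, make_stream, max_retries: int = 3, base_delay: float = 1.0):
        self.make_stream = make_stream  # async factory that re-creates the provider stream
        self.max_retries = max_retries
        self.base_delay = base_delay

    async def __aiter__(self):
        attempt = 0
        while True:
            try:
                stream = await self.make_stream()
                async for chunk in stream:
                    yield chunk
                return
            except litellm.InternalServerError:
                attempt += 1
                if attempt > self.max_retries:
                    raise
                # exponential backoff before re-creating the stream
                await asyncio.sleep(self.base_delay * 2 ** (attempt - 1))
```

Here `make_stream` could be something like `lambda: litellm.acompletion(model=..., messages=..., stream=True)`. The obvious catch is that chunks already forwarded to the client would be replayed on retry, so a real implementation would need to track what has been emitted, or only retry before the first chunk is sent.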