BerriAI/litellm

NO FALLBACK when streaming and [Bug]: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.

Closed this issue · 6 comments

What happened?

Using LiteLLM proxy (LiteLLM: Current Version = 1.50.2)

With this config.yaml:

```yaml
model_list:
  #llm_routes
  - model_name: basic 
    litellm_params: 
      model:  "anthropic/claude-3-haiku-20240307"
      api_key:  XXX

litellm_settings:
  modify_params: True
  drop_params: True
  safe_mode: False
  num_retries: 3 # retry call 3 times on each model_name
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout 
  fallbacks: [
    {"basic": ["openai/gpt-4o-mini"]}
  ]
  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute. 
  cooldown_time: 30 # how long to cooldown model in seconds if fails/min > allowed_fails
  cache: true 
  cache_params:         # cache_params are optional
    type: "redis"  # The type of cache to initialize. Can be "local" or "redis". Defaults to "local".
    host: "localhost"  # The host address for the Redis cache. Required if type is "redis".
    port: 6379  # The port number for the Redis cache. Required if type is "redis".
    ttl: 600 
router_settings:
  routing_strategy: latency-based-routing 
  num_retries: 2
  timeout: 30                                  # 30 seconds
general_settings: 
  master_key: XXXX
```

As you can see, I have set up a model_name named "basic" that proxies to "anthropic/claude-3-haiku-20240307", with a fallback to "openai/gpt-4o-mini".

I would expect that if anything happens to my first option (claude-3-haiku), the request would be retried with gpt-4o-mini. But when Anthropic's servers are overloaded, instead of trying OpenAI an exception occurs:

```
18:08:34 - LiteLLM Proxy:ERROR: proxy_server.py:2669 - litellm.proxy.proxy_server.async_data_generator(): Exception occured - litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 8117, in __anext__
    async for chunk in self.completion_stream:
  File "/usr/local/lib/python3.10/dist-packages/litellm/llms/anthropic/chat/handler.py", line 833, in __anext__
    return self.chunk_parser(chunk=data_json)
  File "/usr/local/lib/python3.10/dist-packages/litellm/llms/anthropic/chat/handler.py", line 755, in chunk_parser
    raise AnthropicError(
litellm.llms.anthropic.common_utils.AnthropicError: Overloaded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litellm/proxy/proxy_server.py", line 2648, in async_data_generator
    async for chunk in response:
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 8345, in __anext__
    raise exception_type(
  File "/usr/local/lib/python3.10/dist-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2116, in exception_type
    raise e
  File "/usr/local/lib/python3.10/dist-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 500, in exception_type
    raise litellm.InternalServerError(
litellm.exceptions.InternalServerError: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
```

Note that I am making the request with streaming ON (in case this is relevant); I can sometimes see the response start streaming and then fail without the fallback kicking in.
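
For reference, the request is made roughly like this (a minimal sketch; the proxy URL, key, and prompt are placeholders, and `basic` is the model_name from the config above):

```python
# Minimal repro sketch, assuming the LiteLLM proxy listens on localhost:4000
# and "sk-XXXX" is a valid proxy key; the prompt is arbitrary.
import openai

client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-XXXX")

stream = client.chat.completions.create(
    model="basic",  # model_name from config.yaml above
    messages=[{"role": "user", "content": "Write a short story."}],
    stream=True,    # streaming enabled, as in the failing requests
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

When Anthropic returns "Overloaded" mid-stream, the exception above surfaces in this loop instead of the request being retried on the fallback.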

Any idea what might be going on? Is this a misconfiguration or a bug?

Thanks very much in advance,

Alex

Unable to repro. This works as expected:
[Screenshot 2024-11-01 at 6:47:05 PM]

@kodemonk I don't see `openai/gpt-4o-mini` defined in your model list. Can you please add it and try again? If it doesn't work, share your debug logs.

```yaml
model_list:
  - model_name: claude-3-5-sonnet-20240620
    litellm_params:
      model: claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
      api_base: "http://0.0.0.0:8000"
  - model_name: my-fallback-openai-model
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  fallbacks: [{ "claude-3-5-sonnet-20240620": ["my-fallback-openai-model"] }]
```
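
To exercise the fallback route without waiting for a real provider outage, something like this sketch may help (the proxy URL and key are placeholders; it assumes the proxy's `mock_testing_fallbacks` request flag for fallback testing is available in your version):

```python
# Sketch: ask the proxy to simulate a failure on the primary deployment so the
# configured fallback is exercised. base_url and api_key are placeholders.
import openai

client = openai.OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

resp = client.chat.completions.create(
    model="claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"mock_testing_fallbacks": True},  # simulate failure on the primary
)
print(resp.model_dump_json(indent=2))
```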

Same issue.
Fallbacks work on all other failure conditions, but not on the "Overloaded" error.

@spammenotinoz can you please share a script / way to repro this? This is not something we've been able to reproduce.

Thanks for the reply. It's also frustrating that this is not something I have been able to trigger on demand.
Fallbacks when the model fails, hits the token limit, etc. I can trigger, and the fallback works; but when the provider is "overloaded", it doesn't seem to fall back.
Yesterday Sonnet was overloaded, but Haiku was fine.

I could be doing something really stupid with the fallback settings; can you please review this config?

```yaml
litellm_settings:
  drop_params: true
  anthropic_api_key: ${ANTHROPIC_API_KEY}
  openai_api_key: ${OPENAI_API_KEY}
  openrouter_api_key: ${OPENROUTER_API_KEY}
  set_verbose: false
  request_timeout: 60
  num_retries: 1
  retry_delay: 1
  retry_on:
    - 429
    - 500
    - 502
    - 503
    - 504
    - 529
    - "overloaded"
  default_fallbacks: ["GPT-4o mini"]
  fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
  content_policy_fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
  context_policy_fallbacks: [{"Claude 3.5 Sonnet": ["Claude 3.5 Haiku"]}, {"Claude 3.5 Haiku": ["GPT-4o mini"]}, {"o1-preview": ["o1-mini"]}]
```

While not in my current config, I also tried adding the following:

```yaml
exception_strategy:
  "*": fallback
  "overloaded": fallback
  "internal_server_error": fallback
```

Below is the error in the LiteLLM log:

```
15:12:20 - LiteLLM Proxy:ERROR: proxy_server.py:2702 - litellm.proxy.proxy_server.async_data_generator(): Exception occured - litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1799, in __anext__
    async for chunk in self.completion_stream:
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/anthropic/chat/handler.py", line 768, in __anext__
    return self.chunk_parser(chunk=data_json)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/anthropic/chat/handler.py", line 659, in chunk_parser
    raise AnthropicError(
litellm.llms.anthropic.common_utils.AnthropicError: Overloaded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 2681, in async_data_generator
    async for chunk in response:
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1981, in __anext__
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2136, in exception_type
    raise e
  File "/usr/local/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 500, in exception_type
    raise litellm.InternalServerError(
litellm.exceptions.InternalServerError: litellm.InternalServerError: AnthropicException - Overloaded. Handle with `litellm.InternalServerError`.
```

Same problem.

I think we'll need to add some retry/fallback logic inside the CustomStreamWrapper.

As a v0, I'll start with retries.
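
Roughly the idea, as a hypothetical sketch (not litellm's actual code; `stream_with_retry` and `TransientUpstreamError` are made-up names for illustration): a retry inside the stream wrapper is only safe while no chunk has been forwarded to the caller yet, which is why the streamed "Overloaded" case needs handling beyond the router-level fallback.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


class TransientUpstreamError(Exception):
    """Stand-in for a provider error like AnthropicError: Overloaded."""


async def stream_with_retry(
    make_stream: Callable[[], Awaitable[AsyncIterator[str]]],
    max_retries: int = 2,
) -> AsyncIterator[str]:
    """Re-open the upstream stream on transient errors, but only if nothing has
    been forwarded yet; otherwise the client would see partial or duplicated output."""
    attempt = 0
    while True:
        yielded_any = False
        try:
            stream = await make_stream()
            async for chunk in stream:
                yielded_any = True
                yield chunk
            return
        except TransientUpstreamError:
            if yielded_any or attempt >= max_retries:
                raise  # too late (or out of retries): surface to the caller / fallback layer
            attempt += 1
            await asyncio.sleep(2 ** attempt)  # simple backoff before reconnecting
```

Once chunks have already gone out to the client, the remaining options are to surface the error or resume the stream, which is presumably why retries come first as the v0.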