BerriAI/litellm

[Bug]: HTTPS error using LiteLLM Router with instructor to a vLLM server


Hi,
I'm trying to interface with a vLLM server hosting a Llama-2 model and get structured outputs via the instructor library. Based on the documentation here, I need to patch my instructor client with a Router(), which I did like this:

import instructor
from litellm import Router

api_base = {
    "meta-Llama/LlamaGuard-7b": "http://localhost:8070",
    "meta-Llama/Llama-2-7b-chat-hf": "http://localhost:8069",
}

instructor_client = instructor.patch(
    Router(
        model_list=[  # params for each litellm completion/embedding call - e.g.: https://github.com/BerriAI/litellm/blob/62a591f90c99120e1a51a8445f5c3752586868ea/litellm/router.py#L111
            {
                "model_name": "meta-Llama/LlamaGuard-7b",
                "litellm_params": {
                    "model": "hosted_vllm/meta-Llama/LlamaGuard-7b",
                    "api_base": api_base["meta-Llama/LlamaGuard-7b"],
                    "api_key": "",
                },
            },
            {
                "model_name": "meta-Llama/Llama-2-7b-chat-hf",
                "litellm_params": {
                    "model": "hosted_vllm/meta-Llama/Llama-2-7b-chat-hf",
                    "api_base": api_base["meta-Llama/Llama-2-7b-chat-hf"],
                    "api_key": "",
                },
            },
        ]
    )
)

However, I notice that a request with a bearer token is being created, which suggests that either the Router or instructor assumes the destination is an HTTPS URL:

httpcore.LocalProtocolError: Illegal header value b'Bearer '
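
For what it's worth, the same error message is easy to reproduce with plain httpx once the Authorization header is the literal string "Bearer " with nothing after it; as far as I can tell, h11 rejects header values with trailing whitespace before the request ever goes out. A minimal sketch from my setup (the URL is just my local vLLM endpoint, any reachable HTTP URL shows the same thing):

import httpx

# Sketch: an Authorization value of "Bearer " (empty token, trailing space)
# is rejected as an illegal header value by h11/httpcore.
try:
    httpx.get(
        "http://localhost:8069/v1/models",  # my local vLLM server
        headers={"Authorization": "Bearer "},
    )
except httpx.LocalProtocolError as e:
    print(e)  # Illegal header value b'Bearer '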

I've pasted my whole traceback here in case anyone wants to take a look.

I notice that litellm is invoking an OpenAI client, which seems to be the root cause of this issue:

The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 810, in completion
    raise e
  File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 746, in completion
    self.make_sync_openai_chat_completion_request(
  File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 605, in make_sync_openai_chat_completion_request
    raise e
  File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 594, in make_sync_openai_chat_completion_request
    raw_response = openai_client.chat.completions.with_raw_response.create(

Is there any way to force the Router to create an HTTP request? vLLM does not provide an HTTPS endpoint, and I wish to interact with it using structured outputs.
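
The only workaround I can think of is passing a non-empty placeholder api_key so the Authorization header is never the bare "Bearer " value, but I don't know if that is the intended approach. A sketch of what I mean ("sk-placeholder" is an arbitrary value; as far as I understand, vLLM only checks it if the server was started with --api-key):

import instructor
from litellm import Router

instructor_client = instructor.patch(
    Router(
        model_list=[
            {
                "model_name": "meta-Llama/Llama-2-7b-chat-hf",
                "litellm_params": {
                    "model": "hosted_vllm/meta-Llama/Llama-2-7b-chat-hf",
                    "api_base": "http://localhost:8069",
                    # Arbitrary non-empty key so the header becomes
                    # "Bearer sk-placeholder" instead of "Bearer ".
                    "api_key": "sk-placeholder",
                },
            }
        ]
    )
)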
Thanks!