[Bug]: HTTPS error using LiteLLM Router with instructor to vLLM server
Hi,
I'm trying to interface with a vLLM server hosting a Llama-2 model, using structured outputs via the instructor
library. Based on the documentation here, I need to patch a litellm Router() with instructor, which I did as follows:
import instructor
from litellm import Router

api_base = {
    "meta-Llama/LlamaGuard-7b": "http://localhost:8070",
    "meta-Llama/Llama-2-7b-chat-hf": "http://localhost:8069",
}

instructor_client = instructor.patch(
    Router(
        model_list=[
            {
                "model_name": "meta-Llama/LlamaGuard-7b",
                # params for the litellm completion/embedding call - e.g.:
                # https://github.com/BerriAI/litellm/blob/62a591f90c99120e1a51a8445f5c3752586868ea/litellm/router.py#L111
                "litellm_params": {
                    "model": "hosted_vllm/meta-Llama/LlamaGuard-7b",
                    "api_base": api_base["meta-Llama/LlamaGuard-7b"],
                    "api_key": "",
                },
            },
            {
                "model_name": "meta-Llama/Llama-2-7b-chat-hf",
                "litellm_params": {
                    "model": "hosted_vllm/meta-Llama/Llama-2-7b-chat-hf",
                    "api_base": api_base["meta-Llama/Llama-2-7b-chat-hf"],
                    "api_key": "",
                },
            },
        ]
    )
)
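For context, this is roughly how I then call the patched client. JudgeVerdict is a Pydantic model I define elsewhere (a simplified stand-in is shown here), and I'm assuming the patched Router keeps litellm's Router.completion() entry point, with response_model added by instructor's patch:

from pydantic import BaseModel

class JudgeVerdict(BaseModel):
    # simplified stand-in for my actual schema
    verdict: str
    reasoning: str

result = instructor_client.completion(
    model="meta-Llama/LlamaGuard-7b",
    messages=[{"role": "user", "content": "Is this prompt safe to answer?"}],
    response_model=JudgeVerdict,  # structured output via instructor's patch
)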
However, I notice that a request with a Bearer token is being created, which suggests that either the Router
or instructor assumes the destination is an HTTPS URL:
httpcore.LocalProtocolError: Illegal header value b'Bearer '
I've pasted my whole traceback here for anyone who wants to take a look.
I notice that litellm is invoking an OpenAI client, which seems to be the root cause of this issue:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 810, in completion
raise e
File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 746, in completion
self.make_sync_openai_chat_completion_request(
File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 605, in make_sync_openai_chat_completion_request
raise e
File "/usr1/data/abhinavr/atac/lib/python3.11/site-packages/litellm/llms/OpenAI/openai.py", line 594, in make_sync_openai_chat_completion_request
raw_response = openai_client.chat.completions.with_raw_response.create(
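In case it helps narrow things down: the empty Bearer value makes me suspect the blank api_key string rather than HTTPS itself, so one variant I could try is a non-empty placeholder key. This is a guess on my part, not something taken from the litellm docs:

# Hypothetical workaround sketch: a non-empty placeholder key would send
# "Authorization: Bearer dummy-key" instead of "Bearer " (trailing space,
# no token), which httpcore rejects as an illegal header value.
# As far as I understand, vLLM ignores the key unless it was started with --api-key.
litellm_params = {
    "model": "hosted_vllm/meta-Llama/LlamaGuard-7b",
    "api_base": api_base["meta-Llama/LlamaGuard-7b"],
    "api_key": "dummy-key",  # placeholder, not a real credential
}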
Is there any way to force the Router to create an HTTP request? vLLM does not provide an HTTPS endpoint, and I wish to interact with it using structured outputs.
Thanks!