How do I load huggingface models?
Opened this issue · 16 comments
Hi,
There seem to be some big changes, and I cannot find a single example showing how to load the Hugging Face models I was previously using with HF.model. Also, the DSPy AI tool is broken and no longer able to help.
Best,
Zoher
Hey @Zoher15 , you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).
Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
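For the Ollama route, the client side boils down to a couple of lines. A minimal sketch, assuming an Ollama server is already running on its default port and you've pulled llama3:

import dspy

# Assumes `ollama serve` is running and `ollama pull llama3` has been done.
lm = dspy.LM("ollama_chat/llama3", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)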
Hi @okhat,
I did go through that example. I was not aware of SGLang, so it seems that, to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over HF that I'm missing?
Best,
Zoher
Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise evaluation and optimization will have to be single-threaded and hence extremely slow.
You don't need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.
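For reference, the server side is essentially one command. A sketch, assuming SGLang is installed (pip install "sglang[all]") and using the model and port that appear later in this thread:

# Launches an OpenAI-compatible server that DSPy can point at.
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 7501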
@okhat So are local, non-server-client HF models no longer going to be supported at all going forward?
@dzimmerman-nci We will experiment with things like SGLang's Engine, which is non-server-client. But standard HF Transformers without additional batching or serving infrastructure are not appropriate for DSPy, or really for any library targeted at using LMs at inference time.
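For context, SGLang's offline Engine looks roughly like this; a sketch based on SGLang's documented offline API, not something DSPy wires up for you today:

import sglang as sgl

# In-process engine: no HTTP server, but generation is still batched.
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3-8B-Instruct")
prompts = ["The capital of France is", "The capital of Japan is"]
outputs = llm.generate(prompts, {"temperature": 0, "max_new_tokens": 16})
for prompt, out in zip(prompts, outputs):
    print(prompt, "->", out["text"])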
So I followed the steps, and for:
import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url, model_type='text')
dspy.configure(lm=model)
I receive an error for the OpenAI API key. Is this supposed to happen? The model is up and running.
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
Traceback (most recent call last):
File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1625, in completion
openai_client = OpenAI(
File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/main.py", line 1346, in completion
_response = openai_text_completions.completion(
File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1660, in completion
raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Hi @Zoher15, I believe configuring your Hugging Face API token via `huggingface-cli login` or `export HUGGINGFACEHUB_API_TOKEN=your_api_token` resolves this. Let me know if that doesn't work.
I did the login. The only way I resolved it was by setting the OpenAI token.
Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?
Yes you are right.
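For anyone hitting the same error, the working version of the earlier snippet just passes a dummy key explicitly; the local SGLang server ignores it:

import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
# api_key="" satisfies the OpenAI client check; the value itself is unused locally.
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url,
                api_key="", model_type='text')
dspy.configure(lm=model)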
Overall, SGLang seems fast, but I have to figure out a lot about it to get it to run the way HF was running before. I don't know the floating-point precision of the weights it loads. Even in 'text' mode it is using the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.
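On the precision question: SGLang's launch command accepts an explicit --dtype flag. A sketch, assuming the same model as above:

# Load the weights in bfloat16 instead of relying on the default ("auto").
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --port 7501 --dtype bfloat16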
@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?
I used few-shot and loaded some hand-created examples. This is the template in text mode (from the model history). I am assuming this is how OpenAI's API processes it, so SGLang is reusing it for Hugging Face, incorrectly assuming all Hugging Face models are instruction-tuned with the same template:
User message:
[[ ## question ## ]]
Is the following sentence plausible? "Steven Stamkos hit the slant pass."
Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
Assistant message:
[[ ## answer ## ]]
No
[[ ## completed ## ]]
User message:
[[ ## question ## ]]
Is the following sentence plausible? "Carlos Correa threw to first base"
Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
Response:
[[ ## answer ## ]]
Yes
[[ ## completed ## ]]
@Zoher15 Not necessarily, this is just how DSPy's inspect_history prints things. If you pass model_type="text", the model gets one string that concatenates the contents of the "messages" above into one blurb.
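As a rough illustration (a hypothetical helper, not DSPy's actual code), the flattening amounts to something like:

def flatten_messages(messages):
    # Hypothetical: join the chat turns into the single prompt string sent in text mode.
    return "\n\n".join(m["content"] for m in messages)

prompt = flatten_messages([
    {"role": "user", "content": "[[ ## question ## ]]\nIs the sky blue?"},
])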
That said, I see a few action items here:
- Handling model_type may need to happen at the Adapter level, perhaps in BaseAdapter.
- inspect_history needs to be aware of that, so it shows things in a way that doesn't confuse users.
@okhat Can I just implement a custom subclass of LM to use Hugging Face models locally? I understand and agree with your stance on non-server-client architectures as an ideal, but for quick exploration and restricted environments it's a good option to have.
Also, is there another link to the tutorial for SGLang integration? The previous one in this thread no longer exists.
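For what it's worth, here is a rough sketch of such a subclass, assuming DSPy's BaseLM extension point and a plain transformers pipeline; the exact response shape DSPy expects may differ across versions, so check the current docs before relying on this:

from types import SimpleNamespace

import dspy
from transformers import pipeline

class LocalHFLM(dspy.BaseLM):
    """Hypothetical in-process HF wrapper: simple, but slow and single-threaded."""

    def __init__(self, model_name="meta-llama/Meta-Llama-3-8B-Instruct", **kwargs):
        super().__init__(model=model_name, **kwargs)
        self.pipe = pipeline("text-generation", model=model_name, device_map="auto")

    def forward(self, prompt=None, messages=None, **kwargs):
        # Flatten chat messages into one string if no raw prompt was given.
        text = prompt or "\n\n".join(m["content"] for m in messages)
        out = self.pipe(text, max_new_tokens=kwargs.get("max_tokens", 256),
                        return_full_text=False)[0]["generated_text"]
        # Mimic an OpenAI-style chat completion, which forward() is expected to return.
        message = SimpleNamespace(content=out, tool_calls=None)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=message, finish_reason="stop")],
            usage={},
            model=self.model,
        )

# Usage (hypothetical): dspy.configure(lm=LocalHFLM())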