stanfordnlp/dspy

How do I load huggingface models?

Opened this issue · 16 comments

Hi,

There seem to be some big changes, and I cannot find a single example showing how to load Hugging Face models, which I was previously using with HFModel. Also, the DSPy AI tool is broken and no longer able to help.

Best,
Zoher

okhat commented

Hey @Zoher15, you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).

Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
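
For reference, a minimal sketch of that setup, assuming Meta-Llama-3-8B-Instruct served by SGLang on port 7501 (both the model and the port are placeholders; adapt them to your machine):

# Launch the SGLang server in a separate shell first, for example:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 7501
import dspy

sglang_url = "http://localhost:7501/v1"

# DSPy talks to the local server through its OpenAI-compatible endpoint (via LiteLLM).
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",  # placeholder; a local server doesn't check it (see the api_key discussion below)
)
dspy.configure(lm=lm)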

Hi @okhat,

I did go through that example. I was not aware of SGLang, so it seems that to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over plain HF that I'm missing?

Best,
Zoher

okhat commented

Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise, evaluation and optimization have to be single-threaded and hence extremely slow.
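
To make the point concrete: DSPy's evaluation harness can fan requests out over many threads, which only pays off when a batching server sits behind the LM. A rough sketch (the devset, metric, and thread count here are placeholders):

import dspy

# Tiny placeholder devset; in practice this would be your real examples.
devset = [
    dspy.Example(
        question='Is the following sentence plausible? "Carlos Correa threw to first base"',
        answer="Yes",
    ).with_inputs("question"),
]

def metric(example, prediction, trace=None):
    # Placeholder metric: exact match on the answer field.
    return example.answer == prediction.answer

program = dspy.Predict("question -> answer")

# With a server like SGLang behind dspy.LM, these 16 threads run in parallel;
# a plain in-process HF pipeline would serialize them.
evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=16, display_progress=True)
evaluate(program)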

okhat commented

You don't need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.

@okhat so are local, non-server-client HF models no longer going to be supported at all going forward?

okhat commented

@dzimmerman-nci We will experiment with things like SGLang's Engine which is non-server client. But standard HF Transformers without additional batching or serving infrastructure are not appropriate for DSPy, or really for any library targeted at using LMs at inference time.
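
For anyone curious, SGLang's offline Engine runs in-process (no HTTP server) while keeping batched generation. An untested sketch of what that API roughly looks like, independent of DSPy:

import sglang as sgl

# In-process engine: no separate server, but generation is still batched.
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3-8B-Instruct")

prompts = [
    'Is the following sentence plausible? "Steven Stamkos hit the slant pass."',
    'Is the following sentence plausible? "Carlos Correa threw to first base"',
]
outputs = llm.generate(prompts, {"temperature": 0.0, "max_new_tokens": 64})
for out in outputs:
    print(out["text"])

llm.shutdown()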

> Hey @Zoher15, you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).
>
> Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms

So I followed the steps and for:

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url, model_type='text')
dspy.configure(lm=model)

I receive an error about the OpenAI API key; is this supposed to happen? The model is up and running.

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1625, in completion
    openai_client = OpenAI(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/main.py", line 1346, in completion
    _response = openai_text_completions.completion(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1660, in completion
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Hi @Zoher15, I believe configuring your Hugging Face API token via huggingface-cli login or export HUGGINGFACEHUB_API_TOKEN=your_api_token resolves this. Let me know if that doesn't work.

I did the login. The only way I resolved it was by setting the OpenAI token.

Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?

Yes, you are right.
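
For anyone hitting the same error, the configuration that ended up working in this exchange is the original snippet plus an explicit (empty) api_key:

import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",          # dummy key: the OpenAI client requires one even for a local server
    model_type="text",
)
dspy.configure(lm=model)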

Overall, SGLang seems fast, but I have to figure out a lot about it to get it to run the way HF was running before. I don't know what floating-point precision it loads the weights in. Even in 'text' mode it uses the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.

okhat commented

@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?

I used few-shot, loading some hand-created examples. This is the template in text mode (from the model history). I am assuming this is how OpenAI's API processes it, so SGLang is re-using it for Hugging Face, incorrectly assuming all Hugging Face models are instruction-tuned with the same template:

User message:

[[ ## question ## ]]
Is the following sentence plausible? "Steven Stamkos hit the slant pass."

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Assistant message:

[[ ## answer ## ]]
No

[[ ## completed ## ]]


User message:

[[ ## question ## ]]
Is the following sentence plausible? "Carlos Correa threw to first base"

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
Yes

[[ ## completed ## ]]

okhat commented

@Zoher15 Not necessarily; this is just how DSPy's inspect_history prints things.

If you pass model_type="text" the model gets one string that concatenates the contents of the "messages" above into one blurb.

That said, I see a few action items here:

  • Handling model_type may need to happen at the Adapter level, perhaps in BaseAdapter.
  • Inspect history needs to be aware of that, so it shows things in a way that doesn't confuse users.
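
One way to see what the model actually received, rather than how inspect_history renders it, is to look at the LM's raw history entries (a sketch; I believe dspy.LM records each call, but the exact keys per entry may differ across DSPy versions):

import dspy

lm = dspy.settings.lm      # the LM set via dspy.configure(lm=...)
last = lm.history[-1]      # record of the most recent request

print(last.get("prompt"))    # with model_type="text": the single concatenated string sent to the server
print(last.get("messages"))  # with model_type="chat": the structured message list
print(last.get("outputs"))   # completions returned for that request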

@okhat Can I just implement a custom subclass of LM to use Hugging Face models locally? I understand and agree with your stance on non-server-client architectures as an ideal, but for quick exploration and in restricted environments it's a good option to have.

Also, is there another link to the tutorial for SGLang integration? The previous one in this thread no longer exists.
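
Not an answer from the maintainers, but a rough sketch of the kind of wrapper being asked about: a subclass of dspy.LM that runs a Transformers model in-process. It assumes dspy.LM instances are called with either a prompt string or a chat messages list and return a list of completion strings, so the exact hooks may need adjusting for your DSPy version, and it carries all the single-threaded speed caveats discussed above:

import dspy
from transformers import AutoModelForCausalLM, AutoTokenizer

class LocalHFLM(dspy.LM):
    """Hypothetical in-process Hugging Face LM for quick, small-scale experiments."""

    def __init__(self, model_name: str, **kwargs):
        super().__init__(model=model_name, **kwargs)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    def __call__(self, prompt=None, messages=None, **kwargs):
        # Assumption: DSPy adapters pass either `prompt` (text mode) or `messages` (chat mode).
        if prompt is None and messages is not None:
            prompt = self.tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        out = self.model.generate(**inputs, max_new_tokens=kwargs.get("max_tokens", 256))
        # Decode only the newly generated tokens, not the prompt.
        text = self.tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        return [text]  # DSPy expects a list of completion strings

Usage would then be something like lm = LocalHFLM("meta-llama/Meta-Llama-3-8B-Instruct") followed by dspy.configure(lm=lm).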