stanfordnlp/dspy

How do I load huggingface models?

Opened this issue · 16 comments

Hi,

There seem to be some big changes, and I cannot find a single example showing how to load Hugging Face models, which I was previously using with HFModel. Also, the DSPy AI tool is broken and no longer able to help.

Best,
Zoher

okhat commented

Hey @Zoher15, you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).

Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
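
For reference, a minimal sketch of that setup, assuming Meta-Llama-3-8B-Instruct served by SGLang on port 7501 (both the model and the port are placeholders; adapt them to your machine):

# Launch the SGLang server in a separate shell first, for example:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 7501
import dspy

sglang_url = "http://localhost:7501/v1"

# DSPy talks to the local server through its OpenAI-compatible endpoint (via LiteLLM).
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",  # placeholder; a local server doesn't check it (see the api_key discussion below)
)
dspy.configure(lm=lm)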

Hi @okhat,

I did go through that example. I was not aware of SGLang, so it seems that to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over plain HF that I'm missing?

Best,
Zoher

okhat commented

Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise, evaluation and optimization have to be single-threaded and hence extremely slow.
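
To make the point concrete: DSPy's evaluation harness can fan requests out over many threads, which only pays off when a batching server sits behind the LM. A rough sketch (the devset, metric, and thread count here are placeholders):

import dspy

# Tiny placeholder devset; in practice this would be your real examples.
devset = [
    dspy.Example(
        question='Is the following sentence plausible? "Carlos Correa threw to first base"',
        answer="Yes",
    ).with_inputs("question"),
]

def metric(example, prediction, trace=None):
    # Placeholder metric: exact match on the answer field.
    return example.answer == prediction.answer

program = dspy.Predict("question -> answer")

# With a server like SGLang behind dspy.LM, these 16 threads run in parallel;
# a plain in-process HF pipeline would serialize them.
evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=16, display_progress=True)
evaluate(program)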

okhat commented

You don't need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.

@okhat so are local, non-server-client HF models no longer going to be supported at all going forward?

okhat commented

@dzimmerman-nci We will experiment with things like SGLang's Engine which is non-server client. But standard HF Transformers without additional batching or serving infrastructure are not appropriate for DSPy, or really for any library targeted at using LMs at inference time.
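
For anyone curious, SGLang's offline Engine runs in-process (no HTTP server) while keeping batched generation. An untested sketch of what that API roughly looks like, independent of DSPy:

import sglang as sgl

# In-process engine: no separate server, but generation is still batched.
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3-8B-Instruct")

prompts = [
    'Is the following sentence plausible? "Steven Stamkos hit the slant pass."',
    'Is the following sentence plausible? "Carlos Correa threw to first base"',
]
outputs = llm.generate(prompts, {"temperature": 0.0, "max_new_tokens": 64})
for out in outputs:
    print(out["text"])

llm.shutdown()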

> Hey @Zoher15, you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).
>
> Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms

So I followed the steps and for:

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url, model_type='text')
dspy.configure(lm=model)

I receive an error about the OpenAI API key; is this supposed to happen? The model is up and running.

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1625, in completion
    openai_client = OpenAI(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/main.py", line 1346, in completion
    _response = openai_text_completions.completion(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1660, in completion
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Hi @Zoher15, I believe configuring your Hugging Face API token via huggingface-cli login or export HUGGINGFACEHUB_API_TOKEN=your_api_token resolves this. Let me know if that doesn't work.

I did the login. The only way I resolved it was by setting the OpenAI token.

Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?

Yes, you are right.
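
For anyone hitting the same error, the configuration that ended up working in this exchange is the original snippet plus an explicit (empty) api_key:

import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",          # dummy key: the OpenAI client requires one even for a local server
    model_type="text",
)
dspy.configure(lm=model)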

Overall, SGLang seems fast, but I have to figure out a lot about it to get it to run the way HF was running before. I don't know what floating-point precision it loads the weights in. Even in 'text' mode it uses the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.

okhat commented

@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?

I used few-shot, loading some hand-created examples. This is the template in text mode (from the model history). I am assuming this is how OpenAI's API processes it, so SGLang is re-using it for Hugging Face, incorrectly assuming all Hugging Face models are instruction-tuned with the same template:

User message:

[[ ## question ## ]]
Is the following sentence plausible? "Steven Stamkos hit the slant pass."

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Assistant message:

[[ ## answer ## ]]
No

[[ ## completed ## ]]


User message:

[[ ## question ## ]]
Is the following sentence plausible? "Carlos Correa threw to first base"

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
Yes

[[ ## completed ## ]]

okhat commented

@Zoher15 Not necessarily; this is just how DSPy's inspect_history prints things.

If you pass model_type="text" the model gets one string that concatenates the contents of the "messages" above into one blurb.

That said, I see a few action items here:

  • Handling model_type may need to happen at the Adapter level, perhaps in BaseAdapter.
  • Inspect history needs to be aware of that, so it shows things in a way that doesn't confuse users.
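
One way to see what the model actually received, rather than how inspect_history renders it, is to look at the LM's raw history entries (a sketch; I believe dspy.LM records each call, but the exact keys per entry may differ across DSPy versions):

import dspy

lm = dspy.settings.lm      # the LM set via dspy.configure(lm=...)
last = lm.history[-1]      # record of the most recent request

print(last.get("prompt"))    # with model_type="text": the single concatenated string sent to the server
print(last.get("messages"))  # with model_type="chat": the structured message list
print(last.get("outputs"))   # completions returned for that request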

@okhat Can I just implement a custom subclass of LM to use Hugging Face models locally? I understand and agree with your stance on non-server-client architectures as an ideal, but for quick exploration and in restricted environments it's a good option to have.

Also, is there another link to the tutorial for SGLang integration? The previous one in this thread no longer exists.
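
Not an answer from the maintainers, but a rough sketch of the kind of wrapper being asked about: a subclass of dspy.LM that runs a Transformers model in-process. It assumes dspy.LM instances are called with either a prompt string or a chat messages list and return a list of completion strings, so the exact hooks may need adjusting for your DSPy version, and it carries all the single-threaded speed caveats discussed above:

import dspy
from transformers import AutoModelForCausalLM, AutoTokenizer

class LocalHFLM(dspy.LM):
    """Hypothetical in-process Hugging Face LM for quick, small-scale experiments."""

    def __init__(self, model_name: str, **kwargs):
        super().__init__(model=model_name, **kwargs)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    def __call__(self, prompt=None, messages=None, **kwargs):
        # Assumption: DSPy adapters pass either `prompt` (text mode) or `messages` (chat mode).
        if prompt is None and messages is not None:
            prompt = self.tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        out = self.model.generate(**inputs, max_new_tokens=kwargs.get("max_tokens", 256))
        # Decode only the newly generated tokens, not the prompt.
        text = self.tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        return [text]  # DSPy expects a list of completion strings

Usage would then be something like lm = LocalHFLM("meta-llama/Meta-Llama-3-8B-Instruct") followed by dspy.configure(lm=lm).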