modal-labs/modal-examples

Mixtral tutorial doesn't work without huggingface access token

Closed this issue · 1 comment

The tutorial for running Mixtral on vLLM doesn't work, since the model can no longer be downloaded without a Hugging Face access token. This is because Mixtral is now a gated model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

Image build for im-1P0Aou6cl9H3BAwictbALw failed with the exception:
GatedRepoError('401 Client Error. (Request ID: Root=1-66213747-475d6ad5261bb9eb4931c4fd;025f8bf1-0bb2-42ac-86a0-743e752004a0)\n\nCannot access gated repo for url https://huggingface.co/api/models/mistralai/Mixtral-8x7B-Instruct-v0.1/revision/main.\nRepo model mistralai/Mixtral-8x7B-Instruct-v0.1 is gated. You must be authenticated to access it.')

Affected Tutorial: https://modal.com/docs/examples/vllm_mixtral
Affected Code: https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/llm-serving/vllm_mixtral.py
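For reference, the underlying failure can be reproduced outside Modal with a plain `huggingface_hub` call against the gated repo; this is a minimal, hypothetical sketch, not code from the tutorial:

```python
# Hypothetical repro: downloading the now-gated Mixtral repo without a token
# fails with a GatedRepoError (401) from the Hugging Face Hub.
from huggingface_hub import snapshot_download

try:
    snapshot_download("mistralai/Mixtral-8x7B-Instruct-v0.1")
except Exception as err:  # GatedRepoError when no valid HF token is available
    print(type(err).__name__, err)
```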

This can be fixed with a similar approach to the one used here (passing an HF_TOKEN environment variable into the function call where the model is downloaded), as sketched below.
The tutorial also needs to be updated to tell the user that a Hugging Face access token is required.
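A minimal sketch of what that fix could look like, assuming a Modal secret named `huggingface-secret` that exposes an `HF_TOKEN` environment variable; the function and directory names are illustrative and only loosely mirror `vllm_mixtral.py`, not the exact code in the example:

```python
import os

import modal

MODEL_NAME = "mistralai/Mixtral-8x7B-Instruct-v0.1"
MODEL_DIR = "/model"  # illustrative path inside the image


def download_model_to_folder():
    # HF_TOKEN is injected by the Modal secret attached below; pass it to
    # huggingface_hub so it can authenticate against the gated repo.
    from huggingface_hub import snapshot_download

    snapshot_download(
        MODEL_NAME,
        local_dir=MODEL_DIR,
        token=os.environ["HF_TOKEN"],
    )


image = (
    modal.Image.debian_slim()
    .pip_install("huggingface_hub")
    # Attach the secret to the image-build step that downloads the weights,
    # so the download no longer fails with a GatedRepoError.
    .run_function(
        download_model_to_folder,
        secrets=[modal.Secret.from_name("huggingface-secret")],
    )
)
```

The same `secrets=[...]` argument can also be added to the serving function's decorator if the token is needed at runtime.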

Thanks for the thorough report! We just picked up this failure in our monitoring system, and it's great to have this issue confirming it and suggesting the fix.