huggingface/huggingface_hub

make ignore_patterns="original" the default setting for download of model weights.

whitesscott opened this issue · 1 comments

I tried the following to avoid downloading the 20GB "original" directory, but it was still downloaded.

from huggingface_hub import snapshot_download
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
snapshot_download(repo_id=model_id, repo_type="model", ignore_patterns="original")

There are many model repos on huggingface.co that have "original" directories with multi-gigabyte files.
I think it would be better to have to opt in to download those files, which are often not needed.

I looked through the huggingface_hub git repo and the huggingface.co online documentation and did not see a better method to avoid the extraneous file downloads. huggingface-cli download with --exclude "original" might have worked, but I don't have unlimited downloads to experiment with it.

Alternatively, you could omit the "original" model weight directory from models that have been converted to HF format.

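Hi @whitesscott, snapshot_download and huggingface_hub in general are meant to be semi-agnostic of repo structures. Adding such an implicit rule could break workflows for some users and create misleading behavior. In your case, what you need to do is pass snapshot_download a glob pattern that matches the files inside the folder. Patterns are matched against each file's path in the repo, so "original" on its own does not match the files under original/: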

from huggingface_hub import snapshot_download
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
snapshot_download(repo_id=model_id, ignore_patterns="original/*")

This will ignore every file under the original/ folder.
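
To check a pattern without spending any download quota, the matching can be approximated offline. The sketch below assumes the patterns are applied fnmatch-style to each file's repo-relative path, which matches the behavior described in this thread. The file listing and the kept_files helper are hypothetical and are not part of huggingface_hub.

```python
from fnmatch import fnmatch

# Hypothetical file listing for illustration; the real repo contents may differ.
repo_files = [
    "config.json",
    "model-00001-of-00005.safetensors",
    "original/consolidated.pth",
    "original/params.json",
]


def kept_files(paths, ignore_patterns):
    """Return the paths that match none of the ignore patterns.

    Illustrative helper only: it assumes fnmatch-style matching against
    the full repo-relative path of each file.
    """
    if isinstance(ignore_patterns, str):
        ignore_patterns = [ignore_patterns]
    return [
        path
        for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


# "original" only matches a path that is exactly "original",
# so nothing is filtered out.
print(kept_files(repo_files, "original"))

# "original/*" matches every path that starts with "original/".
print(kept_files(repo_files, "original/*"))
```

With the bare "original" pattern all four paths are kept, while "original/*" leaves only the two top-level files. The same check can be run against a real file listing before calling snapshot_download.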