Lock acquisition fails on download
AlpinDale opened this issue · 3 comments
Describe the bug
I've been trying to download NousResearch/Meta-Llama-3.1-8B-Instruct with and without hf-transfer, but it consistently hangs at the 10GB point (2 shards with hf-transfer, half of each without), with this message being repeated every few seconds:
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
Reproduction
pip install -U huggingface-hub[cli] hf-transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth
Logs
$ huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth
Downloading '.gitattributes' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b.incomplete'
.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 12.4MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b
Downloading 'LICENSE' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290.incomplete'
LICENSE: 100%|██████████████████████████████████████████████████████████████████████████████████| 7.63k/7.63k [00:00<00:00, 22.6MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290
Downloading 'README.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1.incomplete'
README.md: 100%|████████████████████████████████████████████████████████████████████████████████| 41.8k/41.8k [00:00<00:00, 63.5MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1
Downloading 'USE_POLICY.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee.incomplete'
USE_POLICY.md: 100%|████████████████████████████████████████████████████████████████████████████| 4.69k/4.69k [00:00<00:00, 12.7MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee
Downloading 'config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e.incomplete'
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 855/855 [00:00<00:00, 3.32MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e
Downloading 'generation_config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257.incomplete'
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████| 184/184 [00:00<00:00, 735kB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257
Downloading 'model-00001-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668.incomplete'
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 4.98G/4.98G [00:11<00:00, 443MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668
Downloading 'model-00002-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15.incomplete'
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:08<00:00, 600MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
^CTraceback (most recent call last):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/huggingface_cli.py", line 52, in main
    service.run()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 146, in run
    print(self._download()) # Print path to downloaded files
          ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 180, in _download
    return snapshot_download(
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 297, in snapshot_download
    _inner_hf_hub_download(file)
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 273, in _inner_hf_hub_download
    return hf_hub_download(
           ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1388, in _hf_hub_download_to_cache_dir
    with WeakFileLock(lock_path):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_fixes.py", line 91, in WeakFileLock
    lock.acquire()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/filelock/_api.py", line 344, in acquire
    time.sleep(poll_interval)
KeyboardInterrupt
System info
- huggingface_hub version: 0.24.7
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.11.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/austin/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: alpindale
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.4.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: 0.1.8
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.8.2
- aiohttp: 3.10.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/austin/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/austin/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/austin/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
Hi @AlpinDale, sorry for the inconvenience. What type of hard drive is it (a regular local disk or a specially mounted drive)? I'm asking because filelock doesn't always work properly on some filesystems. Independently of that, you can try killing all huggingface_hub/hf_transfer processes and then running rm -rf /home/austin/.cache/huggingface/hub/.locks to delete all current locks. This should fix your issue ( 🤞 ), though I can't explain why it happened in the first place.
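If you'd rather not wipe the whole .locks directory, here is a rough sketch of a more selective cleanup: it probes each lock file and only deletes the ones nobody currently holds. It assumes a Linux/Unix machine and that filelock is using its default flock-based backend there. The helper names is_lock_held and remove_stale_locks are made up for this example and are not part of huggingface_hub.

```python
import fcntl
import os
from pathlib import Path


def is_lock_held(lock_file: Path) -> bool:
    """Return True if some other open file description holds an flock on lock_file.

    Assumption: the lock was taken with flock(), which is what filelock's
    default Unix backend is believed to use.
    """
    fd = os.open(lock_file, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails immediately if the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else had it. Release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def remove_stale_locks(locks_dir: Path) -> list[Path]:
    """Delete every *.lock file under locks_dir that is not currently held."""
    removed = []
    for lock_file in sorted(locks_dir.rglob("*.lock")):
        if not is_lock_held(lock_file):
            lock_file.unlink()
            removed.append(lock_file)
    return removed
```

To use it on the cache from this issue, call remove_stale_locks(Path.home() / ".cache" / "huggingface" / "hub" / ".locks"). There is a small race window between the probe and the delete, so it is still safest to stop all running downloads first. On network filesystems flock may not behave reliably, in which case the probe result should not be trusted.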
Same issue here. I tried deleting the .locks directory, but unfortunately it didn't help. Instead, reducing --max-workers to something like 2 worked.
E.g.:
huggingface-cli download stabilityai/stable-diffusion-3.5-medium --max-workers 2
This is without using hf_transfer, and for a different model. In my case this did not hurt performance, but I imagine that depends heavily on your network speed.
EDIT: Spoke too soon. It didn't solve the problem, but it did at least reduce how often it happens.
@JakubCzarlinski sorry for the long delay. I suppose that setting --max-workers 1 would settle the issue, though it's not a satisfying solution.
To help investigate this, could you share more details about your issue/setup?
- Is there anything specific to know about your hard drive (a mounted disk, for instance)? Are you running in Docker with volumes? etc.
- Can you run the following script with debug logs enabled and copy-paste the full output?
from huggingface_hub import logging, snapshot_download
logging.set_verbosity_debug()
snapshot_download("stabilityai/stable-diffusion-3.5-medium", force_download=True, max_workers=10)
(max_workers=10 should trigger the error more easily)
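For the hard drive question, a quick way to see which filesystem the cache lives on is to look it up in /proc/mounts. Here is a minimal sketch, assuming Linux. The function names filesystem_for and describe_cache_filesystem are invented for this example, and the default path is the HF_HUB_CACHE value from the system info above.

```python
import os


def filesystem_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the deepest mount containing path.

    mounts_text is expected to be in /proc/mounts format:
    "<device> <mount_point> <fs_type> <options> <dump> <pass>" per line.
    """
    target = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Prefer the longest matching mount point; on ties the later line
        # wins, since later mounts shadow earlier ones.
        if (target + "/").startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def describe_cache_filesystem(cache_dir: str = "~/.cache/huggingface/hub") -> str:
    """Resolve symlinks in cache_dir and report its mount point and type."""
    resolved = os.path.realpath(os.path.expanduser(cache_dir))
    with open("/proc/mounts", encoding="utf-8") as handle:
        mount_point, fs_type = filesystem_for(resolved, handle.read())
    return f"{resolved} is on {mount_point} ({fs_type})"
```

Running print(describe_cache_filesystem()) and pasting the result would be enough. Types such as nfs, cifs, or fuse-based mounts would be the interesting ones here, since those are the kinds of filesystems where file locks are more likely to misbehave.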
Thanks in advance!