huggingface/huggingface_hub

upload_large_folder() issue with uploading to new spaces (create_repo requires space_sdk)

John6666cat opened this issue · 4 comments

Describe the bug

from HF Forum:
https://discuss.huggingface.co/t/upload-large-folder-issue-with-uploading-to-spaces/129326

If you call upload_large_folder() with a Space that does not exist yet, create_repo() raises an error. space_sdk is an optional parameter of create_repo(), but it is required when repo_type is "space", and upload_large_folder() calls create_repo() without it.

https://github.com/huggingface/huggingface_hub/blob/v0.26.3/src/huggingface_hub/_upload_large_folder.py#L92
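A minimal sketch of why the call fails, assuming the order of operations implied by the tracebacks in this thread: the space_sdk check in create_repo() (hf_api.py line 3479) raises before the HTTP request is sent (line 3531), so exist_ok=True cannot help, even for an existing Space. The function names and the ALLOWED_SPACE_SDKS tuple are hypothetical; this is a simplified model, not the library's code.

```python
# Hypothetical, simplified model of create_repo()'s validation order; NOT huggingface_hub code.
ALLOWED_SPACE_SDKS = ("gradio", "streamlit", "docker", "static")  # values taken from the error message


def check_space_sdk(repo_type, space_sdk=None):
    """Client-side check: a Space needs an explicit, known SDK."""
    if repo_type != "space":
        return  # space_sdk is irrelevant for models and datasets
    if space_sdk is None:
        raise ValueError(
            "No space_sdk provided. `create_repo` expects space_sdk to be one of "
            f"{list(ALLOWED_SPACE_SDKS)} when repo_type is 'space'"
        )
    if space_sdk not in ALLOWED_SPACE_SDKS:
        raise ValueError(f"Invalid space_sdk {space_sdk!r}; expected one of {list(ALLOWED_SPACE_SDKS)}")


def create_repo_sketch(repo_type, space_sdk=None, exist_ok=False, already_exists=False):
    """Simulated create_repo(): validate locally first, then 'contact the server'."""
    check_space_sdk(repo_type, space_sdk)  # raises before any request, even if the repo exists
    if already_exists:
        if not exist_ok:
            # Stand-in for the HfHubHTTPError (409 Conflict) the real server returns.
            raise RuntimeError("409 Client Error: Conflict")
        return "exists"
    return "created"


# upload_large_folder() (v0.26.x) calls create_repo() without space_sdk, so Spaces fail:
try:
    create_repo_sketch("space", exist_ok=True, already_exists=True)
except ValueError as err:
    print(err)
```

The sketch only models the ordering; the real create_repo() takes many more parameters.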

Reproduction

HF_TOKEN = "hf_*********"
from huggingface_hub import HfApi
api = HfApi(token=HF_TOKEN)
api.upload_large_folder("John6666/lftest", folder_path="test_folder", repo_type="space", private=True)

Logs

File "w:\TEMP\test\upload_large_folder_test.py", line 4, in <module>
    api.upload_large_folder("John6666/lftest", folder_path="test_folder", repo_type="space", private=True)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\hf_api.py", line 5473, in upload_large_folder
    return upload_large_folder_internal(
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\_upload_large_folder.py", line 92, in upload_large_folder_internal
    repo_url = api.create_repo(repo_id=repo_id, repo_type=repo_type, private=private, exist_ok=True)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\hf_api.py", line 3479, in create_repo
    raise ValueError(
ValueError: No space_sdk provided. `create_repo` expects space_sdk to be one of ['gradio', 'streamlit', 'docker', 'static'] when repo_type is 'space'`

System info

- huggingface_hub version: 0.26.2
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.9.13
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: w:\hf\misc\token
- Has saved token ?: False
- Configured git credential helpers: manager
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.4.0+cu124
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 11.0.0
- hf_transfer: N/A
- gradio: 4.44.1
- tensorboard: 2.6.2.2
- numpy: 1.23.5
- pydantic: 2.8.2
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: w:\hf\misc\hub
- HF_ASSETS_CACHE: w:\hf\misc\assets
- HF_TOKEN_PATH: w:\hf\misc\token
- HF_STORED_TOKENS_PATH: w:\hf\misc\stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Thanks for x-posting :) What happens if the repo already exists?

I wonder whether the HF forum post is being treated as an issue. 🙄
Well, that's fine, but no way... I got an error???

Edit:
After running all of the code below, the Space was created successfully, but it contains only the Gradio README.md; the folder was not uploaded.

api.create_repo(repo_id="John6666/lftest", repo_type="space", private=True, space_sdk="gradio")
api.upload_large_folder("John6666/lftest", folder_path="test_folder", repo_type="space", private=True)
Traceback (most recent call last):
  File "w:\TEMP\test\upload_large_folder_test.py", line 4, in <module>
    api.create_repo(repo_id="John6666/lftest", repo_type="space", private=True, space_sdk="gradio")
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\hf_api.py", line 3531, in create_repo
    hf_raise_for_status(r)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_http.py", line 477, in hf_raise_for_status
    raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 409 Client Error: Conflict for url: https://huggingface.co/api/repos/create (Request ID: Root=1-6751a69c-1508c4e73af0bc547915347a;db8a6076-1364-4e28-b01f-c0ac387d2617)

Running it again with the create_repo() call commented out:

#api.create_repo(repo_id="John6666/lftest", repo_type="space", private=True, space_sdk="gradio")
api.upload_large_folder("John6666/lftest", folder_path="test_folder", repo_type="space", private=True)
  File "w:\TEMP\test\upload_large_folder_test.py", line 5, in <module>
    api.upload_large_folder("John6666/lftest", folder_path="test_folder", repo_type="space", private=True)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\hf_api.py", line 5473, in upload_large_folder
    return upload_large_folder_internal(
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\_upload_large_folder.py", line 92, in upload_large_folder_internal
    repo_url = api.create_repo(repo_id=repo_id, repo_type=repo_type, private=private, exist_ok=True)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "c:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\hf_api.py", line 3479, in create_repo
    raise ValueError(
ValueError: No space_sdk provided. `create_repo` expects space_sdk to be one of ['gradio', 'streamlit', 'docker', 'static'] when repo_type is 'space'`
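Until this is resolved, one possible workaround (a sketch, not an officially recommended path) is to create the Space yourself with an explicit space_sdk and exist_ok=True, which avoids both the ValueError and the 409 Conflict shown above, and then push with HfApi.upload_folder(), which does not go through the create_repo() call that fails inside upload_large_folder(). The helper name upload_folder_to_space is hypothetical; the api argument is expected to be an HfApi instance.

```python
def upload_folder_to_space(api, repo_id, folder_path, space_sdk="gradio", private=True):
    """Hypothetical helper: create a Space with an explicit SDK, then push a folder to it.

    `api` is expected to be a huggingface_hub.HfApi instance.
    """
    # Passing space_sdk avoids "No space_sdk provided"; exist_ok=True avoids the
    # 409 Conflict raised when the Space already exists.
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        private=private,
        space_sdk=space_sdk,
        exist_ok=True,
    )
    # upload_folder() pushes the folder contents in a single commit and does not call create_repo().
    return api.upload_folder(repo_id=repo_id, folder_path=folder_path, repo_type="space")
```

For example: upload_folder_to_space(HfApi(token=HF_TOKEN), "John6666/lftest", "test_folder"). Because everything goes into one commit, this suits the modest, code-sized content Spaces are meant to hold rather than very large folders.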

Thanks for testing. Actually, @hanouticelina and I were discussing removing support for pushing to a Space repo with upload_large_folder. That "solves" the issue, and since Spaces are not meant to store huge amounts of data (they are just code repos), it's best not to support it.
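The change discussed here could take the form of an early guard in upload_large_folder() that rejects Space repos before any network call, so users get a clear message instead of the confusing space_sdk error. A hypothetical sketch; the function name and message are illustrative, not the library's eventual implementation:

```python
def check_large_folder_repo_type(repo_type):
    """Hypothetical guard: refuse Space repos before any network call is made."""
    if repo_type == "space":
        raise ValueError(
            "upload_large_folder() does not support Space repos: Spaces are code repos and "
            "are not meant to store large amounts of data. Use upload_folder() instead."
        )
    return repo_type
```

Failing fast on repo_type keeps the decision in one place and avoids calling create_repo() at all for an unsupported target.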

Thank you! Roger that. Currently there's a bug that causes Space builds to get stuck when a Space contains 5 GB files, so that's reasonable. 😅
Also, uploading to a model repo feels faster to me anyway.