URGENT: Unable to Train using Modal
Closed · 1 comment
vprateek1729 commented
I have logged into Hugging Face (as you can see below) and changed the line `modal.Secret.from_name("huggingface")`
to `modal.Secret.from_name("modal-huggingface-testing")`
in `src/common.py`. I also have access to Llama-3 on Hugging Face.
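For context, this is my rough understanding of how the secret reaches the training code. A minimal sketch, not the repo's exact code; the `HF_TOKEN` key name inside the secret is my assumption:

```python
# Rough sketch: a Modal secret's key/value pairs are exposed as environment
# variables inside the container that runs the decorated function.
import os
import modal

app = modal.App("example-axolotl")

@app.function(secrets=[modal.Secret.from_name("modal-huggingface-testing")])
def train():
    # huggingface_hub reads HF_TOKEN from the environment; if the secret on
    # modal.com doesn't define it, hub calls fail with 401 no matter what
    # `huggingface-cli login` did on my laptop.
    assert "HF_TOKEN" in os.environ, "token missing inside the container"
```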
This is the error I get when I try to train the model using the code base in this repo. It reports an invalid username/password even though I have clearly already logged into Hugging Face on the CLI (see the `whoami` output below).
I'm fine-tuning LLMs for a project with an urgent deadline. Kindly reply/resolve this ASAP.
```
[huggingface-cli login banner]
A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
Setting a new token will erase the existing one.
To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (osxkeychain).
Your token has been saved to /Users/vprateek/.cache/huggingface/token
Login successful
(py3.11) vprateek@Prateeks-MacBook-Pro llm-finetuning-main % huggingface-cli whoami
vprateek
(py3.11) vprateek@Prateeks-MacBook-Pro llm-finetuning-main % modal run --detach src.train --config=config/llama-3.yml --data=data/sqlqa.subsample.jsonl
Note that running a local entrypoint in detached mode only keeps the last triggered Modal function alive after the parent process has been killed or disconnected.
✓ Initialized. View run at https://modal.com/prateek/main/apps/ap-9PGC1Y0MgjIoY2x1FKrQ61
✓ Created objects.
├── 🔨 Created mount PythonPackage:src.train
├── 🔨 Created mount PythonPackage:src.inference
├── 🔨 Created mount PythonPackage:src
├── 🔨 Created function Inference.*.
├── 🔨 Created function train.
├── 🔨 Created function preproc_data.
├── 🔨 Created function merge.
├── 🔨 Created function launch.
├── 🔨 Created function Inference.completion.
├── 🔨 Created function Inference.non_streaming.
└── 🔨 Created web function Inference.web => https://pateek--example-axolotl-inference-web-dev.modal.run
Volume contains NousResearch/Meta-Llama-3-8B.
Preparing training run in /runs/axo-2024-07-15-21-54-24-20e0.
Spawning container for data preprocessing.
Preprocessing data.
WARNING: BNB_CUDA_VERSION=121 environment variable detected; loading libbitsandbytes_cuda121.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
[2024-07-15 21:54:39,294] [INFO] [datasets.<module>:58] [PID:4] PyTorch version 2.3.0+cu121 available.
[2024-07-15 21:54:41,257] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[2024-07-15 21:54:41,454] [INFO] [root.spawn:38] [PID:4] gcc -pthread -B /root/miniconda3/envs/py3.11/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/miniconda3/envs/py3.11/include -fPIC -O2 -isystem /root/miniconda3/envs/py3.11/include -g0 -fPIC -g0 -c /tmp/tmpj8_f3kta/test.c -o /tmp/tmpj8_f3kta/test.o
[2024-07-15 21:54:41,514] [INFO] [root.spawn:38] [PID:4] gcc -pthread -B /root/miniconda3/envs/py3.11/compiler_compat /tmp/tmpj8_f3kta/test.o -laio -o /tmp/tmpj8_f3kta/a.out
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[axolotl ASCII art banner]
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.30.1
peft: 0.11.1
transformers: 4.42.3
trl: 0.8.7.dev0
torch: 2.3.0+cu121
bitsandbytes: 0.43.1
****************************************
[2024-07-15 21:54:44,657] [DEBUG] [axolotl.normalize_config:80] [PID:4] [RANK:0] bf16 support detected, enabling for this configuration.
[2024-07-15 21:54:44,888] [INFO] [axolotl.normalize_config:183] [PID:4] [RANK:0] GPU memory usage baseline: 0.000GB (+0.307GB misc)
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/whoami-v2
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 1397, in whoami
hf_raise_for_status(r)
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/whoami-v2 (Request ID: Root=1-66959aa4-25141c8c4d4de1d64754f466;dc7bd8b2-5b73-4e49-ad22-6ccfa0fb67b7)
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/axolotl/src/axolotl/cli/preprocess.py", line 91, in <module>
fire.Fire(do_cli)
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/cli/preprocess.py", line 39, in do_cli
check_user_token()
File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 484, in check_user_token
user_info = api.whoami()
^^^^^^^^^^^^
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 1399, in whoami
raise HTTPError(
requests.exceptions.HTTPError: Invalid user token. If you didn't pass a user token, make sure you are properly logged in by executing `huggingface-cli login`, and if you did pass a user token, double-check it's correct.
Traceback (most recent call last):
File "/pkg/modal/_container_io_manager.py", line 503, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 383, in run_input_sync
res = finalized_function.callable(*local_input.args, **local_input.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/src/train.py", line 55, in preproc_data
run_cmd(
File "/root/src/train.py", line 186, in run_cmd
exit(exit_code)
File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: 1
Traceback (most recent call last):
File "/pkg/modal/_container_io_manager.py", line 503, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 383, in run_input_sync
res = finalized_function.callable(*local_input.args, **local_input.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/src/train.py", line 131, in launch
preproc_handle.get()
File "/pkg/synchronicity/synchronizer.py", line 531, in proxy_method
return wrapped_method(instance, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/pkg/synchronicity/combined_types.py", line 28, in __call__
raise uc_exc.exc from None
File "<ta-01J2W67HKMPYXZ2NCR3X9E61N4>:/root/src/train.py", line 55, in preproc_data
File "<ta-01J2W67HKMPYXZ2NCR3X9E61N4>:/root/src/train.py", line 186, in run_cmd
File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: 1
```
vprateek1729 commented
I was missing the HF key on Modal. Perhaps consider reordering the instructions in the documentation to first retrieve the Hugging Face key and then create the secret on Modal.
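In case it helps anyone else: the token has to live in the Modal secret itself, since a local `huggingface-cli login` never reaches the remote container. A minimal sketch of one way to verify the secret works, assuming it stores the token under an `HF_TOKEN` key (the app name and key name here are illustrative, not the repo's exact setup):

```python
# Minimal check, run after creating the secret: can the container see a
# token that Hugging Face accepts? This is the same whoami call that
# failed in the traceback above.
import modal

app = modal.App("hf-secret-check")

@app.function(
    image=modal.Image.debian_slim().pip_install("huggingface_hub"),
    secrets=[modal.Secret.from_name("modal-huggingface-testing")],
)
def check_token():
    from huggingface_hub import HfApi
    # HfApi picks up HF_TOKEN from the environment automatically.
    print(HfApi().whoami()["name"])

@app.local_entrypoint()
def main():
    check_token.remote()
```

Running this with `modal run` should print your Hugging Face username if the secret is set up correctly, and reproduce the 401 above if it isn't.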