Exception when "--checkpointing_steps" is set
Hypothesis-Z opened this issue · 2 comments
Hypothesis-Z commented
The source code of the Accelerate library shows that the weights list passed to the hooks is empty if the training task is launched via DeepSpeed. Therefore, an IndexError is raised in save_model_hook.
e5-mistral-7b-instruct/peft_lora_embedding_semantic_search.py
Lines 158 to 162 in 9902191
Another error is that if "--checkpointing_steps" is set to "epoch", accelerator.save_state() times out, but it works if an integer is set.
liujiqiang999 commented
Hi, have you solved this problem?
Hypothesis-Z commented
@liujiqiang999 Do not register the hooks.
# accelerator.register_save_state_pre_hook(save_model_hook)
# accelerator.register_load_state_pre_hook(load_model_hook)