hiyouga/LLaMA-Factory

Question about DeepSpeed checkpoint saving during full fine-tuning

SandroChen opened this issue · 3 comments

Reminder

  • I have read the README and searched the existing issues.

Reproduction

├── added_tokens.json
├── config.json
├── generation_config.json
├── global_step200
│ ├── zero_pp_rank_0_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_1_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_2_mp_rank_00_model_states.pt
│ ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│ ├── zero_pp_rank_3_mp_rank_00_model_states.pt
│ └── zero_pp_rank_3_mp_rank_00_optim_states.pt
├── latest
├── merges.txt
├── model.safetensors
├── rng_state_0.pth
├── rng_state_1.pth
├── rng_state_2.pth
├── rng_state_3.pth
├── scheduler.pt
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── trainer_state.json
├── training_args.bin
├── vocab.json
└── zero_to_fp32.py

As shown above, the global_step folder inside the checkpoint directory contains files that take up a lot of disk space. Are these the model gradients and optimizer states saved by DeepSpeed? Is there a parameter in the config file that can disable this saving behavior? My disk space is very limited, and I don't need to resume training, so I shouldn't need these files.

Expected behavior

No response

System Info

No response

Others

No response

save_only_model: true
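
For context, save_only_model is a standard transformers TrainingArguments option (available in recent transformers releases): when enabled, each checkpoint contains only the model weights, and the optimizer, scheduler, and RNG states (the large global_step* DeepSpeed folders shown above) are skipped. The trade-off is that you cannot resume training from such checkpoints. A minimal sketch of where the key would go in a LLaMA-Factory YAML training config (file name, model, and surrounding values are illustrative, not taken from this issue):

# sft_full.yaml — illustrative LLaMA-Factory training config
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json  # assumed path to your DeepSpeed config

output_dir: saves/full-sft
save_steps: 200
save_only_model: true   # skip optimizer/scheduler/RNG states in each checkpoint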

save_only_model: true

Should this be added in the training bash script?
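
If training is launched from a bash script rather than a YAML config, the same option can be passed as a command-line flag, since it is parsed as a regular TrainingArguments field. A hedged sketch assuming the DeepSpeed launcher and LLaMA-Factory's training entrypoint (the script name, paths, and argument values are assumptions; adapt them to your setup):

#!/bin/bash
# Illustrative launch script; entrypoint name and paths are assumptions.
# Model, dataset, template, and other required arguments are omitted for brevity.
deepspeed --num_gpus 4 src/train_bash.py \
    --stage sft \
    --do_train \
    --finetuning_type full \
    --deepspeed ds_config.json \
    --output_dir saves/full-sft \
    --save_steps 200 \
    --save_only_model true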