How to estimate total steps and set proper
zhaoxu98 opened this issue · 0 comments
Reminder
- I have read the README and searched the existing issues.
Reproduction
Hi, thank you for your excellent work.
I am somewhat confused about setting the appropriate hyperparameters for LoRA or Full SFT. For instance, how can I estimate the total number of training steps based on the size of the dataset `N`?
Additionally, how should I determine the appropriate values for `gradient_accumulation_steps`, `warmup_steps`, `save_steps`, and `eval_steps` in relation to the total steps? Is there any documentation that outlines strategies for setting these hyperparameters effectively? While the tutorial on Zhihu provides a good quickstart, I am looking for more detailed documentation. Could you point me in the right direction?
--preprocessing_num_workers 32 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--gradient_accumulation_steps 2 \
--lr_scheduler_type cosine \
--logging_steps 20 \
--warmup_steps 40 \
--save_steps 20 \
--eval_steps 20 \
--num_train_epochs 5.0 \
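To make the question concrete, here is a rough sketch of how the total number of optimizer steps could be estimated from the flags above. The dataset size (`num_examples = 100_000`) and the device count (`num_devices = 1`) are assumptions for illustration and are not part of my actual setup, and the helper name `estimate_total_steps` is hypothetical. The rounding follows what I understand to be the usual trainer behavior (the last partial batch is kept, leftover micro-batches are dropped per epoch), which may differ between library versions.

```python
import math


def estimate_total_steps(num_examples, per_device_batch_size,
                         grad_accum_steps, num_devices, num_epochs):
    """Approximate the number of optimizer updates for a training run.

    Hypothetical helper: exact rounding may differ between trainer versions.
    """
    # Number of examples consumed per optimizer update across all devices.
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps

    # Micro-batches each device sees per epoch (last partial batch kept).
    batches_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * num_devices)
    )

    # One optimizer update every `grad_accum_steps` micro-batches.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)

    total_steps = math.ceil(num_epochs * updates_per_epoch)
    return effective_batch, updates_per_epoch, total_steps


# Values taken from the flags above; N and the device count are assumed.
effective_batch, updates_per_epoch, total_steps = estimate_total_steps(
    num_examples=100_000,      # assumed dataset size N
    per_device_batch_size=32,  # --per_device_train_batch_size
    grad_accum_steps=2,        # --gradient_accumulation_steps
    num_devices=1,             # assumed single GPU
    num_epochs=5.0,            # --num_train_epochs
)
print(effective_batch, updates_per_epoch, total_steps)
```

Under these assumed numbers the run would take 7810 optimizer steps with an effective batch size of 64, so `--warmup_steps 40` would cover roughly 0.5% of training and `--save_steps 20` would write about 390 checkpoints. What I am unsure about is whether ratios like these are reasonable.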