hiyouga/LLaMA-Factory

How to estimate total steps and set proper hyperparameters

zhaoxu98 opened this issue · 0 comments

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Hi, thank you for your excellent work.

I am somewhat confused about setting the appropriate hyperparameters for LoRA or Full SFT. For instance, how can I estimate the total number of training steps based on the size of the dataset N?

Additionally, how should I choose appropriate values for gradient_accumulation_steps, warmup_steps, save_steps, and eval_steps relative to the total number of steps? Is there any documentation that outlines strategies for setting these hyperparameters effectively? The tutorial on Zhihu provides a good quickstart, but I am looking for more detailed documentation. Could you point me in the right direction?

    --preprocessing_num_workers 32 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 20 \
    --warmup_steps 40 \
    --save_steps 20 \
    --eval_steps 20 \
    --num_train_epochs 5.0 \
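
For context, the HF Trainer (which LLaMA-Factory builds on) computes the number of optimizer steps roughly as ceil(N / (per_device_train_batch_size × gradient_accumulation_steps × num_gpus)) × num_train_epochs. Here is a minimal sketch of that estimate using the flags above; dataset_size and num_gpus are assumed placeholder values, not taken from the issue:

```python
import math

# Assumed placeholders -- substitute your own dataset size and GPU count.
dataset_size = 10_000              # N, number of training examples (assumed)
num_gpus = 1                       # world size (assumed)

# Values from the config above.
per_device_train_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 5.0

# One optimizer step consumes this many examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

steps_per_epoch = math.ceil(dataset_size / effective_batch)
total_steps = math.ceil(steps_per_epoch * num_train_epochs)
print(total_steps)  # 785 for these assumed values

# A common rule of thumb (an assumption, not project guidance): warm up
# for a few percent of the total steps, e.g. ~3%.
warmup_steps = int(0.03 * total_steps)
print(warmup_steps)  # 23 for these assumed values
```

With an estimate of total_steps in hand, save_steps and eval_steps can then be chosen so that checkpoints and evaluations occur a manageable number of times over the whole run, rather than as fixed magic numbers.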

Expected behavior

No response

System Info

No response

Others

No response