Could you provide training code for StructLM? I would like to try training it and generalizing to custom datasets. Thank you.
Hi, the way I recommend training is to use https://github.com/hiyouga/LLaMA-Factory/ for SFT; an example of the settings you can use is below. The training data is at https://huggingface.co/datasets/TIGER-Lab/SKGInstruct. Our 13B and 34B models were trained on the SKGInstruct dataset using LLaMA-Factory. For the 7B model, the exact training code I used to produce it is harder to clean up due to a lot of experimentation, so I believe it is easier for you to directly use the LLaMA-Factory repo for the 7B as well; after all, our model should reproduce based on the data alone.
For reproducing our model, I believe the important hyperparameters to pay attention to are the cutoff length, the LR scheduler, and the effective batch size (512); the sketch after the flag list shows one way the effective batch size can be decomposed.
--stage sft \
--model_name_or_path $MODEL_PATH \
--do_train \
--flash_attn \
--template llama2 \
--use_fast_tokenizer False \
--dataset ${DATASET_NAME} \
--finetuning_type full \
--output_dir ${OUTPUT_DIR} \
--overwrite_cache \
--overwrite_output_dir \
--per_device_train_batch_size $BATCH_SIZE_PER_GPU \
--gradient_accumulation_steps $GRADIENT_ACC_STEPS \
--lr_scheduler_type cosine \
--logging_steps 1 \
--evaluation_strategy no \
--save_strategy epoch \
--learning_rate 2e-5 \
--weight_decay 0.0 \
--num_train_epochs 3.0 \
--cutoff_len 4096 \
--warmup_ratio 0.05 \
--preprocessing_num_workers 64 \
--plot_loss \
--bf16
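The name passed to --dataset has to be registered in LLaMA-Factory's data/dataset_info.json before training. A minimal sketch of such an entry is below; the entry name "skginstruct" is just an illustrative choice, and you may need to add a "columns" mapping if the SKGInstruct fields do not follow LLaMA-Factory's default instruction/input/output (alpaca-style) names, so check the dataset card and LLaMA-Factory's data README.

"skginstruct": {
  "hf_hub_url": "TIGER-Lab/SKGInstruct"
}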
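For completeness, here is a minimal launch sketch showing how these flags are typically assembled into a full command and how the effective batch size of 512 decomposes. The GPU count, per-device batch size, and accumulation steps are illustrative values whose product is 512; the base model path is an assumption, and the torchrun launcher with the src/train_bash.py entry point reflects older LLaMA-Factory releases (newer ones expose llamafactory-cli train instead). This is not the authors' exact script.

# Sketch only: entry point, base model, and batch values are assumptions, not the authors' script.
MODEL_PATH=codellama/CodeLlama-7b-Instruct-hf   # assumed base model; substitute your own
DATASET_NAME=skginstruct                        # the name registered in data/dataset_info.json
OUTPUT_DIR=./structlm-repro
NUM_GPUS=8                                      # illustrative
BATCH_SIZE_PER_GPU=8                            # illustrative
GRADIENT_ACC_STEPS=8                            # illustrative
# Effective batch size = NUM_GPUS * BATCH_SIZE_PER_GPU * GRADIENT_ACC_STEPS
#                      = 8 * 8 * 8 = 512, matching the value quoted above.
torchrun --nproc_per_node $NUM_GPUS src/train_bash.py \
  --stage sft \
  --model_name_or_path $MODEL_PATH \
  --dataset $DATASET_NAME \
  --output_dir $OUTPUT_DIR \
  --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
  --gradient_accumulation_steps $GRADIENT_ACC_STEPS \
  --bf16   # plus the remaining flags from the list above (template, cutoff_len, learning rate, etc.)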
Thanks for your patience. After reading the USKG source code, I found that your work seems to be implemented on top of the native USKG framework and the corresponding Hugging Face Trainer. I also saw a large number of experimental test scripts in the script list, at least thousands of experiments, haha. Thank you, HKUNLP, for your contribution to the community.
As for LLaMA-Factory, at the beginning it did not support vLLM or FlashAttention. I reminded them of this in early March (at the time, the framework did not support vLLM), and I held off on using LLaMA-Factory for training until they added that feature. Their framework is indeed quite good: different trainers, SFT fine-tuning methods, and other key techniques have been integrated into it.
However, I would like to know whether you have tried training StructLM with LLaMA-Factory using the parameters you provided. Is there any difference in score or performance compared to training StructLM with the native trainer? Thank you for your patient answer; I look forward to your reply.
Hi, indeed, we credit USKG for gathering many of the task evaluations that we use. To be clear, the 13B and 34B models were trained with LLaMA-Factory. The 7B model was not, but in theory there is no difference that would make it perform differently.
You can send me an email at a5zhuang@uwaterloo.ca to discuss directly if you have more questions; I want to help you reproduce this work.
Thank you for your careful and patient answer. I will try to reproduce it using LLaMA-Factory and give feedback here. Thank you again for your help.