Details of LoRA fine-tuning for pruned models
Great work and thanks for the codebase!
I would like to know the exact details of the LoRA fine-tuning mentioned in Table 6 of the main paper.
Also, if you could point me to the bash script to reproduce it, that would be great!
Thank you for your interest. Below is the script for LoRA fine-tuning:
CUDA_VISIBLE_DEVICES=3 python finetune_lm.py \
    --model_name_or_path /path/to/workspace/wanda/saved_model/llama1_7b_2-4 \
    --config_name "/path/to/llama-7b-hf" \
    --dataset_name c4 \
    --num_train_epochs 1 \
    --block_size 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --max_train_samples 30000 \
    --max_eval_samples 128 \
    --learning_rate 1e-4 \
    --overwrite_output_dir \
    --output_dir ./saved_model/llama1_7b_2-4_lora
You should first save the pruned model to /path/to/workspace/wanda/saved_model/llama1_7b_2-4 and then fine-tune it with LoRA.
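To make the "save the pruned model, then fine-tune with LoRA" step concrete, here is a minimal pure-Python sketch of the standard LoRA forward pass on top of a frozen pruned weight. It is not taken from finetune_lm.py: the matrix sizes, the rank r, the scaling alpha, and the helper names are all illustrative assumptions. It only shows that the pruned weight W is left untouched and that, with the usual zero initialisation of B, the model starts out identical to the pruned one.

```python
# Minimal sketch of a LoRA layer on a frozen, pruned weight (illustrative only;
# names, sizes, r and alpha are assumptions, not values from finetune_lm.py).

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x). W is frozen; only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy 2 x 4 pruned weight with a 2:4 pattern (two zeros in each group of four).
W = [[0.5, 0.0, 0.0, -1.0],
     [0.0, 2.0, 0.25, 0.0]]

r, alpha = 1, 2.0
A = [[0.1, -0.2, 0.3, 0.4]]  # r x in_features, randomly initialised in practice
B = [[0.0], [0.0]]           # out_features x r, zero-initialised

x = [1.0, 2.0, 3.0, 4.0]
y_pruned = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

# With B = 0 the LoRA branch contributes nothing, so both outputs coincide.
print(y_pruned, y_lora)
```

During training only A and B receive gradient updates, so the zeros in W are preserved for as long as the adapter is kept separate from the base weight.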
Thanks a lot! Any idea how many GPU hours this run needs?
@Arnav0400 Hi, sorry for the late reply. It takes roughly one GPU day for LLaMA1-7B.
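As a rough sanity check, the flags in the script can be combined with the "one GPU day" figure to estimate the training budget. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes that each training sample is one block of block_size tokens, that a single GPU is used with no gradient accumulation, and that "one GPU day" means 24 hours spent almost entirely on training.

```python
# Back-of-the-envelope budget from the script's flags (assumptions: one sample
# equals one block of block_size tokens, single GPU, no gradient accumulation,
# and "one GPU day" taken as 24 hours of wall-clock training time).

max_train_samples = 30000
block_size = 1024
per_device_train_batch_size = 1
num_train_epochs = 1
gpu_seconds = 24 * 60 * 60

tokens = max_train_samples * block_size * num_train_epochs
steps = max_train_samples * num_train_epochs // per_device_train_batch_size
tokens_per_second = tokens / gpu_seconds
seconds_per_step = gpu_seconds / steps

print(f"training tokens:   {tokens:,}")
print(f"optimizer steps:   {steps:,}")
print(f"tokens per second: {tokens_per_second:.0f}")
print(f"seconds per step:  {seconds_per_step:.2f}")
```

Under these assumptions the run covers about 30.7 million tokens in 30,000 steps, at just under 3 seconds per step.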
Thanks for your reply! Did you perform a zero-shot evaluation on the pruned and LoRA fine-tuned models? This matters in practice, as models typically undergo fine-tuning before deployment.
@Arnav0400 Not yet, but this evaluation is easy to run. All you need to do is save the checkpoint of the LoRA fine-tuned pruned LLM.
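One point to keep in mind when saving such a checkpoint is whether the LoRA update is merged into the pruned weights. The thread does not say which option was used, so the following is only an illustrative pure-Python sketch with made-up values and hypothetical helper names. It shows that folding a low-rank update B A into a 2:4 sparse weight generally yields a dense matrix, whereas keeping the adapter separate leaves the pruned weight sparse.

```python
# Illustrative sketch (made-up values, hypothetical helpers): merging a LoRA
# update into a 2:4 sparse weight generally destroys the sparsity pattern.

def matmul(B, A):
    """Multiply B (m x r) by A (r x n), both given as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge(W, A, B, scale):
    """Return W + scale * (B A), i.e. the dense weight of a merged checkpoint."""
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

def is_2_4_sparse(W):
    """True if every group of four consecutive entries in a row has at least two zeros."""
    return all(row[i:i + 4].count(0.0) >= 2
               for row in W for i in range(0, len(row), 4))

W = [[0.5, 0.0, 0.0, -1.0],
     [0.0, 2.0, 0.25, 0.0]]       # pruned weight, 2:4 pattern
A = [[0.1, -0.2, 0.3, 0.4]]       # r x in_features, after training
B = [[1.0], [0.5]]                # out_features x r, after training
scale = 2.0                       # alpha / r

merged = merge(W, A, B, scale)

print("pruned weight is 2:4 sparse:", is_2_4_sparse(W))
print("merged weight is 2:4 sparse:", is_2_4_sparse(merged))
```

Both forms compute the same outputs, so zero-shot accuracy should not depend on the choice, but only the unmerged form still has a sparse base weight.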
Using LoRA to fine-tune pruned models is very common, especially for structured pruning methods. For more references, see the entries tagged "structured" in my repo at https://github.com/pprp/Awesome-LLM-Prune.
Also, the fine-tuning code in our repo is fairly limited, as it provides only a few PEFT methods. I recommend using the LLM-Adapters repo, which provides more datasets and more PEFT adapters.
Please let me know if you have the checkpoint available somewhere; it would save me some time. I am interested in seeing the role of PEFT in sparse models.