Details of LoRA of pruned models.

Question

Opened this issue 4 months ago · 6 comments

Great work and thanks for the codebase!

I would like to know the exact details of the LoRA fine-tuning mentioned in Table 6 of the main paper.
It would also be great if you could point me to the bash script to reproduce it.

Answer 1 · 2024-06-11T14:58:59.000Z

Thank you for your interest. Below is the script for LoRA fine-tuning:

CUDA_VISIBLE_DEVICES=3 python finetune_lm.py \
    --model_name_or_path /path/to/workspace/wanda/saved_model/llama1_7b_2-4 \
    --config_name "/path/to/llama-7b-hf" \
    --dataset_name c4 \
    --num_train_epochs 1 \
    --block_size 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --max_train_samples 30000 \
    --max_eval_samples 128 \
    --learning_rate 1e-4 \
    --overwrite_output_dir \
    --output_dir ./saved_model/llama1_7b_2-4_lora

You should first save the pruned model to /path/to/workspace/wanda/saved_model/llama1_7b_2-4 and then fine-tune it with LoRA.
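Before fine-tuning, it can help to sanity-check that the saved weights really are pruned. The sketch below is only an illustration and assumes that the "2-4" in the directory name refers to 2:4 semi-structured sparsity (at most 2 non-zero weights in every group of 4 consecutive weights); the thread does not state this. The helper names `satisfies_n_m` and `sparsity` are hypothetical and not part of the repo, and the toy lists stand in for rows of a real weight matrix.

```python
from typing import Sequence


def satisfies_n_m(row: Sequence[float], n: int = 2, m: int = 4) -> bool:
    """Return True if every group of m consecutive weights has at most n non-zeros."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    for start in range(0, len(row), m):
        nonzeros = sum(1 for w in row[start:start + m] if w != 0.0)
        if nonzeros > n:
            return False
    return True


def sparsity(rows: Sequence[Sequence[float]]) -> float:
    """Fraction of weights that are exactly zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for w in r if w == 0.0)
    return zeros / total


# Toy 2x8 "weight matrix" standing in for one pruned linear layer (assumed 2:4).
pruned = [
    [0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 2.1, 0.4, 1.0, 0.0, 0.0, -0.6],
]
# A row that violates the pattern: its first group of 4 has 3 non-zeros.
dense = [[0.5, 0.1, 0.0, -1.2, 0.2, 0.3, 0.7, 0.0]]

print(all(satisfies_n_m(r) for r in pruned))  # pattern holds for every row
print(sparsity(pruned))                       # fraction of zero weights
print(satisfies_n_m(dense[0]))                # pattern violated
```

On a real checkpoint the same check would be applied row by row to each pruned linear layer's weight tensor.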

Answer 2 · 2024-06-12T10:40:25.000Z

Thanks a lot! Any idea how many GPU hours this run needs?

Answer 3 · 2024-06-14T01:25:30.000Z

@Arnav0400 Hi, sorry for the late reply. It takes roughly one GPU day for LLaMA1-7B.
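For a rough sense of scale, the one-GPU-day figure can be combined with the flags in the posted command. This is back-of-the-envelope arithmetic, not a measurement. It assumes a single GPU (the command pins one device), no gradient accumulation, that each of the 30000 training samples is one block of 1024 tokens, and that "one GPU day" means 24 hours of wall-clock time spent on training alone; none of these is confirmed in the thread.

```python
# Values taken from the command posted in this thread.
max_train_samples = 30_000
block_size = 1024
per_device_train_batch_size = 1
num_train_epochs = 1

# Assumptions (not stated in the thread): 1 GPU, no gradient accumulation,
# one sample == one block_size-token block, one GPU day == 24 h of training.
num_gpus = 1
gradient_accumulation_steps = 1
gpu_day_seconds = 24 * 60 * 60

tokens_seen = max_train_samples * block_size * num_train_epochs
optimizer_steps = (max_train_samples * num_train_epochs) // (
    per_device_train_batch_size * num_gpus * gradient_accumulation_steps
)
seconds_per_step = gpu_day_seconds / optimizer_steps
tokens_per_second = tokens_seen / gpu_day_seconds

print(f"tokens seen:       {tokens_seen:,}")
print(f"optimizer steps:   {optimizer_steps:,}")
print(f"seconds per step:  {seconds_per_step:.2f}")
print(f"tokens per second: {tokens_per_second:.0f}")
```

Under these assumptions the run covers about 30.7M tokens in 30000 steps, or a little under 3 seconds per step. Evaluation time and data loading are ignored.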

Answer 4 · 2024-06-18T10:48:52.000Z

Thanks for your reply! Did you run a zero-shot evaluation of the pruned models after LoRA fine-tuning? This matters in practice, since models undergo fine-tuning before deployment.

Answer 5 · 2024-06-19T06:02:05.000Z

@Arnav0400 Not yet, but this evaluation is easy to run. You just need to save the checkpoint of the LoRA fine-tuned pruned LLM.
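One detail worth checking when saving such a checkpoint is whether the LoRA update is kept as a separate adapter or merged into the pruned weights. The thread does not say which form the repo uses. The toy sketch below, in plain Python with made-up numbers, only illustrates the general point that adding a low-rank update B·A to a sparse weight matrix typically fills in the zeros, unless the pruning mask is re-applied afterwards. The helpers `matmul` and `density` are hypothetical illustration code, not part of any library.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x r) and b (r x k)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def density(w):
    """Fraction of non-zero entries in a matrix given as a list of rows."""
    flat = [x for row in w for x in row]
    return sum(1 for x in flat if x != 0.0) / len(flat)


# Toy pruned weight (2 non-zeros per group of 4) and a rank-1 LoRA update.
W = [[0.5, 0.0, 0.0, -1.0],
     [0.0, 2.0, 1.0, 0.0]]
B = [[0.1], [0.2]]            # shape 2 x 1
A = [[1.0, 1.0, 1.0, 1.0]]    # shape 1 x 4
delta = matmul(B, A)          # dense 2 x 4 update

# Pruning mask: 1 where the pruned weight is non-zero, 0 elsewhere.
mask = [[1.0 if x != 0.0 else 0.0 for x in row] for row in W]

# Naive merge: W + B @ A, which fills in the pruned positions.
merged = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
# Merge followed by re-applying the mask, which restores the sparsity pattern.
merged_masked = [[x * m for x, m in zip(xr, mr)] for xr, mr in zip(merged, mask)]

print(density(W), density(merged), density(merged_masked))
```

Re-applying the mask changes the weights relative to the unmerged adapter, so the model should be evaluated in the same form in which it will be deployed.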

Using LoRA to fine-tune a pruned model is very common, especially for structured pruning methods. For more references, see the entries tagged "structured" in my repo: https://github.com/pprp/Awesome-LLM-Prune.

Also, the fine-tuning code in our repo is fairly limited, as it provides only a few PEFT methods. I recommend using the LLM-Adapters repo, which provides more datasets and more PEFT adapters.

Answer 6 · 2024-06-19T06:06:03.000Z

Please let me know if you have the checkpoint available somewhere; it would save me some time. I am interested in seeing the role of PEFT in sparse models.