GraphPKU/PiSSA

High Training Loss


Hi,

Very cool paper!

When I fine-tune a Mistral model with LoRA in 4 bits, the loss starts around 1. However, with the exact same script and the only addition being "init_lora_weights='pissa'", the loss starts around 5. I've also tried 'pissa_niter_4', but the loss is still around 5. Do you know what could be the cause?
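
For reference, the setup looks roughly like this (the model name and hyperparameters below are placeholders rather than my exact script; the PiSSA run differs only by the init flag):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Placeholder setup: load the base model directly in 4 bits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    init_lora_weights="pissa",  # without this line (plain LoRA) the loss starts ~1; with it, ~5
)
model = get_peft_model(model, lora_config)
```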

Thank you,
Andy

Thank you for your interest in PiSSA.
In the latest version of our paper (https://arxiv.org/pdf/2404.02948.pdf), we report 4-bit fine-tuning experiments in Section 4.2, and 4-bit training is supported in this code: https://github.com/fxmeng/peft/blob/7b8af8e53875164e60a7707fe10f07f21c1baf75/examples/pissa_finetuning/pissa_finetuning.py
Please refer to that code and the corresponding documentation. The key point is to first decompose the original model at full precision, and only then apply 4-bit quantization to the residual model, rather than decomposing an already-quantized model.
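
In rough outline it looks like the following. This is a sketch following the pattern of the linked example rather than the script itself; directory names and hyperparameters are placeholders and may differ from the example:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, PeftModel

# Step 1: decompose the original model at full precision.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="pissa",  # SVD-based PiSSA initialization on full-precision weights
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, config)

# Avoid re-running the SVD when this adapter is loaded later.
peft_model.peft_config["default"].init_lora_weights = True
peft_model.save_pretrained("pissa-mistral/pissa_init")  # principal components (adapter)

residual = peft_model.unload()            # base weights with the adapter part subtracted
residual.save_pretrained("pissa-mistral") # residual model

# Step 2: quantize only the residual model to 4 bits and attach the saved adapter.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
res_4bit = AutoModelForCausalLM.from_pretrained("pissa-mistral", quantization_config=bnb)
model = PeftModel.from_pretrained(res_4bit, "pissa-mistral/pissa_init", is_trainable=True)
# `model` can now be fine-tuned as usual with the Trainer.
```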

Thank you! I will try it out and let you know if there are any issues.