GraphPKU/PiSSA

High Training Loss


Hi,

Very cool paper!

When I fine-tune a Mistral model with LoRA in 4 bits, the loss starts around 1. However, with the exact same script and the only addition being "init_lora_weights='pissa'", the loss starts around 5. I've also tried 'pissa_niter_4', but the loss is still around 5. Do you know what could be the cause?
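
For reference, the setup looks roughly like this (the model name and hyperparameters below are placeholders rather than my exact script; the PiSSA run differs only by the init flag):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Placeholder setup: load the base model directly in 4 bits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    init_lora_weights="pissa",  # without this line (plain LoRA) the loss starts ~1; with it, ~5
)
model = get_peft_model(model, lora_config)
```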

Thank you,
Andy

Thank you for your interest in PiSSA.
In the latest version of our paper (https://arxiv.org/pdf/2404.02948.pdf), we report 4-bit fine-tuning experiments in Section 4.2, and 4-bit training is supported in this code: https://github.com/fxmeng/peft/blob/7b8af8e53875164e60a7707fe10f07f21c1baf75/examples/pissa_finetuning/pissa_finetuning.py
Please refer to that code and the corresponding documentation. The key point is to first decompose the original model at full precision, and only then apply 4-bit quantization to the residual model, rather than decomposing an already-quantized model.
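
In rough outline it looks like the following. This is a sketch following the pattern of the linked example rather than the script itself; directory names and hyperparameters are placeholders and may differ from the example:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, PeftModel

# Step 1: decompose the original model at full precision.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="pissa",  # SVD-based PiSSA initialization on full-precision weights
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, config)

# Avoid re-running the SVD when this adapter is loaded later.
peft_model.peft_config["default"].init_lora_weights = True
peft_model.save_pretrained("pissa-mistral/pissa_init")  # principal components (adapter)

residual = peft_model.unload()            # base weights with the adapter part subtracted
residual.save_pretrained("pissa-mistral") # residual model

# Step 2: quantize only the residual model to 4 bits and attach the saved adapter.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
res_4bit = AutoModelForCausalLM.from_pretrained("pissa-mistral", quantization_config=bnb)
model = PeftModel.from_pretrained(res_4bit, "pissa-mistral/pissa_init", is_trainable=True)
# `model` can now be fine-tuned as usual with the Trainer.
```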

Thank you! I will try it out and let you know if there are any issues.