huggingface/peft

Learning the combination weights of pre-trained LoRA Modules

mahdibeit opened this issue · 5 comments

Feature request

PEFT can combine pre-trained LoRA modules by averaging them or by providing custom weights for a weighted average. This paper showed that learning these combination weights outperforms naive averaging in few-shot adaptation settings.
WLoRA
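For context, this is roughly how the existing fixed-weight combination looks today (the model ID, adapter paths, and adapter names below are placeholders, and combination_type="linear" assumes the adapters share the same rank):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# example base model; use whatever model the adapters were trained on
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# load two pre-trained LoRA adapters (paths/IDs are placeholders)
model = PeftModel.from_pretrained(base_model, "path/to/lora_1", adapter_name="task_1")
model.load_adapter("path/to/lora_2", adapter_name="task_2")

# weighted average with fixed, hand-picked weights; learning these weights is the point of this request
model.add_weighted_adapter(
    adapters=["task_1", "task_2"],
    weights=[0.5, 0.5],
    adapter_name="combined",
    combination_type="linear",
)
model.set_adapter("combined")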

Motivation

Learning the combination weights allows users to take advantage of the pre-trained LoRA modules already available on the Hugging Face Hub. It is also very parameter-efficient, since only the combination weights are learned. More importantly, it can surpass training a LoRA from scratch in settings where the number of training samples is limited.

Your contribution

I can submit a PR. Then, PEFT can combine any pre-trained LoRA using the following format:

wlora_config = WLoraConfig(skilled_loras=[PATH_TO_UPSTREAM_1, PATH_TO_UPSTREAM_2])

model = get_peft_model(llama2, wlora_config)

Hi, thank you for proposing to add this method.

I only skimmed the paper but IIUC, we assume that the user has a couple of already-trained LoRA adapters and now wants to combine them for a new task. The idea is that learning the weights used for the weighted average (the weights argument for add_weighted_adapter) can lead to better results than naive uniform weights. (Note that we offer many combination types, not just averaging; maybe that's worth looking into for the paper.)

To learn these weights, I assume we have to load all the LoRA adapters at training time, freeze their weights, then add an extra scaling factor to this line, is that right?

I haven't thought through the overall design of this, but I think it should be possible to add this to the existing LoRA code without too many additions. Feel free to open a draft PR where we can discuss the design.

Hi @BenjaminBossan, thanks for taking the time to read the paper.

Thank you for your great suggestion. We will evaluate other combination methods in the paper.

To answer your question: yes, you are absolutely right. We can reuse the existing LoRA code with a Boolean flag like learn_combination_weights to configure the training process. Overall, I need to do the following:

  1. Instantiate a trainable tensor named combination_weights that learns a scaling factor for each pre-trained, frozen LoRA.
  2. During the forward pass, apply a softmax over combination_weights and multiply each LoRA's output by the corresponding entry of combination_weights in this line (sketched below).

During these steps, I have to make sure that the LoRA weights stay frozen and that the merge method works as intended.
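Here is a rough sketch of what I have in mind for these two steps; the class and attribute names (WLoraLinear, combination_weights) are only illustrative and not the final PEFT integration:

import torch
import torch.nn as nn


class WLoraLinear(nn.Module):
    """Illustrative layer that mixes several frozen LoRA deltas with learned weights."""

    def __init__(self, base_layer, lora_As, lora_Bs, scaling):
        super().__init__()
        self.base_layer = base_layer
        self.lora_As = nn.ModuleList(lora_As)
        self.lora_Bs = nn.ModuleList(lora_Bs)
        self.scaling = scaling
        # freeze the base layer and the pre-trained LoRA matrices; only the
        # combination weights are trainable
        for module in (self.base_layer, *self.lora_As, *self.lora_Bs):
            module.requires_grad_(False)
        # one trainable scalar per upstream LoRA (step 1)
        self.combination_weights = nn.Parameter(torch.zeros(len(lora_As)))

    def forward(self, x):
        result = self.base_layer(x)
        # softmax so the learned weights form a convex combination (step 2)
        weights = torch.softmax(self.combination_weights, dim=0)
        for w, lora_A, lora_B in zip(weights, self.lora_As, self.lora_Bs):
            result = result + w * self.scaling * lora_B(lora_A(x))
        return result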


Also, it is possible to create a new tuner module under peft/tuners named something like WLoRA and write the appropriate class there. This would allow users to simply run the following:

wlora_config = WLoraConfig(upstream_loras=[PATH_TO_UPSTREAM_1, PATH_TO_UPSTREAM_2])

model = get_peft_model(llama2, wlora_config)
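The config for such a module could be as small as the sketch below (illustrative only; a real implementation would also need to register its peft_type and handle saving/loading):

from dataclasses import dataclass, field

from peft import LoraConfig


@dataclass
class WLoraConfig(LoraConfig):
    # paths or Hub IDs of the pre-trained LoRA adapters whose outputs are combined
    upstream_loras: list[str] = field(default_factory=list)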

I personally prefer the second option, as it does not complicate the main LoRA module and allows easier use. However, I trust your judgment. Let me know which direction you prefer, and I can start implementing and open a draft PR.

I think the second suggestion with a dedicated class is good; it can still re-use much of the existing code, though. Regardless, if you have some code to share, feel free to do so, as that makes the discussion much easier.

Hi @BenjaminBossan, I just opened a draft PR using the first method. Let me know what you think. My main concern is this line, which freezes the pre-trained lora_A and lora_B weights.
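For reference, the freezing could also be done by parameter name outside the layer; a minimal sketch (lora_A/lora_B follow PEFT's naming, while combination_weights is the hypothetical parameter name from the sketch above):

# freeze the pre-trained LoRA matrices and train only the combination weights
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        param.requires_grad_(False)
    elif "combination_weights" in name:
        param.requires_grad_(True)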

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.