huggingface/peft

Clarification needed on Adapter Heads in PEFT

lenglaender opened this issue · 2 comments

I have questions regarding the behaviour of adapter heads when activating and merging LoRA adapters. I couldn't find any information in the documentation on how PEFT handles heads.

I have trained and loaded multiple LoRA adapters for classification and language modelling tasks. Each classification adapter has learned its own classification head. I load the adapters like this:

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model = PeftModel.from_pretrained(model, "my_first_path", adapter_name="classification_adapter_1") # this LoRA adapter has a head
model.load_adapter("my_second_path", adapter_name="classification_adapter_2") # this LoRA adapter has a head
model.load_adapter("my_third_path", adapter_name="classification_adapter_3") # this LoRA adapter has a head
model.load_adapter("my_fourth_path", adapter_name="language_adapter_1") # LoRA no head
model.load_adapter("my_fifth_path", adapter_name="language_adapter_1") # LoRA no head

Now, I have two specific questions:

  1. When activating an adapter, e.g. using model.set_adapter("classification_adapter_3"), is the corresponding adapter head also set active? I couldn't find any code in the methods I looked into that explicitly sets the adapter head as active.
  2. When merging multiple adapters using add_weighted_adapter, are the corresponding adapter heads also merged? I couldn't find any code in the add_weighted_adapter function that handles the merging of adapter heads. (A sketch of how I call both APIs follows this list.)
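
For context, here is a minimal sketch of how I call both APIs, building on the loading snippet above (the merged adapter name and the weights are just placeholders, and combination_type="linear" assumes all adapters share the same rank):

# 1. Activate a single adapter for inference
model.set_adapter("classification_adapter_3")

# 2. Merge several adapters into a new one
model.add_weighted_adapter(
    adapters=["classification_adapter_1", "classification_adapter_2"],
    weights=[0.5, 0.5],
    adapter_name="merged_classification",
    combination_type="linear",
)
model.set_adapter("merged_classification")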

Expected Behavior

  1. When activating an adapter, I would expect the corresponding adapter head to be set active as well so that the model can correctly handle the specific task the adapter was trained for.

  2. When merging multiple adapters using add_weighted_adapter, I would expect the corresponding adapter heads to be merged as well so that the merged adapter can handle the combined tasks of the individual adapters.

Any clarification or guidance on these questions would be greatly appreciated. Thank you!

First, a clarifying question: When you speak of adapter heads, do you mean that you added them when training the model by setting LoraConfig(modules_to_save=...)? Note that if you did not set it explicitly but used a preconfigured model, it might have been set automatically (check the adapter_config.json).
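
For reference, a minimal sketch of how the head ends up in modules_to_save at training time. The head module name depends on the architecture ("score" for Llama-style sequence classification models, "classifier" for BERT-style ones), and num_labels=3 is just a placeholder:

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# With task_type="SEQ_CLS", PEFT adds the classification head to modules_to_save
# automatically; listing it explicitly just makes that visible. The entry then
# shows up in the saved adapter_config.json.
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    modules_to_save=["score"],  # head module name, depends on the architecture
)
model = get_peft_model(base, config)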

2. When merging multiple adapters using add_weighted_adapter, I would expect the corresponding adapter heads to be merged as well so that the merged adapter can handle the combined tasks of the individual adapters.

This is definitely not done. In fact, there is no trivial way to merge adapter heads in a way that would preserve their functionality.
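
To illustrate the point with a hypothetical sketch (not a PEFT API): even when two heads happen to have identical shapes, a naive weight average runs fine but generally does not behave like either original head, since each head was fitted to its own task and label space:

from torch import nn

# Two hypothetical classification heads from different adapters (same shape assumed)
head_a = nn.Linear(768, 3)
head_b = nn.Linear(768, 3)

# A "trivial" merge: average the parameters. This executes, but the result
# typically preserves neither head's functionality.
merged = nn.Linear(768, 3)
merged.load_state_dict({
    name: 0.5 * head_a.state_dict()[name] + 0.5 * head_b.state_dict()[name]
    for name in head_a.state_dict()
})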

Yes, I meant the (linear) layer that is added on top of the base model for the specific task. I didn't know that I had to save this layer by setting it in modules_to_save, thanks.

This is definitely not done. In fact, there is no trivial way to merge adapter heads in a way that would preserve their functionality.

Okay, thanks!