huggingface/peft

Clarification needed on Adapter Heads in PEFT

lenglaender opened this issue · 2 comments

I have questions regarding the behaviour of adapter heads when activating and merging LoRA adapters. I couldn't find any information in the documentation on how PEFT handles heads.

I have trained and loaded multiple LoRA adapters for classification and language modelling tasks. Each classification adapter has learned its own classification head. I load the adapters like this:

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model = PeftModel.from_pretrained(model, "my_first_path", adapter_name="classification_adapter_1") # this LoRA adapter has a head
model.load_adapter("my_second_path", adapter_name="classification_adapter_2") # this LoRA adapter has a head
model.load_adapter("my_third_path", adapter_name="classification_adapter_3") # this LoRA adapter has a head
model.load_adapter("my_fourth_path", adapter_name="language_adapter_1") # LoRA no head
model.load_adapter("my_fifth_path", adapter_name="language_adapter_1") # LoRA no head

Now, I have two specific questions:

  1. When activating an adapter, e.g. using model.set_adapter("classification_adapter_3"), is the corresponding adapter head also set active? I couldn't find any code in the methods I looked into that explicitly sets the adapter head as active.
  2. When merging multiple adapters using add_weighted_adapter, are the corresponding adapter heads also merged? I couldn't find any code in the add_weighted_adapter function that handles the merging of adapter heads. (A sketch of how I call both APIs follows this list.)
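
For context, here is a minimal sketch of how I call both APIs, building on the loading snippet above (the merged adapter name and the weights are just placeholders, and combination_type="linear" assumes all adapters share the same rank):

# 1. Activate a single adapter for inference
model.set_adapter("classification_adapter_3")

# 2. Merge several adapters into a new one
model.add_weighted_adapter(
    adapters=["classification_adapter_1", "classification_adapter_2"],
    weights=[0.5, 0.5],
    adapter_name="merged_classification",
    combination_type="linear",
)
model.set_adapter("merged_classification")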

Expected Behavior

  1. When activating an adapter, I would expect the corresponding adapter head to be set active as well so that the model can correctly handle the specific task the adapter was trained for.

  2. When merging multiple adapters using add_weighted_adapter, I would expect the corresponding adapter heads to be merged as well so that the merged adapter can handle the combined tasks of the individual adapters.

Any clarification or guidance on these questions would be greatly appreciated. Thank you!

First, a clarifying question: When you speak of adapter heads, do you mean that you added them when training the model by setting LoraConfig(modules_to_save=...)? Note that if you did not set it explicitly but used a preconfigured model, it might have been set automatically (check the adapter_config.json).
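
For reference, a minimal sketch of how the head ends up in modules_to_save at training time. The head module name depends on the architecture ("score" for Llama-style sequence classification models, "classifier" for BERT-style ones), and num_labels=3 is just a placeholder:

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# With task_type="SEQ_CLS", PEFT adds the classification head to modules_to_save
# automatically; listing it explicitly just makes that visible. The entry then
# shows up in the saved adapter_config.json.
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    modules_to_save=["score"],  # head module name, depends on the architecture
)
model = get_peft_model(base, config)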

2. When merging multiple adapters using add_weighted_adapter, I would expect the corresponding adapter heads to be merged as well so that the merged adapter can handle the combined tasks of the individual adapters.

This is definitely not done. In fact, there is no trivial way to merge adapter heads in a way that would preserve their functionality.
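
To illustrate the point with a hypothetical sketch (not a PEFT API): even when two heads happen to have identical shapes, a naive weight average runs fine but generally does not behave like either original head, since each head was fitted to its own task and label space:

from torch import nn

# Two hypothetical classification heads from different adapters (same shape assumed)
head_a = nn.Linear(768, 3)
head_b = nn.Linear(768, 3)

# A "trivial" merge: average the parameters. This executes, but the result
# typically preserves neither head's functionality.
merged = nn.Linear(768, 3)
merged.load_state_dict({
    name: 0.5 * head_a.state_dict()[name] + 0.5 * head_b.state_dict()[name]
    for name in head_a.state_dict()
})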

Yes, I meant the (linear) layer that is added on top of the base model for the specific task. I didn't know that I had to save this layer by setting it in modules_to_save, thanks.

This is definitely not done. In fact, there is no trivial way to merge adapter heads in a way that would preserve their functionality.

Okay, thanks!