Support for heterogeneous types for `modules_to_save`

Feature request

From my understanding of the current implementation, the `modules_to_save` wrappers are limited to copying one specific layer of the model (reference: `peft/src/peft/utils/other.py`, line 192 in 859fd88, `class ModulesToSaveWrapper(torch.nn.Module):`). Adding this feature would allow a different set of modules to be saved for each LoRA adapter. For instance, it could support multiple LoRA classifiers, each with a final classifier layer of a different size.

Motivation

This feature is particularly useful for the final classifier layer. Currently, I have a model with multiple LoRAs attached for a classification task, but the classifier layers are not all the same size. As a result, I need to maintain several models, grouping LoRAs with the same classifier size into the same base model. However, since the core model remains identical, it should be possible to use a single base model for all of them, especially since we are training the classifier layers from scratch. A potential solution could be to introduce an additional option, allowing users to specify the modules_to_save class for the classifier layer, instead of simply copying the existing layer.

Your contribution

I'm happy to explore possible solutions and potentially contribute a PR if this is considered a valuable addition to the library.

Hmm, I see the problem, but I'm not sure modules_to_save is the right place to add a solution. The issue is that modules_to_save creates a copy of the targeted layer, so the copy starts from the same parameters as the original layer. If we allowed the shape of the parameters to change, they could no longer be a copy of the base parameters and would need to be randomly initialized.
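
To make the copy-versus-reinitialize point concrete, here is a minimal sketch in plain PyTorch. It is not PEFT's actual code; `base_head`, `head_copy`, and `resized_head` are made-up names, and `copy.deepcopy` only stands in for the idea of "a copy of the targeted layer".

```python
import copy

import torch
from torch import nn

# Stand-in for a classifier head that already exists on the base model.
base_head = nn.Linear(16, 4)

# A copy keeps both the shape and the values, so training would start
# from exactly the base layer's parameters.
head_copy = copy.deepcopy(base_head)
same_start = torch.equal(head_copy.weight, base_head.weight)

# A head with a different number of classes has nothing to copy from the
# base layer; its parameters come from nn.Linear's default random init.
resized_head = nn.Linear(16, 7)
shapes_match = resized_head.weight.shape == base_head.weight.shape

print(same_start, shapes_match)  # True False
```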

I understand that for some tasks, we just randomly initialize the classifier head, so it doesn't really matter if modules_to_save creates a copy with the exact same parameters or if the parameters are reinitialized randomly. But for other use cases, modules_to_save can target already trained layers, so there it does matter.

For this reason, I feel like modules_to_save is not the right place to put such functionality. If I had this problem, I would probably first create the base model and add multiple classifier heads that can be switched via some parameter (basically what you would have to do if you did full fine-tuning). Then you could create one PEFT adapter per task/classifier head that each has its own modules_to_save which targets one of these classifier heads.
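
A rough sketch of what such a base model could look like, in plain PyTorch. Everything here is hypothetical: the `MultiHeadClassifier` class, the `set_task` switch, the tiny backbone, and the task names are placeholders for illustration, not something PEFT provides.

```python
import torch
from torch import nn


class MultiHeadClassifier(nn.Module):
    """Hypothetical base model: one shared backbone, one head per task."""

    def __init__(self, hidden_size, num_labels_per_task):
        super().__init__()
        # Stand-in for the shared pretrained backbone.
        self.backbone = nn.Sequential(nn.Linear(32, hidden_size), nn.ReLU())
        # One classifier head per task; output sizes may differ.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in num_labels_per_task.items()}
        )
        # Default to the first task until set_task is called.
        self.active_task = next(iter(num_labels_per_task))

    def set_task(self, task):
        if task not in self.heads:
            raise KeyError(f"unknown task: {task}")
        self.active_task = task

    def forward(self, x):
        # Route the shared features through the currently selected head.
        return self.heads[self.active_task](self.backbone(x))


model = MultiHeadClassifier(hidden_size=16, num_labels_per_task={"task_a": 3, "task_b": 10})
x = torch.randn(2, 32)

model.set_task("task_a")
logits_a = model(x)  # shape (2, 3)
model.set_task("task_b")
logits_b = model(x)  # shape (2, 10)
```

With a model shaped like this, each adapter's config would presumably name a single head in its modules_to_save (e.g. `"heads.task_a"` for one adapter and `"heads.task_b"` for another; those module names follow from this made-up class), and you would switch the head together with the active adapter.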

It is a bit wasteful, since we create a copy of each of these heads, so if memory is very tight this would not be an optimal solution. Given that you probably don't need the weights of the randomly initialized heads, you could probably delete them to reclaim the memory. It would just mean that you can't use the model without the adapter (e.g. disabling adapters would not work), but there would not really be any reason to do that.