keras-team/keras-hub

Add Support for Dynamically Specifying Layers in enable_lora Method

Opened this issue · 2 comments

Is your feature request related to a problem? Please describe.
The current implementation of the enable_lora method in Keras only allows LoRA to be enabled for a predefined set of layers ("query_dense", "value_dense", "query", "value"). This is limiting because it doesn't provide flexibility for users who want to apply LoRA to other layers.

  • In many custom architectures, users may want to enable LoRA for different layers that do not follow the predefined naming convention, which leads to unnecessary modifications to the codebase.
  • Additionally, enabling LoRA on multiple layers can significantly increase the number of trainable parameters in a model, which could lead to memory issues, especially on devices with limited resources (such as GPUs with lower VRAM). Users may want finer control over which layers have LoRA enabled to avoid memory bottlenecks during training.

Describe the solution you'd like
I propose modifying the enable_lora method to accept a list of custom layer names as a parameter. This way, users can dynamically specify which layers they want to apply LoRA to. The solution could involve adding an optional target_names argument to the method that defaults to the current predefined layers but can be overridden by users. For example:

def enable_lora(self, rank, target_names=None):
    target_names = target_names if target_name else ["query_dense", "value_dense", "query", "value"]

This enhancement would allow more flexibility in applying LoRA to various architectures without needing to modify the core method.

Describe alternatives you've considered
Some of the alternatives thought about:-
1. Inheritance and Override: Another alternative is to subclass the relevant Keras model classes and override the enable_lora method with a custom implementation. While this avoids direct modification of the library, it can still introduce complexity and redundancy for a common use case.
2. Defining Custom Lora layer and adding it manually to the model layers: Another alternative is to manually define a custom LoRA layer and adapt it to different parts of the model. I have personally experimented with this approach, and while it works, it's a tedious process. You need to manually implement the LoRA logic, inject it into the model.

Additional context
This feature would provide better control for users who are training models on resource-constrained environments and make it easier to integrate LoRA into custom models, especially when experimenting.

Note: I would like to take up this issue and implement the proposed changes. If this feature request is accepted, I will raise a pull request (PR) with the necessary modifications.

Have you considered calling enable_lora on a per-layer basis? Is there a reason that approach wouldn't work for your case?

Yes, I have considered calling enable_lora on a per-layer basis. While this approach is certainly possible, I believe this may not be the most efficient approach for certain use-cases.

For instance, large models with many layers, manually identifying and calling enable_lora for each individual layer can be time-consuming and error-prone. Having the ability to dynamically pass a list of layer names to the enable_lora method would make the code cleaner and easier to maintain.

Additionally, I'd like to highlight that Hugging Face also offers LoRA with dynamic target layers. For example, they allow users to specify target modules dynamically through their LoraConfig:

target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out", "wte"]
config = LoraConfig(
    r=4, lora_alpha=16, target_modules=target_modules, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)

This dynamic specification provides flexibility and allows users to enable LoRA on multiple layers with minimal effort. Implementing something similar in Keras would ensure consistent usage across libraries, making it easier for users who switch between frameworks or work with different libraries in the same pipeline.

While calling enable_lora per layer works for small-scale use cases, the enhancement I am proposing is aimed at simplifying this process for larger and more complex models, while aligning Keras more closely with other widely-used libraries like Hugging Face.

Please let me know if this approach makes sense, or if there are any additional considerations I should take into account.