Targeting all layers and biases
grimulkan opened this issue · 2 comments
What would I have to do to target all relevant layers, including biases, with this repo for LoRA training? For instance, would the following change alone work?
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=ft_config.lora_r,
    lora_alpha=ft_config.lora_alpha,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=ft_config.lora_dropout,
    bias="all",  # or "lora_only"
    task_type="CAUSAL_LM",
)
```
I am not sure if this actually includes the biases or if they just get zeroed out elsewhere. If there are other places to modify, I would appreciate any suggestions.
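One way I thought of checking, a rough sketch assuming the model has already been wrapped with PEFT's `get_peft_model` (variable names are mine), is to list which parameters are left trainable:

```python
# Sketch: after model = get_peft_model(base_model, lora_config),
# print the parameters that will actually receive gradients.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, tuple(param.shape))

# If bias="all" took effect, names ending in ".bias" should show up
# here alongside the lora_A / lora_B weights.
```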
Also, I have not tried it before, but is there any value in including the biases in LoRA training? Similarly, is there any value in training the other layers (beyond the four listed above)?
I am aware of the result that targeting the above four was more effective than this repo's default (only q_proj and v_proj), allowing for a lower LoRA rank, but I am not sure whether there is any further benefit from the other layers or biases. There is some relevant discussion in #129 as well (with comments from @kaiokendev).
Would appreciate any thoughts, folklore or otherwise.
LLaMA does not use biases (you can verify in modeling_llama.py: bias is set to False, and the bias tensors are all zero). As for the modules, you also need to add the MLP modules: gate_proj, up_proj, and down_proj. lm_head is not necessary.
You’re right about the biases! Much appreciated.