r-three/t-few

How is l_ff created?


Firstly, thank you for the amazing work! I had a question about the implementation of $l_{ff}$ in the (IA)³ method:

The config file for (IA)³ sets lora_layers to "k|v|wi_1.*":

"lora_layers": "k|v|wi_1.*",

However, when this string is used to find the model layers to modify (code snippet below), the keys and values in the self-attention modules are modified, but all of the FF layers (i.e., layers of the form encoder.block.x.layer.x.DenseReluDense.wi) are skipped, so the vector $l_{ff}$ is never created in the model ($l_k$ and $l_v$ are created as expected).

t-few/src/models/lora.py, lines 64 to 72 at commit 4e581fa:

if re.fullmatch(config.lora_layers, c_name):
    assert isinstance(
        layer, nn.Linear
    ), f"LoRA can only be applied to torch.nn.Linear, but {layer} is {type(layer)}."
    setattr(
        module,
        c_name,
        LoRALinear(layer, config.lora_rank, config.lora_scaling_rank, config.lora_init_scale),
    )
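
To make the matching behavior concrete, here is a minimal check against the child-module names T5-small uses for its attention and FFN sublayers (a sketch, not code from the repo):

import re

lora_layers = "k|v|wi_1.*"  # value from the (IA)³ config

# Child-module names as they appear in T5-small
for c_name in ["q", "k", "v", "o", "wi", "wo"]:
    hit = re.fullmatch(lora_layers, c_name) is not None
    print(f"{c_name}: {'modified' if hit else 'skipped'}")

# Only "k" and "v" match; "wi" is skipped, so no l_ff vector is created.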

I was thus wondering whether the lora_layers parameter should instead be "k|v|wi.*". Or am I missing something, and does the existing config file somehow also trigger the creation of $l_{ff}$, in addition to $l_k$ and $l_v$?
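
For reference, here is how the two patterns compare on illustrative child names ("wi" as in plain T5, "wi_0"/"wi_1" as in the gated FFN variant):

import re

names = ["k", "v", "wi", "wi_0", "wi_1"]

for pattern in ("k|v|wi_1.*", "k|v|wi.*"):
    hits = [n for n in names if re.fullmatch(pattern, n)]
    print(f"{pattern!r} -> {hits}")

# 'k|v|wi_1.*' -> ['k', 'v', 'wi_1']
# 'k|v|wi.*'   -> ['k', 'v', 'wi', 'wi_0', 'wi_1']

Note that on a gated FFN, "k|v|wi.*" would also match wi_0. Presumably wi_1 alone suffices there, since scaling the output of the linear branch elementwise is equivalent to scaling the whole gated activation.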

Thank you!

Update: I was debugging with T5-small and didn't realize that the FFN module in the T0 model has a wi_1 layer in it, so the existing pattern "k|v|wi_1.*" does match T0's FF layer and $l_{ff}$ is created as expected.
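
For anyone hitting the same confusion, a quick way to see the difference is to list the FFN sublayer names of a plain T5 checkpoint next to a v1.1-style one (a sketch assuming the Hugging Face transformers library; google/t5-v1_1-small stands in for T0 here, since T0 was fine-tuned from the LM-adapted T5 v1.1 and shares its gated FFN):

from transformers import T5ForConditionalGeneration

for ckpt in ("t5-small", "google/t5-v1_1-small"):
    model = T5ForConditionalGeneration.from_pretrained(ckpt)
    # Collect the wi* child names under each FFN block
    wi_names = sorted({
        name.rsplit(".", 1)[-1]
        for name, _ in model.named_modules()
        if "DenseReluDense" in name and name.rsplit(".", 1)[-1].startswith("wi")
    })
    print(f"{ckpt}: {wi_names}")

# t5-small: ['wi']
# google/t5-v1_1-small: ['wi_0', 'wi_1']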