microsoft/LoRA

LoRA/loralib/layers.py line 151-155 nn.Linear.forward()

jiawei-liu1103 opened this issue · 2 comments

Hi,

For LoRA/loralib/layers.py lines 151-155, why does the forward implementation of the Linear layer first run the input through the original (frozen) layer and then add the LoRA output to the result? This is different from the implementation of the Conv layer, where the LoRA update is merged into the weight before the forward pass.
Did I make a mistake? Or is there no difference between the two implementations for the Linear layer?

Thank you!

Hi Jiawei,

They are the same mathematically. The benefit of what I wrote for the Linear layer is that we don't need to instantiate another full-rank matrix. It's trickier to do that for Conv, and since the Conv layer is a proof of concept, I didn't optimize it further. Let me know if that makes sense!
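
To see the equivalence concretely, here is a minimal sketch with made-up shapes (this is not the actual loralib code, just an illustration of the two forward patterns being compared):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration only.
in_features, out_features, r, scaling = 16, 32, 4, 0.5
W = torch.randn(out_features, in_features)   # frozen pretrained weight
lora_A = torch.randn(r, in_features)         # LoRA down-projection (r x in)
lora_B = torch.randn(out_features, r)        # LoRA up-projection (out x r); zero-initialized in real LoRA
x = torch.randn(8, in_features)

# Linear-style: run the frozen layer, then add the low-rank path to the output.
# Only the small factors A and B are ever materialized.
out_unmerged = F.linear(x, W) + (x @ lora_A.T @ lora_B.T) * scaling

# Conv-style: merge B @ A into the weight first, then do a single forward pass.
# This materializes a full out_features x in_features update matrix.
out_merged = F.linear(x, W + (lora_B @ lora_A) * scaling)

# The two agree up to floating-point error.
torch.testing.assert_close(out_unmerged, out_merged)
```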

Thank you!