QingruZhang/AdaLoRA

Questions about ranknum

luchaoqi opened this issue · 1 comment

Hi, thanks for this awesome work!

I wanted to ask about the purpose of self.ranknum in adalora.py. It seems that in transformer.py you implemented ranknum with self.adapt_scaling here.
However, in adalora.py, ranknum seems to be just a constant with requires_grad=False, used for scaling:

self.scaling / (self.ranknum+1e-5)

This is different from the implementation in transformer.py.
Also, why not use self.r directly instead of self.ranknum + 1e-5 in this case:

self.scaling / self.r
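
For reference, here is a minimal sketch of how this scaling factor enters an AdaLoRA-style SVD-parameterized layer. This is not the repository's exact code; the names lora_A, lora_E, lora_B, and ranknum mirror the adalora.py convention, while the shapes and forward logic are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SVDLinearSketch(nn.Module):
    """Illustrative AdaLoRA-style linear layer with an SVD-parameterized update."""
    def __init__(self, in_features, out_features, r=8, lora_alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        # SVD-style factors: P (lora_A), Lambda (lora_E, the singular values), Q (lora_B)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_E = nn.Parameter(torch.randn(r, 1) * 0.01)   # singular values
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha
        # ranknum is kept as a non-trainable parameter, fixed at the initial rank r
        self.ranknum = nn.Parameter(torch.tensor(float(r)), requires_grad=False)

    def forward(self, x):
        base = x @ self.weight.T
        # Low-rank update scaled by scaling / ranknum; the 1e-5 guards against
        # division by zero if ranknum were ever driven to 0 after pruning
        update = (x @ (self.lora_A * self.lora_E).T) @ self.lora_B.T
        return base + update * self.scaling / (self.ranknum + 1e-5)
```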

Hello, thanks for your interest in our paper. We were experimenting with whether AdaLoRA needs to explicitly and dynamically adjust the rank scaling after some singular values are masked out, since masking can cause a potential decrease in the magnitude of the incremental matrix. In that case, adjusting the rank scaling (ranknum) after rank allocation might be needed. However, it turned out that explicitly adjusting ranknum hurts performance. The likely reason is that the discarded singular values have small magnitudes and therefore only a minimal influence on the overall matrix magnitude. So there is no need to adjust the rank scaling explicitly, but we still leave the function here for future development :) Hope this answers your questions. Thanks for your comments.
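
To make the idea concrete, the sketch below shows what "explicitly adjusting ranknum" could look like; this is an assumed illustration, not the authors' implementation. After rank allocation zeroes out some singular values in lora_E, ranknum would be reset to the count of surviving values, so that scaling / ranknum reflects the effective rank (using the SVDLinearSketch layer from the earlier sketch).

```python
import torch

def adjust_ranknum(layer):
    """Hypothetical helper: reset ranknum to the number of unmasked singular values."""
    with torch.no_grad():
        effective_rank = (layer.lora_E.abs() > 0).sum().float()
        layer.ranknum.fill_(effective_rank)
```

As the reply above explains, this explicit adjustment turned out to be unnecessary in practice, since the pruned singular values are small and barely affect the magnitude of the update.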