Questions about ranknum
luchaoqi opened this issue · 1 comments
Hi, thanks for this awesome work!
I wanted to ask about the purpose of `self.ranknum` in `adalora.py`. It seems that in `transformer.py` you implement the ranknum adjustment via `self.adapt_scaling` here. However, in `adalora.py`, ranknum appears to be just a constant with `requires_grad=False`, used only for scaling:

`self.scaling / (self.ranknum + 1e-5)`

which is different from what `transformer.py` does.

Btw, why not use `self.r` directly instead of `self.ranknum + 1e-5` in this case, i.e.

`self.scaling / self.r`?
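
For context, here is a minimal, self-contained sketch of how I read the scaling in `adalora.py`. The class and variable names are my own, not copied from the repo; it only illustrates where `ranknum` enters the forward pass:

```python
import torch
import torch.nn as nn

class SVDLinearSketch(nn.Module):
    """Sketch of an SVD-parameterized LoRA layer (illustrative naming only):
    delta_W = P @ diag(E) @ Q, scaled by scaling / (ranknum + 1e-5)."""

    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, lora_alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

        # SVD-style factors: P (out x r), E (r singular values), Q (r x in).
        self.lora_P = nn.Parameter(torch.randn(out_features, r) * 0.02)
        self.lora_E = nn.Parameter(torch.zeros(r))  # maskable singular values
        self.lora_Q = nn.Parameter(torch.randn(r, in_features) * 0.02)

        self.scaling = lora_alpha
        # A plain constant with requires_grad=False, which is the point of my question.
        self.ranknum = nn.Parameter(torch.tensor(float(r)), requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.lora_P @ torch.diag(self.lora_E) @ self.lora_Q
        # The line I am asking about: why (ranknum + 1e-5) instead of r?
        delta = delta * self.scaling / (self.ranknum + 1e-5)
        return x @ (self.weight + delta).T
```

For example, `SVDLinearSketch(768, 768)(torch.randn(4, 768))` returns a tensor of shape `(4, 768)`.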
Hello, thanks for your interest in our paper. We were experimenting with whether AdaLoRA needs to explicitly adjust the rank scale after some singular values are masked out, since masking can reduce the magnitude of the low-rank update; in that case, adjusting the rank scaling (ranknum) after rank allocation might be needed to compensate. However, it turned out that explicitly adjusting ranknum hurts performance. A likely reason is that the discarded singular values have small magnitude, so dropping them has minimal influence on the matrix magnitude. Hence there is no need to adjust the rank scaling explicitly, but we still leave the function here for future development :) Hope this answers your questions. Thanks for your comments.
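
For anyone curious what such a dynamic adjustment could look like, here is a rough sketch reusing the names from the snippet in the question above. This is a hypothetical helper, not what the released code does (the released code keeps ranknum fixed):

```python
import torch

@torch.no_grad()
def adjust_ranknum_(layer, threshold: float = 0.0):
    """Hypothetical rank-scale adjustment after rank allocation.

    After some singular values in lora_E are masked (pruned to zero),
    reset ranknum to the number of surviving values so that the
    scaling / (ranknum + 1e-5) factor compensates for the lost magnitude.
    """
    kept = (layer.lora_E.abs() > threshold).sum().item()
    # `kept` can legitimately be 0 if every singular value of a layer is
    # pruned; presumably the +1e-5 in scaling / (ranknum + 1e-5) is there
    # so this case does not divide by zero.
    layer.ranknum.fill_(float(kept))

# Usage (with the sketch layer from the question), e.g. after each
# rank-allocation step: adjust_ranknum_(layer)
```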