Some questions about the code of LISA
milong26 commented
I understand that LISA's core code lives in src\lmflow\pipeline\finetuner.py, mainly in the class DynamicLayerActivationCallback. I read it alongside Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.
So where is step 2, "Freeze all layers except the embedding and language modeling head layer"? I can only find def freeze_all_layers(self) in DynamicLayerActivationCallback, and it does not exclude the embedding and head layers.
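To make the question concrete, here is a minimal sketch (not LMFlow's actual code; the model, parameter names, and helper function are hypothetical) of what I would expect step 2 to look like in PyTorch: freeze everything, then re-enable only the embedding and LM-head parameters:

```python
# Hypothetical sketch of Algorithm 1, step 2 -- NOT the LMFlow implementation.
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in: 'embed' and 'head' mimic the embedding / LM-head layers."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
        self.head = nn.Linear(4, 10)

def freeze_except_embed_and_head(model: nn.Module) -> None:
    # Keep embedding and head trainable; freeze every other parameter.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(("embed", "head"))

model = TinyLM()
freeze_except_embed_and_head(model)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only embed / head parameters remain trainable
```

In contrast, a plain freeze_all_layers would set requires_grad = False on every parameter without the name check, which is why I am asking where the exclusion happens.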
I'm also curious about the notation K in Algorithm 1 of the paper, step 4: "Run AdamW for K iterations with …". What exactly does K count here?
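My current reading, shown as a hedged sketch (variable names and the toy model are my own, not LMFlow's API), is that K is the number of AdamW steps run between two layer re-samplings, i.e. the callback's sampling interval:

```python
# Hedged sketch of how I read Algorithm 1: every K optimizer steps, a new
# layer is sampled and only that layer is left unfrozen. Illustrative only.
import random
import torch
import torch.nn as nn

model = nn.ModuleList([nn.Linear(2, 2) for _ in range(4)])  # stand-in "layers"
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
K = 5           # my guess: AdamW iterations per sampled layer set
total_steps = 20

for step in range(total_steps):
    if step % K == 0:
        # Re-sample which layer is active (Algorithm 1, sampling step).
        active = random.randrange(len(model))
        for i, layer in enumerate(model):
            for p in layer.parameters():
                p.requires_grad = (i == active)
    # Run one AdamW iteration; only the active layer receives gradients.
    loss = model[active](torch.randn(3, 2)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Is this the intended meaning of K, or does it count something else (e.g. epochs)?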
My English is not great, so please tell me if anything is unclear. Thanks for answering!