OptimalScale/LMFlow

Some questions on the LISA code


I understand that LISA's core code lives in src/lmflow/pipeline/finetuner.py, mainly in the class DynamicLayerActivationCallback. I read it side by side with Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.

So where is step 2: "Freeze all layers except the embedding and language modeling head layer"? I can only find def freeze_all_layers(self) in DynamicLayerActivationCallback, and it does not exclude the embedding and head layers.
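For comparison, here is a minimal sketch of what I would have expected step 2 to look like, using the standard transformers accessors get_input_embeddings() and get_output_embeddings(); the function name and overall structure are my own assumption, not the actual LMFlow code:

```python
def freeze_all_layers_except_embed_and_head(model):
    # Hypothetical sketch of step 2, assuming a Hugging Face-style causal LM.
    # Freeze every parameter first.
    for param in model.parameters():
        param.requires_grad = False

    # Re-enable the input embedding layer.
    for param in model.get_input_embeddings().parameters():
        param.requires_grad = True

    # Re-enable the language modeling head (output embeddings), if present.
    output_embeddings = model.get_output_embeddings()
    if output_embeddings is not None:
        for param in output_embeddings.parameters():
            param.requires_grad = True
```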

And I'm curious about the notation $k$ in Algorithm 1 of the paper.
Step 4 says: Run AdamW for $K$ iterations with $\{\eta_t\}_{t=ik}^{ik+k-1}$. Is this $k$ the same as $K$?
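To make my reading of the indices concrete, here is a tiny illustrative snippet of how I interpret the range of $t$ in step 4, under the assumption that $k$ and $K$ are the same quantity (all names here are placeholders, not LMFlow code):

```python
# My interpretation of the iteration indices in step 4, assuming k == K.
T, K = 100, 10                       # total iterations, iterations per sampling period

for i in range(T // K):              # one pass per sampling period
    start, end = i * K, i * K + K    # how I read {eta_t}_{t=iK}^{iK+K-1}
    print(f"period {i}: AdamW runs for t = {start} .. {end - 1}")
```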

My English is not great, so please tell me if anything is unclear. Thanks for answering!