OptimalScale/LMFlow

Question Regarding Optimizer Reinitialization in LISA Implementation

eric8607242 opened this issue · 4 comments

Hi @research4pan ,
Thanks for your great work.

I have a question about where LMFlow discards the optimizer state of frozen layers.
I've been trying to locate this logic in the code, but I couldn't find a corresponding implementation in https://github.com/OptimalScale/LMFlow/blob/main/src/lmflow/pipeline/finetuner.py#L301.

Your help in figuring this out would be greatly appreciated.
Thanks for your time!

Thanks for your interest in LMFlow! The current implementation in LMFlow doesn't have this logic yet. We are working on integrating model-parallelism support for LISA, which will include the corresponding logic for dropping optimizer state. Please stay tuned for our latest updates. Thanks for your understanding 😄
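
For readers wondering what "optimizer state dropping" could look like, here is a minimal toy sketch in plain Python. It is not LMFlow's code and not the planned implementation: `ToyAdam`, `drop_state`, `run_lisa_toy`, and the layer names are all hypothetical, and the always-active `embed` and `head` entries are an assumption about the setup. The idea shown is only that, whenever the set of active layers is resampled, the Adam moments of layers that are now frozen are deleted and later recreated lazily.

```python
import random


class ToyAdam:
    """Tiny Adam-like optimizer over named lists of floats (illustration only)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = params  # name -> list of floats
        self.lr, self.betas, self.eps = lr, betas, eps
        self.state = {}       # name -> {"step", "m", "v"}, created lazily

    def step(self, grads):
        b1, b2 = self.betas
        for name, g in grads.items():
            st = self.state.setdefault(
                name, {"step": 0, "m": [0.0] * len(g), "v": [0.0] * len(g)}
            )
            st["step"] += 1
            w = self.params[name]
            for i, gi in enumerate(g):
                st["m"][i] = b1 * st["m"][i] + (1 - b1) * gi
                st["v"][i] = b2 * st["v"][i] + (1 - b2) * gi * gi
                m_hat = st["m"][i] / (1 - b1 ** st["step"])
                v_hat = st["v"][i] / (1 - b2 ** st["step"])
                w[i] -= self.lr * m_hat / (v_hat ** 0.5 + self.eps)

    def drop_state(self, active):
        """Delete the moments of every parameter group that is not active."""
        for name in list(self.state):
            if name not in active:
                del self.state[name]


def run_lisa_toy(num_layers=4, active_layers=1, periods=3,
                 steps_per_period=2, seed=0):
    rng = random.Random(seed)
    always_active = {"embed", "head"}  # assumption: never frozen
    layer_names = [f"layer{i}" for i in range(num_layers)]
    params = {n: [0.5, -0.5] for n in [*always_active, *layer_names]}
    opt = ToyAdam(params)
    active = set()
    for _ in range(periods):
        # Resample which layers train, then drop state of the frozen ones.
        active = always_active | set(rng.sample(layer_names, active_layers))
        opt.drop_state(active)
        for _ in range(steps_per_period):
            # Gradient of sum(w^2), computed only for active groups.
            grads = {n: [2 * w for w in params[n]] for n in active}
            opt.step(grads)
    return opt, active


opt, active = run_lisa_toy()
print(sorted(opt.state), "active:", sorted(active))
```

With this scheme the optimizer never holds moments for more than the currently active groups, at the cost of restarting Adam's statistics each time a layer is reactivated.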

Hi @research4pan,
I appreciate your quick response.

Is my current understanding below correct?
The current implementation of LISA in LMFlow has not yet achieved the memory efficiency reported in the paper, because the optimizer still needs to store the state of all model parameters.

Thanks again!

Thanks for your comment! In terms of performance, we have tested the instruction-following task on LLaMA-7b, and the implementation in LMFlow achieved performance similar to that reported in the paper. In terms of memory savings, the current implementation in LMFlow still has room for improvement, as the retained optimizer states occupy extra memory compared with the figures reported in the paper.

Hope this information can be helpful 😄
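
As a rough illustration of how much memory retained optimizer states can take, the sketch below does a back-of-the-envelope estimate. Everything in it is an assumption rather than a measurement from LMFlow or the paper: Adam-style optimizer with two fp32 moments per parameter, the publicly documented LLaMA-7B shapes (hidden size 4096, MLP size 11008, 32 layers, vocabulary 32000), embeddings, output head, and final norm always trainable, and a hypothetical 2 active layers at a time.

```python
def adam_state_bytes(num_params, bytes_per_value=4):
    """Adam/AdamW keeps two moment estimates per parameter (assumed fp32)."""
    return 2 * num_params * bytes_per_value


# Assumed LLaMA-7B shapes (not taken from this thread).
HIDDEN, INTERMEDIATE, NUM_LAYERS, VOCAB = 4096, 11008, 32, 32000

# Per decoder layer: 4 attention projections, 3 MLP projections, 2 norms.
per_layer = 4 * HIDDEN * HIDDEN + 3 * HIDDEN * INTERMEDIATE + 2 * HIDDEN
embed = VOCAB * HIDDEN
head = VOCAB * HIDDEN
final_norm = HIDDEN
total_params = NUM_LAYERS * per_layer + embed + head + final_norm

ACTIVE_LAYERS = 2  # hypothetical number of unfrozen layers
active_params = embed + head + final_norm + ACTIVE_LAYERS * per_layer

GIB = 1024 ** 3
full_gib = adam_state_bytes(total_params) / GIB      # state kept for everything
active_gib = adam_state_bytes(active_params) / GIB   # state kept for active only

print(f"state for all parameters:    {full_gib:.1f} GiB")
print(f"state for active parameters: {active_gib:.1f} GiB")
```

Under these assumptions, the gap between the two numbers is the extra memory an implementation pays when it keeps optimizer state for frozen layers; actual usage depends on precision, optimizer, and how many layers are active.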

Got it! Thanks for your kind and helpful response!