Flora without FloraAccelerator
If we use the Flora optimizer without FloraAccelerator, is it possible to separate the two? And is my understanding right that FloraAccelerator provides the AC, while LOMO comes with the Flora optimizer?
I was interested in dropping in just the optimizer; having to swap in a different accelerator would be a bit more work for my workflow.
Thanks for this.
The accelerator provides both AC and LOMO. However, since LOMO is "fused" with the optimizer, we have to predefine the optimizer update step:
flora-opt/flora_opt/optimizers/torch/flora.py, lines 414 to 416 at c50ee52
Here, we choose the Flora step, which is designed to work with the Flora optimizer only. If you are interested in using other optimizers, you need to change the update function used in the accelerator.
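For intuition, here is a minimal sketch of the fusing idea, assuming plain SGD as the update rule and using PyTorch's `register_post_accumulate_grad_hook`. This is an illustration only, not the repository's actual mechanism; the real Flora step lives in the file referenced above.

```python
# Minimal sketch of a LOMO-style fused update, assuming plain SGD as the
# update rule. Only illustrates why the update step must be known before
# backward runs; the actual Flora step is in the file referenced above.
import torch

def make_sgd_step(lr: float = 1e-2):
    def step(param: torch.Tensor) -> None:
        # Fires as soon as this parameter's gradient is accumulated during
        # backward: apply the update in place, then free the gradient
        # immediately so full gradients never coexist in memory.
        param.data.add_(param.grad, alpha=-lr)
        param.grad = None
    return step

model = torch.nn.Linear(8, 8)
for p in model.parameters():
    # Hook the update into the backward pass (PyTorch >= 2.1).
    p.register_post_accumulate_grad_hook(make_sgd_step())

loss = model(torch.randn(4, 8)).sum()
loss.backward()  # parameters are updated during backward itself
```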
OK, to clarify: AC and LOMO require FloraAccelerator.
We can use the Flora optimizer without the accelerator, but we lose the AC/LOMO optimizations (as in the Flora option in Figure 2a of the paper).
And if we want to use another optimizer with FloraAccelerator, we would need to write our own prepare_optimizer to handle the fusing and account for AC/LOMO and the other mechanisms?
Thank you for your help.
That is correct. You can pass new step functions to the with_flora_accelerator wrapper (like the example above) to have LOMO and compressed gradient accumulation enabled with your optimizer.
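A hedged sketch of what that wiring could look like. The import path, the keyword name `step_fn`, and the `(param, grad)` step signature below are assumptions for illustration, not flora-opt's documented API; check the actual `with_flora_accelerator` signature in the repo.

```python
# Hypothetical sketch: the import path, the keyword name `step_fn`, and the
# (param, grad) step signature are assumptions, not flora-opt's documented
# API. Consult with_flora_accelerator in the repo for the real interface.
import torch
from flora_opt import with_flora_accelerator  # path assumed

def my_sgd_step(param: torch.Tensor, grad: torch.Tensor, lr: float = 1e-2) -> None:
    # Stand-in for the default Flora step: any in-place update with this
    # shape could in principle be fused into the backward pass the same way.
    param.data.add_(grad, alpha=-lr)

# Pass the custom step so LOMO and compressed gradient accumulation are
# applied with your optimizer rather than Flora's.
accelerate = with_flora_accelerator(torch.optim.SGD, step_fn=my_sgd_step)
```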
Stale due to inactivity. Closing in 3 days if there is no further activity.
Closed due to inactivity