Add support for memory-efficient and faster optimizers
rasbt opened this issue · 1 comment
rasbt commented
Maybe GaLore (#1192) should be changed from `GaloreArgs` to `OptimizerArgs` after all. Then we can also more easily consider other variants such as BAdam (BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827).
The experiments from here look very compelling, and it only adds one hyperparameter.
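To illustrate the proposal, here is a minimal sketch of what a generic `OptimizerArgs` could look like in place of a GaLore-specific `GaloreArgs`. All field names and the `build_optimizer` helper are hypothetical assumptions for illustration, not litgpt's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch: one generic OptimizerArgs instead of a per-optimizer
# GaloreArgs, so variants like BAdam can be added without a new args class.
# All names and defaults below are illustrative assumptions.

@dataclass
class OptimizerArgs:
    name: str = "adamw"          # e.g. "adamw", "galore", "badam"
    lr: float = 1e-4
    weight_decay: float = 0.0
    # GaLore-specific setting (ignored by other optimizers)
    galore_rank: int = 128
    # BAdam-specific setting: steps before switching to the next parameter
    # block (the single extra hyperparameter the paper introduces)
    badam_switch_interval: int = 50


def build_optimizer(params, args: OptimizerArgs):
    """Dispatch on args.name; sketch only, real wiring would differ."""
    if args.name == "adamw":
        import torch  # deferred so the sketch runs without torch installed
        return torch.optim.AdamW(params, lr=args.lr, weight_decay=args.weight_decay)
    raise NotImplementedError(f"optimizer {args.name!r} not wired up yet")
```

The point of the single dataclass is that adding a new variant only means adding its fields and a dispatch branch, rather than threading a new `*Args` class through the CLI and config files.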
lantiga commented
Agreed