Train full-parameter 7B language models with only 22 GB of GPU RAM!
LOMO and AdaLOMO are low-memory optimization methods that use in-place gradient updates to significantly reduce the GPU memory required for full-parameter training.
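For intuition, the memory saving comes from fusing the gradient computation with the parameter update, so the full set of gradients never has to be held in memory at once. The sketch below illustrates that idea with plain SGD and a PyTorch gradient hook. It is a conceptual toy, not the LOMO/AdaLOMO implementation in this repository, and `attach_inplace_sgd_hooks` is a hypothetical helper name (requires PyTorch >= 2.1).

```python
import torch

def attach_inplace_sgd_hooks(model: torch.nn.Module, lr: float = 1e-3):
    """Toy illustration of in-place updates (not the LOMO implementation):
    apply a plain SGD step the moment each parameter's gradient is ready,
    then free that gradient, so all gradients are never stored at once."""
    def hook(param: torch.Tensor) -> None:
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)  # update the weight in place
        param.grad = None                      # drop the gradient immediately

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(hook)

# Usage: attach the hooks once, then simply call loss.backward();
# the weight updates happen during the backward pass itself.
```

LOMO and AdaLOMO build on this fused-update idea and add gradient normalization and clipping, with AdaLOMO additionally using adaptive per-parameter learning rates.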
This repository, based on the LOMO and collie repositories from OpenLMLab, makes the LOMO and AdaLOMO optimizers easy to use.
While the LOMO and collie repositories are bulky and hard to adopt (you have to integrate your code into theirs), this repository lets you take just the LOMO and AdaLOMO optimizers and use them with few modifications. The steps below walk through the API; a complete end-to-end sketch follows them.
- Add the following imports to your code:
  ```python
  import sys; sys.path.append("/path/to/LOMOLite/")
  from lomo.lomo_base import setup_lomo, create_lomo_lr_scheduler, Functor, LOMOBaseLite
  ```
- (optional) Call `setup_lomo` for `transformers` models to receive a config with additional LOMO settings.
  ```python
  config = setup_lomo(pretrained_model_name_or_path)
  ```
- (optional) Run `create_lomo_lr_scheduler` to get a learning rate scheduler. Alternatively, you can pass in a single number as the learning rate.
  ```python
  lr_scheduler = create_lomo_lr_scheduler(
      learning_rate=lr,
      n_steps=1000,
      num_train_epochs=10,
      warmup=0.1,
      lr_scheduler_type="linear",
  )
  ```
- Create your optimizer. `optimizer_name` is one of `lomo` or `adalomo`, depending on which optimizer you would like to use.
  ```python
  optimizer = LOMOBaseLite(
      optimizer_name, model, clip_grad_norm=1.0, clip_grad_value=None, lr_scheduler=lr_scheduler
  )
  ```
- Subclass the `Functor` class and override the `forward` method to create a class that holds a collection of attributes and that, when run, performs a forward pass through the model and returns the loss. Then instantiate your custom class with keyword arguments, which will then be passed into your `forward` method.
  ```python
  class MyFunctor(Functor):
      def forward(self, loss_fn, model, batch, train_config):
          loss = loss_fn(model, batch, train_config)
          return loss
  ```
- During training, instead of computing your loss directly, instantiate your custom class and pass the functor to the `optimizer.step` method, which returns the loss value. Make sure to pass in the optimizer as the model.
  ```python
  functor = MyFunctor(loss_fn=loss_fn, model=optimizer, batch=batch, train_config=train_config)
  loss = optimizer.step(functor)
  ```
- Instead of calling `torch.save(model.state_dict())`, call `optimizer.save_pretrained` with a `save_folder` path. The state dict will then be saved to a `pytorch_model.bin` file in the specified `save_folder`.
  ```python
  optimizer.save_pretrained(save_path)
  ```
- When running your Python file, you can override the default environment variables: `LOCAL_RANK`, `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`. In most cases you will not need to override any of these, though `MASTER_PORT` may need to be overridden when two processes are running on the same machine.
  ```bash
  MASTER_PORT=6001 python3 main.py ...
  ```
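Putting the steps above together, a minimal training script might look like the sketch below. The checkpoint name, batch format, data loader, hyperparameters, and the way `n_steps` is derived from the data loader are illustrative assumptions; the LOMOLite calls themselves are the ones documented above.

```python
import sys; sys.path.append("/path/to/LOMOLite/")

import torch
from transformers import AutoModelForCausalLM
from lomo.lomo_base import setup_lomo, create_lomo_lr_scheduler, Functor, LOMOBaseLite


class MyFunctor(Functor):
    # Called by optimizer.step(); receives the keyword arguments given at instantiation.
    def forward(self, model, batch):
        outputs = model(**batch)  # assumes batches like {"input_ids": ..., "labels": ...}
        return outputs.loss


def train(model_name, train_dataloader, num_epochs=10, lr=3e-4, save_path="checkpoints/run1"):
    # Config and model (assumes the LOMO config can be passed to from_pretrained).
    config = setup_lomo(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, config=config, torch_dtype=torch.bfloat16
    )

    # Learning-rate scheduler and optimizer, as documented above.
    lr_scheduler = create_lomo_lr_scheduler(
        learning_rate=lr,
        n_steps=num_epochs * len(train_dataloader),
        num_train_epochs=num_epochs,
        warmup=0.1,
        lr_scheduler_type="linear",
    )
    optimizer = LOMOBaseLite(
        "adalomo", model, clip_grad_norm=1.0, clip_grad_value=None, lr_scheduler=lr_scheduler
    )

    # Training loop: the functor receives the optimizer in place of the model.
    for _ in range(num_epochs):
        for batch in train_dataloader:
            functor = MyFunctor(model=optimizer, batch=batch)
            loss = optimizer.step(functor)  # returns the loss value

    # Saves the state dict to <save_path>/pytorch_model.bin.
    optimizer.save_pretrained(save_path)
```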
Currently, LOMOLite only supports training on a single GPU, and by default it uses `bfloat16` precision with no loss scaling. Planned improvements:
- Support multi-GPU training
- Integrate into PyTorch
References:
- [collie](https://github.com/OpenLMLab/collie/tree/dev/collie)
- [LOMO](https://github.com/OpenLMLab/LOMO)
- Full Parameter Fine-Tuning for Large Language Models with Limited Resources
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate